Qualitative Assessments of Top Charities (2023)

Overview

In theory, our recommendations maximize one thing: total improvement in well-being per dollar spent. This is what our cost-effectiveness estimates aim to capture.

In practice, there are costs and benefits that can’t easily be quantified and are not estimated in our models. We make qualitative assessments to account for these unmodeled costs and benefits. These qualitative assessments help us prioritize among top charities when they have similar modeled cost-effectiveness.

Published: May 2024 (November 2020 version, November 2019 version)

Qualitative assessments of GiveWell’s top charities

We use four designations to indicate our assessment of the relative performance of our top charities on qualitative dimensions, listed below from strongest to weakest:

  • "Stands out" (strongest)
  • "Relatively strong"
  • "Average"
  • "Relatively weak" (weakest)

We believe our top charities are exceptional relative to the majority of organizations. When we refer to a top charity as "relatively weak" on an element below, we do so in the context of its existing standing as a top charity and strength on other dimensions of our review process. In other words, these assessments are intended to capture differences among GiveWell top charities, rather than absolute rankings among all charitable organizations.

Our latest qualitative assessment of top charities (as of December 2023)1 is in the table below.

Criteria | Against Malaria Foundation | Helen Keller International's VAS program | Malaria Consortium's SMC program | New Incentives
Responses to our questions | Average | Average | Relatively strong | Stands out
Prioritization discussions | Average | Average | Relatively strong | Stands out
Self-improvement and attitude toward mistakes | Average | Average | Relatively strong | Stands out
Role in field | Average | Relatively strong | Stands out | Average
Responsiveness | Relatively weak | Average | Stands out | Stands out
Giving us feedback | Average | Average | Relatively strong | Relatively strong
Quality of information shared | Average | Average | Stands out | Relatively strong
Incorporating feedback from participants and last mile providers | Not yet assessed | Average | Not yet assessed | Relatively strong

A detailed explanation of the reasoning for our assessment of each organization is in that organization's GiveWell report.

What we assess qualitatively

The factors we include in our qualitative assessments fall outside the scope of our cost-effectiveness models because we feel that either (a) they cannot be estimated quantitatively in a reasonable way, or (b) collecting the information needed for a reasonable quantitative estimate would take time disproportionate to the difference the factor would make to modeled cost-effectiveness.

What types of impact do we assess qualitatively?

Below, we describe three characteristics of our top charities' work that are not fully captured in our cost-effectiveness model. This list is illustrative rather than comprehensive. For the most part, we don't account for these characteristics directly but instead consider each organization's performance via eight proxy metrics (included in the table above and discussed in the next section).

  • Allocation of funding: Organizations may allocate funding among different locations and program participants based on considerations that we don't capture in our cost-effectiveness model. For example:
    • Our models use estimates of disease burden by country. Organizations may have access to better information about the true disease burden in the communities that participate in the program within a country and prioritize funding based on that information.
    • Our models generally use past costs in a country or across the program as a whole to estimate future costs in that location. Organizations may have more accurate information about how their costs may be different in the future (e.g., if the organization previously paid startup costs that will not be repeated, implying lower future costs, or plans to expand to a hard-to-reach population, implying higher future costs).
    • Organizations may account for risks that our model does not, such as the likelihood of withdrawal of government support for the program or rising security concerns in their area of operation.
    • Organizations may account for benefits that our model does not, such as work in one country being more likely than work in another country to lead to a government taking over the costs of the program in the future.
  • Quality of implementation: There are aspects of the quality of the implementation of a program that our cost-effectiveness model doesn't capture. For example:
    • The quality of interactions between program participants and program staff may have long-term consequences for the costs and uptake of delivering the same or similar interventions in the future.
    • The quality of communications about the program with participants can affect, for example, whether participants receive maximal benefit from the intervention (e.g., whether they consistently use an insecticide-treated net to prevent malaria).
    • The equity of the organization's distribution of the intervention could, in some cases, affect whether the individuals who would benefit most are reached, and whether the program causes jealousy, conflict, or distrust in the local community.
    • Organizations' decisions may affect local markets for talent or goods to varying degrees.
  • Additional positive impact: Organizations may have positive impacts outside of our model of their delivery of their programs. For example:
    • Conducting and disseminating research that improves other actors' decisions.
    • Raising funds from donors who would have otherwise spent that funding on something less cost-effective.
    • Building government capacity that carries over into other programs.
    • Providing assistance to partners that increases those partners' impact outside of the specific intervention.
    • Creating a model for a program that other entities copy.
    • Improving coordination between different partner organizations.

How do we assess impact qualitatively?

For some of the elements of each characteristic above, we could seek out information to more directly understand how well the organization performs and update our cost-effectiveness model to incorporate that information. For example, we could fund work to collect more precise data on disease burdens, or pay for third-party surveys to learn more about program participants' interactions with GiveWell-funded top charities.

In some of these cases, this is work we aim to make progress on in the future—though we expect our answers to continue to be incomplete due to the challenges in measurement of these factors. In other cases, getting direct information is infeasible or prohibitively expensive.

For the most part, where we have not observed or will not observe the organizational features discussed above, we rely on information by proxy:

  • Differences we've observed in how organizations operate and communicate with us.
  • In some cases, we’ve gathered feedback from partners about the quality of their interactions with the organization, or on the impact the organization’s research has had on other actors' decisions.

Our underlying assumption is that strong performance in these proxy metrics is likely to translate into higher impact in a way that may not be fully captured by our cost-effectiveness models.

For each of our top charities, we've subjectively answered the following questions:2

  • Responses to our questions: When we ask the organization a question, do its answers generally either indicate that it has thought through the question before or show us why getting an answer is not important to understanding its work?
  • Prioritization discussions: Do the organization's explanations about how it allocates funding among different locations and program participants seem to be aimed at maximizing its impact per dollar? Is the organization consistent in what it says about how it prioritizes among different locations and program participants, and can it clearly explain any changes in its approach?
  • Self-improvement and attitude toward mistakes: Does the organization proactively share information, both with us and publicly, about mistakes it has made? Has the organization designed systems to alert it to problems in its programs, and has it made changes based on information from those systems? Has the organization experimented with ways to improve its impact?
  • Role in field: Is the organization producing research aimed at informing policymakers or other implementers? Does it participate in global conversations about its field of work?
  • Responsiveness: Does the organization send us information by mutually agreed-upon deadlines? Is it responsive to our emails?
  • Giving us feedback: Does the organization catch our mistakes and let us know, thus improving our research? Does the organization make useful suggestions for how we could improve our research process and cost-effectiveness models?
  • Quality of information shared: Have the documents that the organization has shared with us contained significant errors? Has the organization told us things that were inaccurate? Has the information provided been easy to interpret and use? Have the organization's projections of when it would achieve its goals generally been accurate?

For our 2023 qualitative assessments, we also added a new criterion:3

  • Incorporating feedback from participants and last mile providers: Does the organization have a process for getting feedback from program participants and from last mile providers, i.e., those directly delivering the program (for example, monitoring, feedback surveys, or focus groups)? Does it incorporate that feedback to improve service delivery?

How do we assess performance on each of these proxies?

Responses to our questions
  • Stands out: Examples of strength in this category include knowledgeable and nuanced views on questions such as the pros and cons of methodological choices in coverage surveys, the funding landscape, logistical challenges to scaling up in various locations, global commodity supplies, and modifications to program implementation to account for new information or changing circumstances.
  • Relatively weak: Seeming relatively weak on this proxy means that we are often unable to get satisfying answers to questions about the organization's decision making.
Prioritization discussions
  • Stands out: Stand-out performance on this proxy may include using a model to compare the cost-effectiveness of different programs that the organization is considering extending or starting, or rating opportunities for scale-up on factors that are believed to correlate with cost-effectiveness.
  • Relatively weak: Seeming relatively weak on this proxy may include inconsistency in how the organization describes its prioritization process or increasing spending without increasing expected output when additional funding becomes available, without providing a clear explanation for this choice.
Self-improvement and attitude toward mistakes
  • Stands out: Examples of strength in this area include experiments aimed at improving impact, integrating learnings from robust past monitoring, or publicly sharing about mistakes and responses to mistakes.
  • Relatively weak: Seeming relatively weak on this proxy may include few, if any, past cases of proactively sharing mistakes; few, if any, examples of experiments to improve impact; and relatively weak systems for detecting problems in program implementation.
Role in field
  • Stands out: Examples of strength include playing a major role in its field through research, raising awareness, networking, or influencing policy.
  • Average: We've considered lack of evidence for a prominent role in the field as "average" performance rather than "relatively weak."
Responsiveness
  • Stands out: Strength in responsiveness could include consistently communicating clearly about timelines for sharing information and responding to our questions and requests in a timely manner. Another example is when an organization makes it easy for us to access data about its program on demand through shared sources.
  • Relatively weak: Seeming relatively weak on this proxy may include frequently missing mutually agreed-upon deadlines, not responding to emails in a timely way (particularly after multiple follow-ups), and not providing full responses to our requests (e.g., answering one question when an email has multiple questions).
Giving us feedback
  • Stands out: Strength in feedback can include raising concerns about our approaches or priorities on a high level or finding specific errors in our spreadsheets.
  • Relatively weak: Seeming relatively weak on this proxy may include few, if any, past cases of providing feedback on our work, and a demonstrated lack of familiarity with our processes after multiple years of engagement.
Quality of information shared
  • Relatively strong: Seeming relatively strong on this proxy includes (a) sharing written materials that are consistently accurate and easy to understand, with only occasional errors, and (b) being well-calibrated in describing plans and expectations for the future (having a track record of accurate predictions about the timelines on which work will be completed).
  • Relatively weak: Seeming relatively weak on this proxy may involve (a) sharing information on several occasions that contained errors that were difficult to detect and/or had implications for program management (in other words, information that indicated that the organization was using inaccurate information internally that may have affected the operation of the program), and (b) lacking a track record of accurate predictions about timelines on which work will be completed.
Incorporating feedback from participants and last mile providers
  • Relatively strong: Strength on this proxy includes having systems in place for gathering feedback from participants and last mile providers, and examples of using the information gathered to update the program.
  • Relatively weak: Seeming relatively weak on this proxy would mean not having any systems in place to gather this kind of feedback.

Note that this criterion was added to our qualitative assessment in 2023, and we have not yet assessed two of our four top charities (AMF and Malaria Consortium) on it. We may spend more time investigating this in the future, and our assessments of performance on this criterion could change as we learn more.

How we could be wrong

  • The proxies we discuss above may be poor predictors of the underlying characteristics they are intended to stand in for.
  • Our assessment of how each top charity compares on each proxy may be biased. Some specific types of bias that we are concerned about:
    • Halo/horn effect. Our assessments on the different proxies may be unreasonably influenced by each other. A positive opinion of an organization on one proxy may subconsciously lead us to assess the organization positively on another proxy.
    • Unsystematic collection of examples. We have received a large number of documents and have had many conversations with our top charities over the years. We have not always systematically cataloged examples that affect each proxy, and it could be that this means we haven’t considered examples that would affect our overall assessments.
  • We have not systematically asked questions of all top charities that would inform our views for each proxy. For example: What activities has an organization undertaken to influence its field and what happened as a result?
Notes

1. Although we updated this page in 2024, the assessments described here were conducted in 2023 and reflect our assessment of each organization as of December 2023. We have not yet incorporated updates to our assessments since that time into this page.

2. We used to include a “fundraising” criterion in our assessment, reflecting our understanding of whether the organization raised funding from donors who we think would have otherwise supported less cost-effective opportunities. We removed this criterion in 2023 because we did not feel it was informative about the quality of an organization's delivery of the program, and because we were not able to confidently assess how much an organization’s fundraising was linked to its GiveWell top charity status as opposed to other fundraising efforts.

3. We are also considering adding an “engagement with local leadership” criterion to our assessment in the future. This would assess whether the organization is well connected and actively engaged with relevant government agencies in locations where it works, and has maintained good relationships with governments in delivering its program. We do not provide an assessment on this criterion on this page because we do not feel we have yet gathered enough information to make an independent assessment of our top charities. We may do more of this in the future, particularly through conversations with government partners and other stakeholders.


Source URL: https://www.givewell.org/charities/top-charities/qualitative-assessments