Worked example of the framework used in "Modeling Extreme Model Uncertainty"

By Holden Karnofsky

This example is intended to give a sense of how the framework in "Modeling Extreme Model Uncertainty" could conceptually be applied to a real-world decision. It is highly oversimplified: I believe most real-world decisions involve input from a variety of models (not just the three types presented here), most of them hard to formalize and describe.

I am indebted to Jacob Steinhardt for his help with this page. Jacob designed the F_i probability distributions, thought through how to best combine them, and computed their combination under different assumptions.

Jacob's code is available here (PY).

Published: June 2014

Scenario
Alice's models
- Model 1 (m_1)
  - Reasoning
  - e_1 and F_1
- Model 2
  - Reasoning
  - e_2 and F_2
- Model 3
  - Reasoning
  - e_3 and F_3
Model combination and result
- On the method of model combination

Scenario

Imagine that there is a person, Alice, whose friend Bob comes to her with an idea for a startup. The goal of the startup is to build a new operating system for smartphones. Bob has several new ideas that he thinks will revolutionize the way operating systems for smartphones work. He also has a plan for developing an easy way for people to replace their current operating system with the new one (whether they have an iPhone, Android phone or something else).

Bob is trying to raise a total of $10,000 for 10% of the company, implying a valuation of $100,000, and he is starting by going to his friends Alice, Charlie and Dana. He doesn't want to apply to YCombinator or similar groups for plausible reasons. Since it isn't terribly unusual for someone to look to friends first for this sort of funding, "market efficiency" arguments (along the lines of "if this particular investment were promising, someone else would fund it") aren't necessarily highly relevant, and won't be relied on in this example.

Alice examines Bob's plans and comes up with 3 different ways of thinking about the situation - 3 "models" each with a different implication for expected value, and each with a different degree and type of uncertainty. (For simplicity, this example doesn't designate a "prior" specifically, though something like Model 2 - i.e., a very broad "outside view" based expected value estimate - could be thought of as a "prior" that interacts with other models).

Alice's models

Model 1 (m_1)

Reasoning

Alice thinks that if Bob succeeded in creating a smartphone operating system with a lot of momentum, the ultimate value of the company could be at least $20 billion. That's what Facebook acquired WhatsApp for recently; Google and Apple are both worth more than Facebook and could potentially be more interested in a successful mobile operating system than Facebook was in WhatsApp.
What's the probability that Bob will succeed to this degree? Alice thinks about this and ultimately concludes that she has basically no idea, but her best guess is about 1 in 10,000, with an estimated mean time to acquisition of 5 years. Putting aside more intermediate and more extreme outcomes, this seems to make the expected value of the company $2 million, and the expected value of a $1,000 investment $20,000. That would be a 20x return on investment over 5 years, an extremely good investing opportunity.
However, she recognizes that she's engaging in guesswork. She could easily imagine that the probability of success she should have assigned is less than 1 in 10 million (implying a $20 expected return for a $1,000 investment, or a nearly complete loss); she could also imagine that she should have assigned 10% as the probability (implying a $2 billion valuation for the company, or $20 million for her $1,000).
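
The arithmetic in Model 1 can be sketched as below. This is only an illustration of the reasoning above, not Jacob's code; the function and constant names are made up for this sketch, and it assumes (as the text does) that only the all-or-nothing $20 billion outcome matters.

```python
# Figures taken from the scenario and from Alice's Model 1 reasoning.
EXIT_VALUE = 20e9        # value of the company if Bob succeeds ($20 billion)
RAISE = 10_000           # total Bob is raising, in dollars
EQUITY_SOLD = 0.10       # share of the company sold for that raise
INVESTMENT = 1_000       # Alice's stake, in dollars


def expected_investment_value(p_success):
    """Expected value of Alice's stake, ignoring intermediate outcomes."""
    ownership = EQUITY_SOLD * INVESTMENT / RAISE  # $1,000 buys 1% of the company
    return p_success * EXIT_VALUE * ownership


# Pessimistic guess, best guess, and optimistic guess from the text.
for p in (1e-7, 1e-4, 1e-1):
    value = expected_investment_value(p)
    print(f"P(success) = {p:g}: expected value of $1,000 = ${value:,.0f}")
```

The three probabilities span six orders of magnitude, and so do the resulting expected values, which is the guesswork Alice is worried about.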

e_1 and F_1

e_1 = expected value of Alice's $1,000 investment according to Model 1 = $20,000

F_1 is a lognormal distribution with central tendency equal to ln($20,000) and log-standard-deviation equal to ln($1,000). This implies that the roughly 68% confidence interval (one log-standard-deviation on either side) is between exp(ln($20,000)-ln($1,000)) and exp(ln($20,000)+ln($1,000)), or $20 to $20 million.
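
The interval for F_1 can be checked with a few lines of standard-library Python. This is a sketch with variable names chosen for illustration; it simply moves one log-standard-deviation either side of ln($20,000) and asks how much probability a normal distribution places within one standard deviation of its mean.

```python
import math
from statistics import NormalDist

MU = math.log(20_000)    # location of F_1 on the log scale
SIGMA = math.log(1_000)  # log-standard-deviation of F_1

low = math.exp(MU - SIGMA)    # one log-standard-deviation below
high = math.exp(MU + SIGMA)   # one log-standard-deviation above

# Probability that a normal variable falls within one standard deviation of its mean.
standard_normal = NormalDist()
coverage = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(f"interval: ${low:,.0f} to ${high:,.0f}, coverage {coverage:.1%}")
```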

Model 2

Reasoning

Another way to think about Alice's expected return is to model her as part of the population of angel investors. Some angel investors have good expected returns (this is distinct from good results; it refers to angel investors who have a real edge that manifests itself over a sufficient number of investments) and some have poor expected returns.
Alice doesn't have good information about the general track records of angel investors, but she reads a report on the topic and notices that the class as a whole has an average 2.6X return over 3.5 years, and that investors who perform low due diligence, do not actively participate, and do not make follow-on investments (all of which match Alice as she pictures herself) perform worse: 1.1X over 3.4 years, 1.3X over 3.6 years, and 1.4X over 3.9 years, respectively.
Based on this information, Alice imagines that the average investor in her reference class will have about a 1.3X return over 3.5 years. She straightforwardly converts this to a 5-year time horizon for integration with other models, implying that her $1,000 has an expected value of $1,454 after 5 years.
Because Alice may have misinterpreted the data, because the data may not be reliable, and because different investors have different expected returns, Alice sees a great deal of uncertainty according to this model. Considering this model in isolation, she can easily imagine that her expected return over 5 years might be 14X (if she is an exceptional investor) or 0.14X (if she is a poor one). (These two estimates are simply an order of magnitude more and less than her midpoint estimate.) She thinks it's highly unlikely that her expected return is over 1,000X, since that would (in her judgment) probably be better than YCombinator's historical aggregate return.
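
The conversion from a 3.5-year multiple to a 5-year multiple can be sketched as follows. The text only says Alice converts "straightforwardly", so this assumes she compounds at a constant annual rate; the helper name is hypothetical.

```python
def rescale_multiple(multiple, years_from, years_to):
    """Convert a return multiple to another horizon, assuming constant compounding."""
    return multiple ** (years_to / years_from)


five_year_multiple = rescale_multiple(1.3, years_from=3.5, years_to=5.0)
midpoint = 1_000 * five_year_multiple

print(f"5-year multiple: {five_year_multiple:.4f}X")
print(f"midpoint value of $1,000: ${midpoint:,.2f}")
# An order of magnitude either side of the midpoint, as in the text.
print(f"range: ${midpoint / 10:,.0f} to ${midpoint * 10:,.0f}")
```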

e_2 and F_2

e_2 = $1,454

For F_2, we use the following heavily fat-tailed distribution: P(X > x0) = 1/(1+b*(x0/s) + (x0/s)^2), which is equal to 1 at x0=0 and decays toward zero approximately quadratically. Setting b=0.5 and s=$1,454 causes F_2 to have most of its probability mass between $145 and $14,547 (an order of magnitude less and more than $1,454).
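
The tail formula for F_2 can be written out directly. The sketch below uses function names chosen for illustration and the parameter values stated above; it evaluates how much probability falls within an order of magnitude of s.

```python
def survival(x, b, s):
    """P(X > x) = 1 / (1 + b*(x/s) + (x/s)**2), for x >= 0."""
    t = x / s
    return 1.0 / (1.0 + b * t + t * t)


def mass_between(lo, hi, b, s):
    """Probability that X falls in the interval (lo, hi]."""
    return survival(lo, b, s) - survival(hi, b, s)


B2, S2 = 0.5, 1454.0  # parameters given for F_2

print(f"P(X > 0) = {survival(0.0, B2, S2):.3f}")
print(f"mass between s/10 and 10*s = {mass_between(S2 / 10, S2 * 10, B2, S2):.3f}")
```

Because the survival function falls off like 1/x^2 for large x, the distribution keeps non-negligible probability far above s, which is what "fat-tailed" refers to here.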

Model 3

Reasoning

A third way to think about Alice's expected return is to imagine that her expected return is accurately predicted by the aggregate expected return estimate of other people who are intelligent and well-informed about Bob's idea.
Alice talks to Charlie and Dana, both of whom have been approached by Bob to put in money. Charlie says, "I think this is a bad investment; I'm not putting in money and suggest that you don't either." Dana says, "I think this investment has an expected 5-year return of about -50%, that is, I think if you put in $1,000 the mean value of your holding will be $500." Each tries to explain their reasoning, but in both cases, Alice isn't able to make sense of the reasoning and gains no additional information from this discussion.
Based on this information, Alice estimates that an aggregate expected return estimate of other intelligent well-informed people would have an average value of $500 (for the value of the $1,000 investment after 5 years).
However, there are many sources of uncertainty here. Alice's friends may be unrepresentative of a theoretical population of intelligent well-informed people. They may also not be particularly intelligent or well-informed. Considering this model in isolation, Alice can easily imagine that the value of her $1,000 investment after 5 years might, in expectation, be $50 or $5,000, and generally feels that there's a fat tail as well due to her low sample size.

e_3 and F_3

e_3 = $500

For F_3, we use the following heavily fat-tailed distribution: P(X > x0) = 1/(1+b*(x0/s) + (x0/s)^2), which is equal to 1 at x0=0 and decays toward zero approximately quadratically. Setting b=8 and s=$500 causes F_3 to have most of its probability mass between $50 and $5,000 (an order of magnitude less and more than $500).
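
Because the tail formula is a quadratic in x0/s, it can be inverted in closed form to read off quantiles of F_3. The sketch below is an illustration under the stated b and s, with hypothetical function names; it is not taken from Jacob's code.

```python
import math


def survival(x, b, s):
    """P(X > x) = 1 / (1 + b*(x/s) + (x/s)**2), for x >= 0."""
    t = x / s
    return 1.0 / (1.0 + b * t + t * t)


def quantile(q, b, s):
    """Value x with P(X <= x) = q, for 0 <= q < 1.

    Solves t**2 + b*t - q/(1 - q) = 0 for t = x/s and keeps the non-negative root.
    """
    t = (-b + math.sqrt(b * b + 4.0 * q / (1.0 - q))) / 2.0
    return s * t


B3, S3 = 8.0, 500.0  # parameters given for F_3

for q in (0.25, 0.5, 0.75, 0.95):
    x = quantile(q, B3, S3)
    print(f"{q:.0%} quantile: ${x:,.2f}  (check: P(X <= x) = {1 - survival(x, B3, S3):.3f})")
```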

Model combination and result

To combine the three models, we take the geometric mean of their probability densities (a brief justification is given under "On the method of model combination" below).

Combining F_1, F_2, and F_3 in this way gives a resulting probability distribution with median of $960 for the value of the investment.

If we imagine that m_1 came out wildly more optimistic, but also wildly more uncertain, such that its mean value were $10^50 but the probability on e_1=$20 were the same as it currently is, then the combination of the three models would have a median of $896 instead.

If we imagine that m_1 were more robust - that Alice assigned an 80% probability that e_1>=$10,000 - then the combination of the three models would have a median of $6,569.

Thus, when considering the importance of model m_1, the robustness is a more important consideration than the model's expected value.
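
A rough numerical sketch of the combination step is given below. It is not Jacob's code and should not be expected to reproduce the medians quoted above: the densities are reconstructed from the descriptions of F_1, F_2 and F_3 in this example, and the grid bounds and resolution are arbitrary choices made for this sketch.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal with log-location mu and log-standard-deviation sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def fat_tail_pdf(x, b, s):
    """Density implied by P(X > x) = 1 / (1 + b*(x/s) + (x/s)**2)."""
    t = x / s
    return (b + 2.0 * t) / (s * (1.0 + b * t + t * t) ** 2)


def geometric_mean_median(pdfs, lo=1e-6, hi=1e12, n=20_000):
    """Median of the renormalized geometric mean of `pdfs`, on a log-spaced grid."""
    step = (math.log(hi) - math.log(lo)) / n
    xs = [math.exp(math.log(lo) + i * step) for i in range(n + 1)]
    weights = []
    for x in xs:
        # Average the log-densities, then weight by x * step because dx = x * d(ln x).
        log_density = sum(math.log(f(x)) for f in pdfs) / len(pdfs)
        weights.append(math.exp(log_density) * x * step)
    half = sum(weights) / 2.0
    running = 0.0
    for x, w in zip(xs, weights):
        running += w
        if running >= half:
            return x
    return xs[-1]


models = [
    lambda x: lognormal_pdf(x, math.log(20_000), math.log(1_000)),  # F_1
    lambda x: fat_tail_pdf(x, 0.5, 1454.0),                         # F_2
    lambda x: fat_tail_pdf(x, 8.0, 500.0),                          # F_3
]

combined_median = geometric_mean_median(models)
print(f"median of combined distribution: ${combined_median:,.0f}")
```

Swapping in a different first model (for example, one with a much larger mean but the same probability near $20) is a one-line change to the `models` list, which makes it easy to explore the kind of sensitivity described above.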

On the method of model combination

One method of combining probability distributions is simply to multiply the probability densities (and renormalize). This is equivalent to applying Bayes' rule assuming independence of the probability distributions (as laid out here). An alternative is to take the geometric mean of the probability densities.

One justification for the geometric mean pertains to invariance under future Bayesian updates. Intuitively, given new information, we should get the same answer whether we first update each model with the new information and then combine them, or first combine the models and then update the combined model. The geometric mean is the only way to do this while also treating all the models symmetrically. For example, if all three models were to incorporate the same new piece of information via Bayes' rule (by multiplying in the likelihood), then their geometric mean would update as if it had directly incorporated that piece of information once. Their product, however, would "triple-count" the information and update too strongly.
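
The invariance property can be illustrated on a toy discrete example. The three "models" and the likelihood below are made-up numbers over three hypothetical outcomes, chosen only to show that updating and combining commute for the geometric mean but not for the product.

```python
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def geometric_mean(dists):
    """Renormalized pointwise geometric mean of several discrete distributions."""
    n = len(dists)
    return normalize([math.prod(vals) ** (1.0 / n) for vals in zip(*dists)])


def product(dists):
    """Renormalized pointwise product of several discrete distributions."""
    return normalize([math.prod(vals) for vals in zip(*dists)])


def bayes_update(prior, likelihood):
    return normalize([p * l for p, l in zip(prior, likelihood)])


# Hypothetical models over three outcomes, and a hypothetical likelihood.
models = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]
likelihood = [0.9, 0.5, 0.1]

for name, combine in (("geometric mean", geometric_mean), ("product", product)):
    update_then_combine = combine([bayes_update(m, likelihood) for m in models])
    combine_then_update = bayes_update(combine(models), likelihood)
    gap = max(abs(a - b) for a, b in zip(update_then_combine, combine_then_update))
    print(f"{name}: largest difference between the two orders = {gap:.6f}")
```

For the product, updating each model first multiplies the likelihood in three times, which is the "triple-counting" described above.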

For a set of normal distributions, either approach gives the same mean. The following shows what normal distribution results from the geometric mean of n normal distributions; the product would simply exclude the "1/n" exponent, which is irrelevant to the final mean.

$\left(\prod\limits_{i=1}^n \exp\left[-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right]\right)^{1/n} \newline\newline = \text{const} \cdot \exp\left[-x^2 \sum\limits_{i=1}^n \frac{1}{2n\sigma_i^2} + x \sum\limits_{i=1}^n \frac{\mu_i}{n\sigma_i^2}\right] \newline\newline = \text{const} \cdot \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \newline\newline \text{where:} \newline\newline \frac{1}{\sigma^2} = \sum\limits_{i=1}^n \frac{1}{n\sigma_i^2} \newline\newline \mu = \left(\sum\limits_{i=1}^n \frac{\mu_i}{\sigma_i^2}\right) \Big/ \left(\sum\limits_{i=1}^n \frac{1}{\sigma_i^2}\right)$
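
The closed-form result for normal distributions can be sketched in code. The function name and the example means and standard deviations are hypothetical; the only difference between the two methods is whether each precision 1/sigma_i^2 is weighted by 1/n.

```python
def combine_normals(mus, sigmas, geometric=True):
    """Mean and variance of the normal obtained by combining normal densities.

    geometric=True uses the geometric mean (each precision weighted by 1/n);
    geometric=False uses the plain product (each precision weighted by 1).
    """
    weight = 1.0 / len(mus) if geometric else 1.0
    precision = sum(weight / s ** 2 for s in sigmas)
    mean = sum(weight * m / s ** 2 for m, s in zip(mus, sigmas)) / precision
    return mean, 1.0 / precision


# Hypothetical inputs for illustration.
mus, sigmas = [0.0, 10.0, 4.0], [1.0, 2.0, 4.0]

geo_mean, geo_var = combine_normals(mus, sigmas, geometric=True)
prod_mean, prod_var = combine_normals(mus, sigmas, geometric=False)

print(f"geometric mean: mean = {geo_mean:.4f}, variance = {geo_var:.4f}")
print(f"product:        mean = {prod_mean:.4f}, variance = {prod_var:.4f}")
```

The two means agree, while the product's variance is smaller by a factor of n, reflecting that the product treats the models as independent pieces of evidence.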