My notes on the GJ (Good Judgment, the Tetlock superforecasting-inspired collaboration) COVID forecasting project I joined. These notes were assembled midway to late in the project.
Summary
-Many principles of good (super)forecasting apply, although some are different due to the nature of COVID
-Time spent on analysis is underrated by non-forecasters / the general public
-But time spent while ignoring forecasting principles will not produce better forecasts
-Counterfactuals posed as questions are helpful for better forecasts
-But one's own (or other superforecasters') counterfactuals/questions were more useful than the project-posed questions
-It was obvious (to me) that one or two forecasters were producing very accurate forecasts from their models
-At least in modelling, these forecasters could be useful for policy (eg Brazil) in certain areas
-Certain second-order/third-order thinking on COVID was not posed, but would be interesting
Background
My day job involves forecasting: mostly financial markets, bio-pharmaceutical R&D success rates, sales revenue, and second- and third-order outcomes such as stock prices. I read Tetlock's Superforecasting book and attempt to use most elements of that work (especially base rates and inside/outside thinking) in my decision making and forecasts. (Blogs on previous thinking are linked at the end.)
At the start of the year, I noted GJ Open was running a “tournament”, joined, and forecast a handful of questions (current Brier score 0.327 vs the crowd at 0.411, 58 questions in; so moderately better than average but not super, and I don’t have time to update a great deal. I was quite proud of my forecast on NZ gun control laws, though).
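For reference, a Brier score is essentially the mean squared error between your probability forecasts and what actually happened (lower is better). Below is a minimal sketch in Python; the probabilities and outcomes are made-up illustrative numbers, not my actual GJ Open forecasts, and GJ Open's own scoring averages over days and answer options, so its exact formula differs slightly.

```python
# Minimal sketch of a Brier score for binary questions (lower is better).
# The probabilities and outcomes below are made-up illustrative numbers,
# not my actual GJ Open forecasts.

def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and outcomes (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

forecasts = [0.8, 0.3, 0.6, 0.1]  # probability assigned to "event happens"
outcomes = [1, 0, 0, 0]           # what actually happened
print(brier_score(forecasts, outcomes))  # 0.125
```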
Then, about 2-3 months ago, I joined the GJ project study on COVID forecasting, where we are mostly forecasting COVID deaths in various countries at various time points, along with a few other COVID-related questions (and second/third-order forecasts, eg on the S&P 500, only indirectly related to COVID), in independent and crowd forecast groups.
Observations
My current score is in the second quartile and moderately above the average, although adjusted for those who were obviously not engaged, I’m probably only just above average. I expect this to decline (as in, become worse) as I’ve not been able to really update my forecasts in the last few weeks because work/life was too full, which is a shame.
-Time spent on analysis, and updating forecasts with new information, is important
Those whom you can see updating their forecasts more regularly are better - certainly more informed. This is a function of the time that can be spent forecasting. There are people who can look at it 1 hour a week, some a few hours a week, and some much more. I observe this to make a difference.
If only because it can pick up quirks in the data more swiftly (eg the way China changed its data, see below).
But you can’t do this type of forecasting on, say, 15 minutes a week - even 1 hour a week is the very bottom end of what you could get away with. And you’d be unlikely to forecast accurately.
Some people think: I will read the book, use some techniques, and I will be a much better forecaster. That won’t happen if you don’t spend time practising (though cf. Range, cf. Gladwell).
-Many principles of good (super)forecasting apply, although some are different due to the nature of COVID
COVID deaths have hard numbers and data behind them. A forecast about New Zealand gun control law has many more social-political elements.
While there are complex human behavioural systems behind the data, there are decent models from which to form base assumptions. Forecasting whether, eg, “there will be conflict between Pakistan and India by 2024 that leads to at least 2 deaths” has no such baseline models (or at least none so readily available).
-Data gaming is a quirk: what is real data?
Most forecasters were incorrect about China COVID deaths at a certain point in the project.
This was because China revised its data.
This raises some very interesting points about how authentic data is and what data to trust.
This is a more general point: for a competition, one needs hard endpoints and an agreed source of data (and the same applies to improving forecasting generally).
But, for the real world, the actual data and what actually happened are potentially more important.
One can see this struggle even in US data today. There are estimates that between 4% and 7% of the US population may have had virus exposure, yet the data on actual positive virus test rates is much smaller.
It also meant that, for the purposes of the project, assessing the likelihood of China revising its data was an important - maybe the more important - judgment to get right.
More generally, I think this applies to much data we see even from authoritative sources.
Is inflation really 2.25%? Or GDP? When you dig down into it, there are quite a number of uncertain points and arguments around definitions, such that maybe the trend in inflation is correct, but there are so many assumptions and flaws in inflation modelling that headline inflation as reported by your country’s statistical agency may not directly apply to your circumstances.
-But, while time spent is important, ignoring forecasting principles will not produce better forecasts
While I believe / observe that there is a link between time spent on forecasts and better forecasts, it’s not the only factor.
Ignoring the lessons of forecasting (eg outside/inside views, base rates, etc.) will be a hindrance.
And there is a point of diminishing returns.
Being able to use a team, I concur, helps. This might not even be explicit: using someone you view as an expert (or the crowd’s expertise) as the baseline forecast and adjusting from it can help (a rough sketch of that blending idea is below).
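As an illustration only, here is one simple way to anchor on an expert or crowd baseline and adjust towards your own view. The weight and probabilities are made-up numbers, not a method used in the project.

```python
# A rough sketch of anchoring on an expert/crowd baseline and adjusting
# towards your own view. The weight and probabilities are made-up numbers,
# not a method used in the project.

def blended_forecast(own_p, baseline_p, weight_on_baseline=0.7):
    """Weighted average of your own probability and a baseline (expert or crowd)."""
    return weight_on_baseline * baseline_p + (1 - weight_on_baseline) * own_p

# eg you think 40%, the expert/crowd baseline says 60%
print(blended_forecast(own_p=0.40, baseline_p=0.60))  # 0.54
```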
-It was obvious (to me) that one or two forecasters were producing very accurate forecasts from their models
Let’s call her WW. Even within a few weeks, WW was obviously producing forecasts superior to the average, with very clear assumptions, so you could adjust your own forecasts if you disagreed. She ended up in one of the top two spots; from the first few weeks, I had predicted she would finish in a top-3 spot.
At least in this case, superforecasters can be spotted - and I was not alone in spotting her. Another forecaster explicitly used WW’s forecasts to produce superior forecasts of their own.
-At least in modelling, these forecasters could be useful for policy (eg Brazil) in certain areas
Looking at these forecasts in aggregate did, and can, show interesting results. We were very confident about how South Korea was handling COVID. We were all very worried about Brazil even in the early days.
Policy has a political element but a team of super-forecasters can provide very valuable base case scenarios for policy.
(Incidentally, some of the blog thinking of Dominic Cummings (a controversial adviser in the current UK government set-up) might be on the mark here too, cf. seeing rooms, red teams, etc. Execution on these ideas is another matter.)
-Counterfactuals posed as questions are helpful for better forecasts
-But one's own (or other superforecasters') counterfactuals/questions were more useful than the project-posed questions
There was a section on counterfactuals. I don’t think it was that well developed or thought out in the project. But the core idea of using them to improve thinking about scenarios was, in my own experience, helpful.
I suggest that two teams producing counterfactuals for each other might be the most helpful arrangement. Many of the counterfactuals actually posed weren’t necessarily that helpful for deciding what the forecast should be, but they were helpful in thinking through cause/effect/correlation… And cause/effect/correlation could be confusing…
Eg
If >10 states are in lockdown, cases will be ?
But, if <10 states are in lockdown, cases will be ?
Does a higher number of cases → lockdown, or does lockdown → fewer cases? There’s a circular yo-yo here that can be worth thinking about.
-Incentives matter
Incentives are complex. While money is an obvious one, real-world incentives work in complex ways, and which incentives produce the best forecasting is, I think, an interesting question.
The incentives in financial markets and the difficulty in beating crowd wisdom in those markets is telling.
But, for instance, the data is mixed on using financial incentives in areas like teaching or medicine.
This is a whole other area, but it highlights to me that incentives in complex real-world areas are not easy to model or experiment on, and why I like natural experiments, regression discontinuity, and other research designs like that.
Still, incentives in forecasting do make a difference. The type of person who will forecast for game or reputation points is not every type of person.
-Certain second-order/third-order thinking on COVID was not posed, but would be interesting
To me, some of the most interesting aspects are the intersecting derivatives of COVID.
Eg COVID cases down/up → stock market reactions or,
Expectations on COVID
Eg
Positive vaccine news → expectations that COVID cases go down → no lockdown → stock market reactions
These relatively simple narratives seem to hold, and I’d be interested in how superforecasters think about these chains of effects in second-order/third-order thinking. Some of it is too complex to be easily assessed, but some of it does seem to play out the way superforecasters think it will, even if on the surface it looks intractable (a toy sketch of such a chain is below).
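As a toy illustration of what I mean by a chain of effects, here is one way to multiply conditional probabilities through the vaccine-news narrative above. Every probability is a made-up number, not a forecast from the project, and assuming each step depends only on the previous one is itself a big simplification.

```python
# A toy sketch of chaining conditional probabilities through a simple
# narrative: vaccine news -> cases expected down -> no lockdown -> market up.
# Every probability below is a made-up illustrative number, not a forecast
# from the project, and each step is assumed to depend only on the previous
# one -- itself a big simplification.

p_vaccine_news = 0.30             # positive vaccine news in the period
p_cases_down_given_news = 0.60    # expectations of falling cases, given the news
p_no_lockdown_given_cases = 0.70  # no (further) lockdown, given falling cases
p_market_up_given_no_lock = 0.55  # positive market reaction, given no lockdown

p_chain = (p_vaccine_news * p_cases_down_given_news
           * p_no_lockdown_given_cases * p_market_up_given_no_lock)
print(round(p_chain, 3))  # 0.069
```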
Conclusions
If you do fancy it, I’d choose a bunch of questions and try out some forecasting. You will learn a lot about your own thinking and about the topic in question if you apply yourself.
Links:
GJ Project: https://www.gjopen.com/
Previous thoughts on forecasting:
https://www.thendobetter.com/investing/2019/8/23/forecasting-primer
https://www.thendobetter.com/investing/2019/7/13/superforecasting-tips
Tetlock blogs:
https://www.thendobetter.com/investing/2020/5/2/taboo-cognition-tetlock