Football and Superforecasting
A bunch of scatterbrained rambling about how some of the ideas from the book Superforecasting could benefit football coverage.
Below is a ChatGPT-helped summary of Superforecasting for those who haven't read it:
Superforecasting is a book by Philip Tetlock and Dan Gardner that explores the principles and methods behind accurate prediction. Tetlock and Gardner argue that accurate forecasting is a skill that can be learned and honed, and that the right approach can help individuals and teams make more accurate predictions about the future.
The book draws on extensive research and interviews with successful forecasters, or "superforecasters," to identify the key qualities and methods that lead to accurate predictions. These include a focus on probabilistic thinking, active seeking of feedback, avoiding common biases and heuristics, a willingness to update beliefs, and the ability to work collaboratively in teams.
The work built upon Tetlock’s earlier Expert Political Judgment, whose findings were widely summarised as showing that expert predictions are about as accurate as those of a dart-throwing chimpanzee, a framing Tetlock pushed back on in Superforecasting.
It’s difficult not to wonder how the principles in the book could be applied to football coverage, particularly in the analytics ‘community’.
With its growth over the past decade, there probably wouldn’t be much opposition to the claim that football analytics has gone mainstream. Most football articles tend to at least consider data rather than relying solely on opinion, while Expected Goals are now featured in broadcasts and popular video game franchises. Incorporating some of the principles of Superforecasting could help create more analytics-focused content and prove beneficial to content creators and their audiences.
Before delving into some ideas, it’s worth giving an overview of the experiment in Superforecasting.
Superforecasting and the Good Judgment Project
The Good Judgment Project was a research study led by Philip Tetlock aimed at improving the accuracy of geopolitical forecasts. The project began in 2011 and involved recruiting thousands of participants from diverse backgrounds to make predictions about a range of global events, such as political elections, international conflicts, and economic trends.
Each prediction was reduced to a binary outcome (Will country x take action y in the next 90 days?), and participants attached a probability to their answer. From these, an overall accuracy and a Brier score could be calculated for each participant, the Brier score measuring how well calibrated their probabilities were.
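As a rough sketch (the predictions below are made up for illustration), the Brier score is just the mean squared difference between the probability you gave and what actually happened, so it is simple to compute:

```python
def brier_score(forecasts):
    """Mean squared error between forecast probabilities and outcomes.

    forecasts: list of (probability, outcome) pairs, where outcome is
    1 if the event happened and 0 if it didn't. Lower is better.
    """
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# Illustrative predictions: (probability given, what actually happened)
predictions = [(0.9, 1), (0.7, 1), (0.2, 0), (0.6, 0)]
print(brier_score(predictions))  # (0.01 + 0.09 + 0.04 + 0.36) / 4, roughly 0.125
```

A score of 0 is perfect, while always answering 50% scores 0.25, so anything below 0.25 beats coin-flipping.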
The results of the Good Judgment Project were striking, with the best participants, the so-called superforecasters, consistently outperforming both expert forecasters and simple statistical models. The project also identified a range of qualities and strategies associated with successful forecasting, such as probabilistic thinking, a willingness to update beliefs based on new information, and a focus on tracking feedback and learning from past predictions.
The format of the project, combined with the key takeaways of improving predictive abilities, could provide football coverage with interesting avenues to explore.
Superforecasting and Football Writing
At the moment, football coverage feels much more reactive than predictive. It’s less likely you’ll see articles telling you what chance there is of x happening, and more likely you’ll see posts regarding the rise of any team who happens to be in some good form or investigations into a team in decline (“How did x end up in this position?”). Trying to be more predictive could freshen up a saturated landscape while offering feedback on previous work and creating engagement.
Before continuing, it’s worth pointing out this isn’t to point a finger at football media and claim they should be doing better. Most of my critiques come from things I’ve been guilty of in the past and have never felt satisfied with.
When writing about football, there’s no feedback system, there are no repercussions, and there’s no accountability. It’s easy to be a bit of a weasel and hedge your bets. If you’re right, you can re-post it on your timeline so everyone can applaud you. If you’re wrong and a player or team doesn’t reach the heights you expected, odds are everyone has forgotten about how you hyped up said player and team anyway.
On top of people forgetting most predictions, an increasing amount of football analysis comes with well-known disclaimers (“Small sample size klaxon”, “Eredivisie attackers are known to…”, “scouting centre-backs with data is…”). You can draw attention to something interesting but say almost nothing about it. If you’re right and what you’ve drawn attention to continues, you’re great; if it doesn’t, that’s fine since you mentioned the disclaimers.
When I was slightly more active in writing, cranking out one piece a year in my most productive era, I often felt like I was typing an awful lot but saying nothing. I wanted to cover all bases no matter what happened. My predictions would always be vague. My writing on Erling Haaland back in 2018 sums this up.
Haaland already has a deal to join Red Bull Salzburg, but I thought I’d include him here as his numbers are crazy for an 18-year-old – even if the league isn’t the strongest.
In 2018 Haaland has an xG of 12.65 from 1647 minutes – 0.69 xG p90. Unfortunately, due to the seasons being weird in the data, I can’t look at his other numbers properly. It’s only separated into seasons (July onward) but the Norwegian league doesn’t run from July to May like other leagues. The main thing he offers does seem to be his goal scoring, however since July he has also managed 0.29 assists p90 from 0.19 xA p90.
When I started watching clips of him the first few goals all seemed to be from crosses, which doesn’t seem too surprising given he’s 1.92m tall, but he does seem to have quite a bit more to his game than just being a target man to hit with crosses. He seems pretty capable of running with the ball and beating his man, though his physique seems to help with this, while also seeming pretty adept at making runs in behind and pressuring the opposition when out of possession.
His two goals against Brann, in almost as many minutes, help showcase this. For his first the ball is hit over the top, he takes it down and beats his man before scoring. For the second you can see he’s happy pressuring off the ball, before getting lucky and having the ball run through to him, but then he shows good composure to go around the ‘keeper and just roll it in.
Costing just £3.6m, it seems Salzburg could have made a great signing, he’s putting up strong numbers while just being 18-years-old and playing against grown men and it wouldn’t be surprising to see him move into the top five leagues in the next few years should he continue developing in Austria.
…
[from the conclusion of the piece]
Haaland seems to have big potential and should be worth following to see how he does with Salzburg, while if he’s been bought for the first team it feels like it may not be too long before he’s seen in the big five leagues.
You get the disclaimer about the league and the incredibly bold prediction that if a young player does well for RB Salzburg, it might not be long before he moves to a club in one of the top five leagues.
Instead, taking a leaf out of Superforecasting, what if predictions were binary, based on something quantifiable, and came with a probability attached?
How likely is it that Haaland will join a team in the top five leagues in the next 3 years?
How likely is it that Haaland will join a team in the top five leagues and the Champions League/Europa League in the next 3 years? 5 years?
How likely is it that Haaland will match his xG per 90 (or come within ~x%) in his first season at Salzburg?
How likely is it that Haaland will score over x league goals for RB Salzburg in his first season?
If I were made to answer those questions, I’d have to think about Haaland’s ability and potential much more carefully to gauge a probability. Then, if I did that for every player/team I wrote about, I’d have a feedback system. If my predictions were poor, I’d know, and I’d be able to dig in and see where to improve. If they were good, I could write a round-up piece and gloat.
It would also provide clarity. When writing about teams and players, you (well, at least I) can get lost and lose track of what you were looking for. It’d be much easier to gauge my opinion on Haaland in 2018 if I’d answered the above questions. If I thought he was very likely to join a Champions League club in the top five leagues at 18, you’d know I was big on him as a prospect; or, by comparing the probabilities given for the first two questions above, it’d be easy to see where I placed his ability.
Writers (either for outlets or hobbyists) could have questions like these at the end of each piece and collate scores over time. You could keep scores in your Twitter bio, and people could see how you tend to fare when predicting events. Of course, if you’re writing your own questions, you could make them favourable and inflate your scores, but the idea is for it to be fun and helpful (to both writer and reader), not an opportunity to claim superiority.
Also, prediction isn’t everything. Writers can write good retroactive pieces or be entertaining to read. The goal isn’t to always be correct or predict everything. The point isn’t to shame people for being wrong or deter people from making future predictions. It could even be a point of fun that someone is enjoyable to read but hopeless at predicting, often backing the long shot. But if you’re interested in analysing and writing about football, how can you expect to improve without a feedback system?
It’s easy to look at a table or graph of players from a smaller league and say: “x, y and z have interesting figures. You should watch out for them!” It’s not as easy to determine the likelihood that those players a) continue to produce those figures or b) can move to a more difficult league and reproduce strong performances/figures.
Superforecasting and Media Outlets
It's also an opportunity for outlets like Sky Sports or The Athletic to provide more content with little effort. During a show like Monday Night Football, some of the weekend’s talking points could be condensed into 3-5 quantifiable questions, with each pundit giving their probabilities. They could do this each week, then at the end of the season have a round-up show looking back at how each pundit performed, maybe joking about some of the worst predictions and discussing where they were right or wrong and what they expected compared to what actually happened.
The Athletic could follow a similar idea. Each Premier League writer could give their probabilities for questions based on some of the weekend’s talking points. There could be a leaderboard of writers. Each writer could have their accuracy and Brier score on their profile. There could be different categories for the leaderboard, as some writers may perform well assessing new signings or predicting long-term events, while others may fare better predicting week-to-week or more short-term events.
Outlets could even have fans answer the questions and have a fan leaderboard. They could see what fans have the best Brier score, who or how many outperform pundits/writers, and how the average (wisdom of crowds) fares. It could be fun to see that Mike from Hull is much better at predicting events than any football pundit or writer in the country. It also offers an easy, commitment-free game to play and follow across the season for fans who lose interest in their Fantasy Football team by GW3.
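The wisdom-of-crowds comparison is easy to run: average everyone's probabilities question by question and score the average as if it were another forecaster. A toy example, with entirely made-up forecasts and outcomes:

```python
def brier(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

outcomes = [1, 0, 1]                       # what actually happened
forecasters = {
    "Mike from Hull": [0.6, 0.6, 0.9],     # illustrative probabilities
    "Pundit":         [0.9, 0.4, 0.5],
}

# The crowd forecast is the per-question average of everyone's probabilities,
# e.g. (0.6 + 0.9) / 2 = 0.75 for the first question.
crowd = [sum(ps) / len(ps) for ps in zip(*forecasters.values())]

for name, probs in forecasters.items():
    print(name, round(brier(probs, outcomes), 4))
print("Crowd", round(brier(crowd, outcomes), 4))
```

In this toy case the averaged forecast scores better than either individual; that often (though not always) holds in practice, which is part of the appeal of the fan-average comparison.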
The Superforecasting Workflow
Saying all this is well and good (and you may be reading thinking: “who’s this guy, deciding to write something for the first time in however many years just to moan that we should be doing things differently?”), but it’s harder to implement.
Below is a step-by-step plan for improving predictions based on the principles of Superforecasting generated by ChatGPT.
Adopt a probabilistic mindset: Start thinking in terms of probabilities rather than black-and-white outcomes. Instead of asking "will this event happen?", ask "what is the likelihood of this event occurring?"
Seek out diverse perspectives: To improve your predictions, seek out a range of perspectives and opinions. Engage with people who have different backgrounds, experiences, and expertise to gain a broader understanding of the issue at hand.
Look for patterns and use analogies: Identify patterns and use analogies to help predict future events. This can help you see similarities between current situations and past events, enabling you to make more accurate predictions.
Focus on the base rate: Consider the base rate (the underlying probability of an event occurring) when making predictions. This can help prevent you from overestimating the likelihood of rare events.
Use feedback to learn: Continuously track and evaluate your predictions to learn from your successes and failures. Adjust your thinking and approach based on feedback, and use it to improve your future predictions.
Collaborate with others: Work collaboratively with others to make better predictions. Seek out diverse perspectives, share information, and engage in open dialogue to improve your predictions as a team.
Practice, practice, practice: Practice makes perfect. Regularly engage in forecasting exercises, either alone or with others, to develop and refine your skills.
Some of these sound easier to apply to football than others. But to take an example of what we might do, let’s imagine we travelled back in time to the World Cup break and continue the Erling Haaland theme.
You’re writing an article or recording a podcast about Haaland, Manchester City, the Premier League at the break; whatever it is, you’ve mentioned Haaland. At this point, he has 18 Premier League goals and Manchester City have played 14 matches. At the end of the article/podcast, one of the questions could be: “Is Erling Haaland going to break the Premier League record for most goals in a season?”
First, you need to alter the question to something quantifiable. In this case, it’s easy: it becomes whether the number of goals he scores exceeds the record. It’d be harder if it concerned whether a team was right to sack a manager or sign a player. Then, following the step-by-step plan above, our workflow might be:
Adopt a probabilistic mindset: Instead of asking: “Is Erling Haaland going to break the Premier League record for most goals in a season?”, ask: “How likely is it Erling Haaland breaks the Premier League record for most goals in a season?”
Seek out diverse perspectives: Read/watch/listen to different people’s opinions of Haaland and his goalscoring this season.
Look for patterns and use analogies
Focus on the base rate
Points 3 and 4 feel somewhat similar for our topic. You can investigate Haaland’s data. Look at his goalscoring rate, his xG, and his shots. How likely is it he continues scoring at such a rate based on his underlying figures and performance in previous seasons? What was the goalscoring rate of other players who came close to breaking the record after a similar number of matches? What’s Haaland’s injury history like? How likely is he to play a high percentage of minutes in the second half of the season? How could the break for the World Cup affect him? What’s Manchester City’s fixture difficulty? Has he benefitted from favourable fixtures? What’s Manchester City’s goalscoring history like? Are they performing unsustainably or in a way that can continue to facilitate Haaland’s goalscoring?

After considering all of this, you could ascertain a likelihood for the question.
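To make the base-rate step concrete, here is a deliberately crude sketch: shrink Haaland's red-hot scoring rate toward an assumed prior for an elite striker, then simulate the remaining matches as Poisson draws to estimate the chance he passes the 38-game-season record of 32 goals (Salah, 2017-18). The prior and the blending weights are assumptions, and the model ignores rotation, injuries and fixture difficulty, so treat the output as a rough upper bound, not a forecast:

```python
import math
import random

def poisson(lam):
    """Sample from a Poisson distribution (Knuth's algorithm); lam must be > 0."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1

goals_so_far = 18                  # Haaland's league goals at the break
matches_played = 14                # City's league matches played
matches_left = 38 - matches_played
record = 32                        # 38-game-season record (Salah, 2017-18)

current_rate = goals_so_far / matches_played     # ~1.29 goals per match
prior_rate = 0.8                                 # assumed rate for an elite striker
blended = 0.6 * current_rate + 0.4 * prior_rate  # shrink toward the base rate

random.seed(0)
sims = 20_000
needed = record - goals_so_far + 1  # further goals required to break the record
hits = sum(poisson(matches_left * blended) >= needed for _ in range(sims))
print(f"P(breaks record) ~ {hits / sims:.2f}")
```

Even this naive version forces you to write down a base rate and a rate of decline, which is most of the point: the answer matters less than having to justify the inputs.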
Use feedback to learn: This will be slow, as you’d have to wait until the end of the season. However, in the Good Judgment Project participants could alter their predictions until the expiry date: if a question asked whether something would happen in x days, participants had until day x-1 to update their predictions. For football, I think it’d be better not to allow updating (unless, for long-term predictions, everyone is given a roughly monthly chance to update, so you can look back afterwards and see what made you change your mind).
Part of the challenge of predicting in football is dealing with incomplete, imperfect and evolving data. The challenge is to be able to make decisions with such limitations. By the time everyone is 90% confident a player is a great talent, it’s probably too late. Value seems to come from confidence a player will be good (for a given scenario/situation) and being correct before others are as confident.
It may result in worse scores, but not allowing updates is a better representation of football discourse. Typically there’ll be a lot of buzz around an event and plenty of thoughts about what will happen, and then you won’t hear much again. This helps create a lack of feedback for those writing about or analysing football. Something is everywhere for a short time, then nowhere soon after.
Collaborate with others: If people made public predictions, displayed their accuracy and Brier score and detailed their thought processes, you could study them. How does their workflow or thought process differ from yours? How come Mike from Hull is so good at predictions? What’s he considering that you’re not?
Or open yourself to criticism. Maybe someone points out that you tend to underestimate the importance of x factor or y variable.
Practice, practice, practice
Conclusion
Football coverage (writing/podcasts/television punditry) could benefit from Philip Tetlock’s Superforecasting and the Good Judgment Project.
Football coverage tends to be reactive. It’s easy to be vague and hedge your bets with predictions. There’s also no feedback system, making it difficult to assess or improve.
Thinking probabilistically, predicting the likelihood of events at the end of every article/podcast/show could improve football coverage and benefit the content creator.
Bloggers could benefit from having a feedback system and from being forced to clarify their thoughts by making concrete predictions.
The audience can see the performance of pundits/writers from media outlets over an extended period. It also gives these outlets an opportunity to produce easy content.
Outlets could also allow users to take part and submit their probabilities. They could have a fan leaderboard. They could see what fans have the best Brier score, who or how many outperform pundits/writers, and how the average (wisdom of crowds) fares.
Epilogue: Defining Criteria
One more thing about thinking probabilistically, which I couldn’t fit into the above, is that it seems like it’d force you to have criteria to assess a potential signing. You often see posts like “How x would improve y team” or “x is the striker y has been looking for”.
If you had to think about these questions probabilistically, you’d have to define criteria of what would make a signing successful or what it would mean to improve a team. For example, you might follow a thought process like:
How likely is it that this player plays x% of minutes in the upcoming season? The season after? The season after that? (For each year of the contract?)
How likely is it this player improves the team? How long will it be before this player improves the starting eleven?
What does it mean to improve the team?
How many goals/xG/points/league places would lead you to say this player improved the team?
What’s the likelihood that this player performs x better than the current player? Or what’s the likelihood this player generates more x than the current player?
How likely is it that this player generating more x would improve the team's goals/xG/points total/league position?
If you’re ~70% confident that a player will generate ~5 xG more than your current striker, how confident are you this will result in more points or a higher league position? How confident are you it will result in ~10 more points, perhaps meaning you’re able to [challenge for the title, challenge for continental qualification, avoid a relegation battle]? How confident would you have to be for the signing to seem worthwhile or value for money?
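Chaining those confidences together is simple arithmetic, though it naively assumes the estimates are independent. The 70% figure comes from the question above; the two conditional probabilities are hypothetical numbers added purely for illustration:

```python
# All figures are hypothetical, carried over from the questions above.
p_extra_xg = 0.70                 # confidence the signing generates ~5 more xG
p_points_given_xg = 0.60          # confidence that extra xG converts into more points
p_ten_given_points = 0.30         # confidence the gain is as large as ~10 points

p_more_points = p_extra_xg * p_points_given_xg     # chance the team gains points at all
p_ten_points = p_more_points * p_ten_given_points  # chance of a ~10-point gain
print(round(p_more_points, 2), round(p_ten_points, 2))
```

Even if the individual numbers are guesses, writing the chain out makes it obvious that being "70% confident" in the xG gain can leave you far less confident in the thing you actually care about, points.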
Writing it out like this sounds silly, and assigning a probability to most of these questions can feel far-fetched, but it’s what you’re doing when you say x is a better signing than (or an improvement on) y. Doing this for multiple prospective signings and getting feedback would be beneficial when assessing players and signings in the future.
It also forces you to consider how analytics or data points link with performance. It’s easy to say a player makes a lot of progressive passes, or be happy when you see an uptick in a progressive passes figure, but what does that mean for the team? Does it improve the chance of winning? By how much?
It seems football analytics has created its own world/language/criteria, with little hard evidence that what we value correlates with improvement in real terms. It often feels like we only track the same thing rather than looking at the big picture. For example, a player performs x more often than most players, goes to a new team, continues to perform x more often than most players, and that’s good; that makes it a good signing, or means the player has transferred their abilities to a new club. But does it matter how many times a player performs x if the team isn’t benefitting? If a team finishes on 50 points, signs said player, who carries over their strong figures (which are also better than the player they replaced), and the team finishes on 50 points again, how good a signing is that?
These questions aren’t necessarily designed to be answered but to provoke thought. It feels like football coverage could benefit from some of the things I’ve mentioned, and people producing content can benefit from having to make quantifiable predictions and having a feedback system.