How the big four made football predictable.


In one of its latest podcasts, Freakonomics’ Stephen Dubner explores the concept of suspense with three economists: Jeffrey Ely, Alexander Frankel, and Emir Kamenica. At some point during the show, they try to answer the question of what creates suspense in sport. They immediately refer to football as a particularly suspenseful sport, not because a lot goes on during a game, but because at any moment something important, even dramatic, could happen (see the full transcript for more on this). Even by leaving your couch to grab a beer from the fridge, you run the risk of missing the only goal of the game.

A full season of football, however, rarely maintains suspense until its end. Often all hope of your own favourite team taking the title has disappeared within a month or so. Probably only four, maybe five, of the teams in this year’s English Premier League (EPL) have a real chance of finishing first. And this is not a peculiarity of the 2015/16 season. Over the last twenty seasons, 44 different teams have played in the EPL, but only four of them have won the division title during this period: Arsenal (3 titles), Chelsea (4 titles), Manchester City (2 titles), and Manchester United (11 titles). Quite frustrating if you are a fan of another team!

But is the English Premier League really that predictable? And if yes, was it always the case?

To answer this question, I need to determine the volatility of the EPL, that is, the degree of variation of each team’s performance from one season to the next. To measure a team’s performance over a given season, I compute its winning percentage, that is, the fraction of matches the team has won over that particular season. It is calculated as the number of wins divided by the total number of matches played, with each drawn match counting as half a win.
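As a minimal sketch of the winning percentage defined above, here is a short Python function. It is an illustration only: the figures in this post were produced in R, and the function name and the example record (24 wins, 7 draws, 7 losses) are assumptions chosen for the example.

```python
def winning_percentage(wins: int, draws: int, losses: int) -> float:
    """Fraction of matches won, with each draw counted as half a win."""
    played = wins + draws + losses
    if played == 0:
        raise ValueError("a team must have played at least one match")
    return (wins + 0.5 * draws) / played


# Illustrative record: 24 wins, 7 draws and 7 losses over a 38-match season.
example_pct = winning_percentage(24, 7, 7)
print(f"{example_pct:.2%}")  # 72.37%
```

A team that drew every match would score exactly 50% under this definition, the same as a team that won half its matches and lost the other half.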

For instance, the figure below shows the evolution of the winning percentages of Arsenal and Aston Villa over the last twenty seasons. As you can see, Arsenal has maintained a fairly high level of performance throughout this period, while Aston Villa has slowly but surely sunk toward the bottom of the league.

Winning percentages - Arsenal vs Aston Villa

The volatility of each team’s performance is then the absolute difference in performance from one season to the next. For instance, between the 2013/14 and 2014/15 seasons, Arsenal’s winning percentage went from 72.37% to 69.74%. The absolute difference between these two seasons is therefore 2.63 percentage points. This is a rather small difference, indicating that the team’s performance remained stable between the two seasons. Conversely, a large absolute difference would have meant that the team’s performance in one season was considerably better - or worse - than in the previous season.
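The Arsenal example can be checked with a few lines of Python. This is only a sketch of the definition given above; the function name is hypothetical and the percentages are the ones quoted in the text, written as fractions.

```python
def volatility(previous_pct: float, current_pct: float) -> float:
    """Absolute change in winning percentage between two consecutive seasons."""
    return abs(current_pct - previous_pct)


# Arsenal's winning percentages for 2013/14 and 2014/15, as quoted in the text.
arsenal_change = volatility(0.7237, 0.6974)
print(f"{arsenal_change * 100:.2f} percentage points")  # 2.63 percentage points
```

Because the absolute value is taken, an improvement and a decline of the same size count as the same amount of volatility.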

I then compute the average volatility of the league for each season. A low average volatility for a given season indicates that the performances of all the teams during that season are close to their performances the season before. In other words, it means that the previous season is a good predictor of the current one. Conversely, a higher average volatility indicates a higher level of unpredictability between two successive seasons.
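The league-wide average can be sketched in Python as follows. Two things here are assumptions rather than details taken from the analysis: only teams present in both seasons are compared (the text does not say how promoted and relegated teams are handled), and the team names and winning percentages are made up for illustration.

```python
from statistics import mean


def league_volatility(previous: dict[str, float], current: dict[str, float]) -> float:
    """Mean absolute change in winning percentage across teams.

    Assumption: only teams that played in both seasons are compared.
    """
    common = previous.keys() & current.keys()
    if not common:
        raise ValueError("no team played in both seasons")
    return mean(abs(current[team] - previous[team]) for team in common)


# Made-up winning percentages, for illustration only.
season_a = {"Team A": 0.70, "Team B": 0.50, "Team C": 0.30, "Relegated": 0.20}
season_b = {"Team A": 0.60, "Team B": 0.55, "Team C": 0.45, "Promoted": 0.40}

average_change = league_volatility(season_a, season_b)
print(round(average_change, 3))  # 0.1
```

In this toy example the three teams present in both seasons moved by 10, 5 and 15 percentage points, so the league's average volatility is 10 percentage points.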

The following figure shows how the league’s volatility changed over time, since the creation of the English championship in 1888.

English top-flight football volatility

As can be expected, the English championship was at its most unpredictable in the years after its creation. All the teams were new to professionalism, and it is likely that a lot of experimentation took place during those years. Moreover, each team played far fewer games than today (there were only 12 teams), so each game had more influence on the value of the winning percentage. However, this would only have mattered until 1905, when the championship reached today’s size.
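To put a rough number on the effect of league size, the sketch below compares the weight of a single win in a 12-team and a 20-team league. It assumes a double round-robin format (every team plays every other team home and away); the function name is hypothetical.

```python
def matches_per_team(teams: int) -> int:
    """Matches each team plays in a double round-robin (home and away)."""
    return 2 * (teams - 1)


for teams in (12, 20):
    games = matches_per_team(teams)
    # One win moves the winning percentage by 1 / games.
    print(f"{teams} teams: {games} matches, one win is worth {1 / games:.1%}")
```

Under this assumption a single win shifts the winning percentage by about 4.5 points in a 12-team league, against about 2.6 points in a 20-team league.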

After this initial period of higher volatility, the championship became progressively more predictable. It hit its most ‘boring’ phase just before World War II. After the war, the volatility of the championship started increasing slowly. It seems to have reached a plateau around the late 1990s.

So, is the EPL really that predictable? It is certainly more predictable than when the English championship was born, but today’s league is not the most boring of all time. It was much more predictable between the two world wars, and has actually grown more exciting since then.

But what about the fact that only four teams have won the EPL in the last twenty years? Can we reconcile this fact with the results above? To get more insight, I reran the analysis, separating the Big Four (Arsenal, Chelsea, Manchester City and Manchester United) from the rest of the league’s teams. The results since 1980 are shown in the figure below.
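The split between the Big Four and the rest can be sketched in Python as below. This is an illustration, not the R code behind the figure: the function name is hypothetical, only teams present in both seasons are compared (an assumption), and apart from Arsenal's figures quoted earlier, the teams and percentages are made up.

```python
from statistics import mean

BIG_FOUR = {"Arsenal", "Chelsea", "Manchester City", "Manchester United"}


def split_volatility(previous, current, group=BIG_FOUR):
    """Return (group mean, rest mean) of absolute changes in winning percentage.

    A mean is None when no team falls on that side of the split.
    """
    inside, outside = [], []
    for team in previous.keys() & current.keys():
        change = abs(current[team] - previous[team])
        (inside if team in group else outside).append(change)
    return (
        mean(inside) if inside else None,
        mean(outside) if outside else None,
    )


# Arsenal's figures come from the text; "Team X" and "Team Y" are made up.
season_a = {"Arsenal": 0.7237, "Team X": 0.40, "Team Y": 0.50}
season_b = {"Arsenal": 0.6974, "Team X": 0.50, "Team Y": 0.42}

big_four_mean, rest_mean = split_volatility(season_a, season_b)
print(f"Big Four: {big_four_mean:.4f}, rest: {rest_mean:.4f}")
```

Running this for every pair of consecutive seasons would give two series, one per group, that can then be plotted against each other.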

English top-flight football volatility - Big Four vs rest of the league

The Big Four data is of course noisier than that of the rest of the league (4 vs 16 values for each average point). However, it shows a rather clear trend toward lower volatility over time, that is, toward more consistent performances. This is especially true of the last 10 years, during which the average performances of the Big Four were often half as volatile (that is, twice as predictable) as the average performances of the rest of the league’s teams. This would explain the apparent discrepancy between a fairly volatile EPL and the consistent dominance of the Big Four teams over the last two decades. In today’s EPL, suspense is almost absent from the top tier (where most of the money is), and the Big Four teams are safe bets for this season’s title, and probably also for the next few years to come. If you want some excitement, however, you will have to look further down the table.

All the data used in this post were provided by the ‘engsoccerdata’ package for R, developed by James Curley. This package can be found on GitHub at the following address:

The source code for reproducing the figures in this post is available on GitHub at: