Friday, 8 June 2012

Euro 2012 predictions

In the run-up to major football events, the guys at the Norwegian Computing Centre always prepare their predictions for the final results. They use a relatively simple model, which predicts the number of goals scored by team $t_1$ when playing team $t_2$ as a function of a "baseline" scoring intensity, corrected for the relative difference in strength between the two teams.
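To make the idea concrete, here is a minimal sketch of that kind of model. I have not checked the exact functional form they use, so the exponential correction, the `sensitivity` parameter and all the numbers below are my own assumptions for illustration: goals are Poisson, with a rate equal to the baseline intensity scaled by the difference in strength.

```python
import math


def expected_goals(baseline, strength_1, strength_2, sensitivity=1.0):
    """Expected goals for team 1 against team 2.

    Assumed form (not taken from the NCC model): the baseline intensity
    is multiplied by exp(sensitivity * (strength_1 - strength_2)).
    """
    return baseline * math.exp(sensitivity * (strength_1 - strength_2))


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def match_probabilities(rate_1, rate_2, max_goals=15):
    """Win/draw/loss probabilities for team 1, treating the two scores as
    independent Poissons and truncating each at max_goals."""
    win = draw = loss = 0.0
    for g1 in range(max_goals + 1):
        for g2 in range(max_goals + 1):
            p = poisson_pmf(g1, rate_1) * poisson_pmf(g2, rate_2)
            if g1 > g2:
                win += p
            elif g1 == g2:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical values: baseline of 1.3 goals per game, strengths 0.3 and 0.0.
rate_1 = expected_goals(1.3, 0.3, 0.0)
rate_2 = expected_goals(1.3, 0.0, 0.3)
win, draw, loss = match_probabilities(rate_1, rate_2)
```

With equal strengths both teams score at the baseline rate and the win and loss probabilities coincide; any gap in strength tilts the three probabilities towards the stronger side.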

This is not too dissimilar from the method that Marta and I used a while ago (by the way: interesting story. I was trying to find an excuse to bring more football into the household $-$ she got interested in the model, but no actual increase in the time spent watching games has ever been recorded).

Our main point was that there is indeed correlation between the number of goals scored by the two competing teams, but instead of using complex forms for the likelihood (e.g. bivariate Poisson models), it can all be accounted for by extending the structure to a hierarchical model, based on independent Poissons that are connected through common structured effects. We did reasonably well:
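The structure can be sketched roughly as follows. This is only an illustration of the general idea, not the fitted model: the log-linear form, the names `attack`, `defence` and `home_adv`, the team list and the numbers are all assumptions made for the example. Given the rates, the two scores are drawn independently; what ties them together is that both rates are built from team effects drawn from common distributions.

```python
import math
import random


def draw_team_effects(teams, sd, rng):
    """Draw one effect per team from a common Normal(0, sd), then centre
    them so that they sum to zero across teams."""
    raw = {team: rng.gauss(0.0, sd) for team in teams}
    mean = sum(raw.values()) / len(raw)
    return {team: value - mean for team, value in raw.items()}


def scoring_rates(home, away, home_adv, attack, defence):
    """Scoring rates on the log scale (assumed form). A larger defence
    value here means the team concedes more."""
    rate_home = math.exp(home_adv + attack[home] + defence[away])
    rate_away = math.exp(attack[away] + defence[home])
    return rate_home, rate_away


def sample_poisson(rate, rng):
    """Knuth's multiplication method; fine for the small rates seen in football."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_game(home, away, home_adv, attack, defence, rng):
    """Independent Poisson scores, conditional on the shared team effects."""
    rate_home, rate_away = scoring_rates(home, away, home_adv, attack, defence)
    return sample_poisson(rate_home, rng), sample_poisson(rate_away, rng)


rng = random.Random(2012)
teams = ["Inter", "Roma", "Juventus", "Milan"]  # illustrative subset
attack = draw_team_effects(teams, 0.3, rng)
defence = draw_team_effects(teams, 0.3, rng)
print(simulate_game("Inter", "Roma", 0.3, attack, defence, rng))
```

In a real analysis the effects and their standard deviations would of course be estimated from the data rather than simulated; the snippet only shows how the pieces connect.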

(the black line is the observed dynamics in terms of points through the 2007-2008 season of the Italian Serie A, while the blue line is our prediction; for most teams the prediction is really good).

In our case, we were estimating results from national leagues, which I think is a bit easier, given that there are more data and that the seasons are longer, meaning that the "true" values tend to come up with a stronger signal (and stronger teams will be predicted to score more goals and thus win more games). There was an interesting issue with overshrinkage (check the model out).

In the case of international tournaments, prediction is a bit more complex; the tournament is played over just a month and there is much more scope for random variability in the performances of the teams. Thus, all in all, I think that their model is OK and will probably do well in terms of prediction. The nice feature is that they will update the predictions as more games are played (not sure they do it in a proper Bayesian fashion $-$ not saying they don't; just haven't read the details).
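Just to illustrate what I mean by updating "in a proper Bayesian fashion", here is the simplest possible case: a single scoring rate with a Gamma prior, updated as goals are observed. This is the textbook conjugate Gamma-Poisson result and is in no way a description of what they actually do; the prior values and the scores are made up.

```python
def update_gamma(shape, rate, goals):
    """Conjugate update for a Poisson scoring rate with a Gamma(shape, rate)
    prior: add the total goals to the shape and the number of games to the rate."""
    return shape + sum(goals), rate + len(goals)


# Hypothetical prior with mean 2 goals per game, then three observed games.
shape, rate = update_gamma(2.0, 1.0, [1, 3, 0])
posterior_mean = shape / rate
```

After each round of games the posterior becomes the new prior, so the predictions for the remaining fixtures shift with the results.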

The predicted semi-finals are Spain-Germany and Italy-Netherlands. I think we'd be happy to reach the semis. We may have a shot, but I wouldn't be too surprised if we went back home much sooner (but that's me being a bit pessimistic, maybe).