Monday, 7 July 2014

The Oracle (8) - let's go all the way!

This is (possibly) the final post in the series dedicated to the prediction of the World Cup results $-$ I'll try and write one more to wrap things up and summarise a few comments, but that will probably come a bit later on. Finally, we've decided to use our model, which so far has been applied incrementally, ie stage-by-stage, to predict the results of both the semifinals and the final.

The first part is relatively straightforward; the quarter finals have been played and we know the results. Thus, we can re-iterate the procedure (which we described here): i) update the data with the observed results; ii) update the "current form" variable and the offset; iii) re-run the model to estimate each team's propensity to score; iv) predict the results of the unobserved games $-$ in this case the two semifinals (Brazil-Germany and Argentina-Netherlands).
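In practice, this stage-by-stage update is just a small pipeline; a minimal sketch of what it looks like in R terms, where all the function arguments are hypothetical placeholders standing in for our actual data-processing and JAGS model-fitting code:

```r
# Hypothetical sketch of the stage-by-stage update; the four steps mirror
# (i)-(iv) above. The helper functions are placeholders, passed in as
# arguments, standing in for the actual data-processing and fitting code.
update_and_predict <- function(data, new_results, form, offset, fixtures,
                               update_form, update_offset,
                               run_model, predict_games) {
  data   <- rbind(data, new_results)            # (i) add the observed results
  form   <- update_form(form, new_results)      # (ii) update the "current form"...
  offset <- update_offset(offset, new_results)  #      ...and the offset
  fit    <- run_model(data, form, offset)       # (iii) re-estimate the scoring propensities
  predict_games(fit, fixtures)                  # (iv) simulate the unplayed games
}
```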

However, to give the model a nice twist, I thought we should include some extra information that is available right now, ie the fact that Brazil will, for certain, play their semifinal without their suspended captain Thiago Silva and their injured "star player" Neymar (who will also miss the final, due to the severity of his injury). Thus, we ran the model after modifying the offset variable (see a more detailed description here) for Brazil, to slightly decrease their "short-term" quality. [NB: if this were a "serious" model, we would probably try to embed these changes in a more formal way, rather than as "ad hoc" modifications to the general set up. Nevertheless, I believe that the possibility of dealing with additional information, possibly in the form of subjective/expert knowledge, is actually a strength of the modelling framework. Of course, you could say that the selection of the offset distribution is arbitrary and other choices were possible $-$ that's of course true and a "serious" model would certainly require more extensive sensitivity analysis at this stage!]
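In practical terms, the modification can be as simple as shifting the mean of Brazil's offset distribution downwards before re-running the model; a sketch in which all the numerical values are purely illustrative:

```r
n_sims <- 10000
# Hypothetical offset hyper-parameters (on the log scale of the linear
# predictor); the values here are illustrative, not the fitted ones
offset_mean <- c(Brazil = 0.18, Germany = 0.15)
offset_sd   <- c(Brazil = 0.15, Germany = 0.15)

# Decrease Brazil's "short-term" quality; the size of the shift (0.2) is
# arbitrary, chosen just for the example
offset_mean["Brazil"] <- offset_mean["Brazil"] - 0.2
brazil_offset <- rnorm(n_sims, offset_mean["Brazil"], offset_sd["Brazil"])
```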

Using this formulation of the model, we get the following results, in terms of the overall probability of going through to the final (ie accounting for potential draws in the 90 minutes and then extra time and possibly penalties, as discussed here):

Team 1        Team 2         P(Team 1 wins)  P(Team 2 wins)
Brazil        Germany             0.605           0.395
Argentina     Netherlands         0.510           0.490

So, the second semifinal is predicted to be much tighter (nearly 50:50), while Brazil are still favourites to reach the final, according to the model prediction.

As I said earlier, however, this time we've gone beyond the simple one-step prediction and have used these results to re-run the model before the actual results of the semifinals are known, and thus predict the overall outcome, ie who wins the World Cup.

Overall, our estimation gives the following probabilities of winning the championship (these may not sum to 1 because of rounding):

Brazil: 0.372
Germany: 0.174
Argentina: 0.245
Netherlands: 0.206

Of course, these probabilities encode extra uncertainty, because we're going one extra step forward into the future $-$ we don't know which of the potential futures will occur for the semifinals. Leaving the model aside, I think I would probably like the Netherlands to win $-$ if only for the fact that, in that way, Italy would still be the 2nd most frequent World Cup winners, only one title behind Brazil, and one and two titles ahead of Germany and Argentina, respectively.
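For the record, the overall winning probabilities are obtained by Monte Carlo: for each simulated "future" we first resolve the two semifinals and then the resulting final. A stylised sketch is below; the matrix of final-win probabilities is a placeholder (set to 0.5 everywhere just to make the code runnable), whereas in the real computation each possible final is simulated from the model:

```r
set.seed(14)
n_sims <- 100000
teams  <- c("Brazil", "Germany", "Argentina", "Netherlands")

# Simulate the two semifinals using the overall win probabilities above
sf1 <- sample(c("Brazil", "Germany"),        n_sims, replace = TRUE,
              prob = c(0.605, 0.395))
sf2 <- sample(c("Argentina", "Netherlands"), n_sims, replace = TRUE,
              prob = c(0.510, 0.490))

# Placeholder: probability that the row team beats the column team in the
# final; 0.5 everywhere is used here only to make the sketch runnable
p_win <- matrix(0.5, 4, 4, dimnames = list(teams, teams))

winner <- ifelse(runif(n_sims) < p_win[cbind(sf1, sf2)], sf1, sf2)
prop.table(table(winner))  # Monte Carlo estimates of P(winning the World Cup)
```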

Thursday, 3 July 2014

The Oracle (7)

We're now down to the last 8 teams in the World Cup. Interestingly, despite a pretty disappointing display by some of the (more or less rightly so) highly rated teams, such as Spain, Italy, Portugal or England, European sides make up exactly 50% of the lot. Given the quarter final game between France and Germany, at least one European team is certain to reach the semifinals. Also, it is worth noticing that the 8 remaining teams are all group winners $-$ which kind of confirms Michael Wallace's point.

We've now updated the data, the "form" and the "offset" variables (as briefly explained here) using the results of the round of 16. The model had predicted (as shown in the graphs here) wide uncertainty in the potential outcomes of the games (also, we had not included the added complication of extra time & penalties $-$ more on this later). I believe this has been confirmed by the actual games. In many cases (in fact, probably all but the Colombia-Uruguay game, which was rather dominated by the former), the games have been very close. As a result, we've observed a slightly higher than usual proportion of games going to extra time.

So, we've also complicated (further!) our model by including extra time and penalties in the estimation of the result. In a nutshell, when the game is predicted to be a draw (ie the predicted number of goals scored by the two teams is the same), we additionally simulate the outcome of extra time.

In doing this, we've used the same basic structure as for regular time, but we've added a decremental factor to the linear predictor (which describes the "propensity" of team A to score when playing against team B). This makes sense, since extra time lasts 1/3 of a normal game; also, there is added pressure and teams normally tend to be more conservative. Thus, in this prediction, we've increased the chance of observing 0 goals and accounted for the shorter time played. If the prediction is still a draw, we've determined the winner by assuming that penalty shoot-outs are essentially a randomising device $-$ each team has a 50% chance of winning them.
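A minimal sketch of this simulation step, assuming (for simplicity) Poisson scores; the scoring rates, the size of the decremental factor and the function name are all illustrative rather than our actual code:

```r
# Sketch of the extra-time/penalties step. lambda_A and lambda_B are the
# simulated scoring rates over 90 minutes; `decrement` is the reduction
# applied to the linear predictor for extra time. All values illustrative.
simulate_knockout <- function(lambda_A, lambda_B, decrement = 0.5) {
  g_A <- rpois(1, lambda_A)                    # regular time
  g_B <- rpois(1, lambda_B)
  if (g_A == g_B) {                            # extra time: 1/3 of a game,
    g_A <- g_A + rpois(1, lambda_A / 3 * exp(-decrement))  # played more
    g_B <- g_B + rpois(1, lambda_B / 3 * exp(-decrement))  # conservatively
  }
  if (g_A == g_B) sample(c("A", "B"), 1)       # penalties: a fair coin flip
  else if (g_A > g_B) "A" else "B"
}

simulate_knockout(1.6, 1.1)
```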

These are the contour plots for the posterior predictive distribution of the goals scored in the quarter finals, based on our revised model.
Basically all the games are again quite tight $-$ perhaps with the (reasonable?) exception of Netherlands-Costa Rica, in which the Dutch are favourites and predicted to have a higher chance of scoring more goals (and therefore winning the game).

As shown in the above graph, draws are quite likely in almost all the games; the European derby is probably the closest game (and this seems to make sense given both the short- and long-term standing of the two teams). Brazil and Argentina both face tough opponents (based on the model $-$ but again, in line with what we've seen so far).

Using the model's predictions, including extra time & penalties, we estimate the overall probability of winning each game (ie either within the 90 minutes or beyond) as follows:

Team 1        Team 2         P(Team 1 wins)  P(Team 2 wins)
Brazil        Colombia            0.657           0.343
Netherlands   Costa Rica          0.776           0.224
France        Germany             0.497           0.503
Argentina     Belgium             0.607           0.393

(in the above table, the last two columns indicate the predicted chance that the team in the first and in the second column, respectively, wins the game and progresses to the semifinals).

One final remark, which I think is generally interesting: by the time we've reached the quarter finals, the value of the "current form" variable for Brazil (who started as hot favourites, based on the evidence synthesis of the published odds that we used to define it at the beginning of the tournament) is lower than that of their opponents. But then again, Colombia have sort of breezed through all of their games so far, while Brazil have kind of stuttered and have not won games that they probably should have (taking their "strength" at face value). This doesn't seem enough to make Colombia favourites in their game against the hosts $-$ but beware of surprises! After all, the distribution of the possible results is not so clear cut...




Wednesday, 2 July 2014

Short course: Bayesian methods in health economics

Chris, Richard and I tested this last March in Canada (see also here) and things seem to have gone quite well. So we have decided to replicate the experiment (so that we can get a bigger sample size!) and run the short course again this coming November (3rd-5th), at UCL.

Full details (including links for registration) are available here. As we formally say in an advert we've circulated on a couple of relevant mailing lists: 

"This course is intended to provide an introduction to Bayesian analysis and MCMC methods using R and MCMC sampling software (such as OpenBUGS and JAGS), as applied to cost-effectiveness analysis and typical models used in health economic evaluations.

The course is intended for health economists, statisticians, and decision modellers interested in the practice of Bayesian modelling and will be based on a mixture of lectures and computer practicals, although the emphasis will be on examples of applied analysis: software and code to carry out the analyses will be provided. Participants are encouraged to bring their own laptops for the practicals.

We shall assume a basic knowledge of standard methods in health economics and some familiarity with a range of probability distributions, regression analysis, Markov models and random-effects meta-analysis. However, statistical concepts are reviewed in the context of applied health economic evaluations in the lectures."

The timetable and additional info are here.

Saturday, 28 June 2014

Break!

Just to break the mono-thematic nature of the recent posts, I thought I'd link to this article, which has appeared on the Significance website.

It's an interesting analysis, conducted by researchers at the LSE, debunking the myth that migrants in the UK have unfair advantages in accessing social housing.

I believe these kinds of findings should be made as widely accessible as possible, because of the incredible way in which these myths are used to build up conspiracy theories or political arguments that are passed off as based on "facts", but are, in fact, just a partial (if not outright biased) view of a phenomenon.

Our paper on the Eurovision song contest (ESC) (here, here and here) is of course far less important to society than housing; but it struck me that a few of the (nasty) comments we got in some of the media that reported the press release of our findings were effectively along the lines of: "I think Europe hates Britain and the fact that we don't win the ESC is evidence of that; you say otherwise based on numbers; that doesn't move my position by an inch".


At the time, I felt a bit petty because this bothered me a bit. But, a couple of days later, I came across a couple of articles (here and here) on a politician who used the same kind of reasoning to make their point (eg Britain should get out of the EU $-$ the ESC is evidence that they hate us). Which bothered me even more...

Friday, 27 June 2014

The Oracle (6)

Quick update, now that the group stage is finished. We needed a few tweaks to the simulation process (described in some more detail here), which we spent some time debating and implementing.

First off, the data from the last few World Cups show that substantially fewer goals are scored during the knockout stage. This makes sense: from tomorrow it's make or break. This wasn't too difficult to deal with, though $-$ we just needed to modify the distribution for the zero component of the number of goals ($\pi$, as described here). In this case, we've used a distribution centred at around 12%, with most of the mass concentrated between 8% and 15%.
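For concreteness, a Beta distribution with roughly these properties can be found like so; the parameters below are reverse-engineered from the description above, not necessarily the exact ones we used:

```r
# Sketch: a Beta distribution for the zero-inflation probability pi,
# centred at about 12% with most of its mass between roughly 8% and 15%.
a <- 41; b <- 303
a / (a + b)                    # mean: ~0.119
qbeta(c(0.025, 0.975), a, b)   # central 95% interval: roughly (0.086, 0.155)
```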



These are the predictions for the 8 games. Brazil, Germany, France and (only marginally) Argentina have a probability of winning exceeding 50%. The other games look closer.

Technically, there is a second issue, which is of course that in the knockout stage draws can't really happen $-$ eventually the game ends either after extra time or at penalties. For now, we'll just use this prediction, but I'm trying to think of a reasonable way to include this extra complication in the model; the main difficulty is that in extra time the propensity to score drops even further $-$ about 30% of the games that go to extra time end up at penalties. I'll try and update this (if not for this round, possibly for the next one).

Monday, 23 June 2014

The Oracle (5. Or: Calibration, calibration, calibration...)

First off, a necessary disclaimer: I haven't been able to write this post before some of the games of the final round of the group stage were played; however, I have not watched those games, and I have run the model to predict round 3 as if none of them had been played.

Second, we've been thinking about the model and whether we could improve it in light of the predictive ability it has shown so far. Basically, as I mentioned here, the "current form" variable may not be very well calibrated to start with $-$ recall it's based on an evidence synthesis of the odds against each of the teams, which is then updated given the observed results, weighting them by the predicted likelihood of each outcome occurring.

Now: the reasoning is that, often, to do well at competitions such as the World Cup, you don't necessarily need to be the best team (although this certainly helps!) $-$ you just need to be the best in that month or so. This goes, it seems to me, over and above the level and impact of "current form". 

To make a clearer example, consider Costa Rica (arguably the dark horses of the tournament so far): the observed results (two wins against the relatively highly rated Uruguay and Italy) have improved their "strength", nearly doubling it in comparison to the initial value (based on the evidence synthesis of a set of published odds). However, before any game was played, they were considered the weakest team among the 32 participating nations. Thus, even after two big wins (and we've accounted for the fact that these have been big wins!), their "current form/strength" score is still only 0.08 (on a scale from 0 to 1).

Consequently, just re-running the model including all the results from round 2 and the updated values for the "current form" variable predicts a relatively easy win for their opponents, England, who on the other hand have had a disappointing campaign and have already been eliminated. The plots below show the prediction of the outcomes of the remaining 16 games, according to our "baseline" model.


So we thought of a way to potentially correct for this idiosyncrasy of the model. [Now: if this were serious work (well $-$ it is serious, but it is also a bit of fun!) I wouldn't necessarily do this in all circumstances, although I believe what I'm about to say makes some sense.]

Basically, the idea is that, because it is based on the full dataset (which sort of accounts for "long-term" effects), the "current form" variable describes the increase (or decrease) in the overall propensity to score of team A when playing team B. But at the World Cup there are also some additional "short-term" effects (eg the combination of luck, confidence, good form, etc) that the teams experience just in that month.

We've included these in the form of an offset, which we first compute (I'll describe how in a second) and then add to the estimated linear predictor. This, in turn, should make the simulation of the scores for the next games more in line with the observed results $-$ thus making for better predictions (and avoiding not-too-realistic ones).


The computation of the offset works like this: first, we compute the difference between the observed number of points accrued so far by each team and the expected value. Then, we label each team as doing "Much better", "Better", "As expected", "Worse" or "Much worse" than expected, according to the magnitude of this difference.

Since each team has played 2 games so far, we've applied this rule:
  • Teams with a difference of more than 4 points between observed and expected are considered to do "much better" (MB) than expected;
  • Teams with a difference of 3 or 2 points between observed and expected are considered to do "better" (B) than expected;
  • Teams with a difference between -1 and 1 point between observed and expected are considered to do just "as expected" (AE);
  • Teams with a difference of -2 or -3 points between observed and expected are considered to do "worse" (W) than expected;
  • Teams with a difference of less than -4 points between observed and expected are considered to do "much worse" (MW) than expected.
Roughly speaking, this means that if you're exceeding expectation by more than 66%, then we consider this to be outstanding, while if your difference from the expectation is within $\pm$ 20%, then you're effectively doing as expected. Of course, this is an arbitrary categorisation $-$ but I think it is sort of reasonable.
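As an R function, the rule might look like the sketch below; since the expected points are fractional, cutpoints are needed for non-integer differences too, and the ones used here ($\pm$1.5 and $\pm$3.5) are a plausible choice rather than necessarily the exact ones we used:

```r
# The categorisation rule above as an R function; `diff` is observed minus
# expected points after two games. The cutpoints for fractional differences
# are a reasonable guess, not necessarily the exact ones we used.
form_category <- function(diff) {
  cut(diff, breaks = c(-Inf, -3.5, -1.5, 1.5, 3.5, Inf),
      labels = c("MW", "W", "AE", "B", "MB"))
}
form_category(c(4.6, 2.3, 0, -2.1, -4.8))   # MB, B, AE, W, MW
```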

Then, the offset is computed using some informative distributions. We used Normal distributions based on average inflations (or deflations) of 1.5, 1.2, 1, 0.8 and 0.5, respectively, for MB, B, AE, W and MW performances. We chose the standard deviations of these distributions so that the chance of an offset greater than 1 on the natural scale (meaning an increase over the performance predicted by the "baseline" model) would be approximately 1 (for MB), 0.9 (for B), 0.5 (for AE), 0.1 (for W) and 0 (for MW). A sketch of this construction is below; the picture that follows shows the resulting distributions graphically.
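```r
# Sketch of the offset construction, assuming Normal distributions on the
# log scale with means log(1.5), ..., log(0.5). The sd is chosen so that
# P(offset > 0), ie an inflation on the natural scale, matches the target;
# "approximately 1" and "approximately 0" become 0.99 and 0.01 to keep the
# sd finite. For AE the mean is 0, so any sd gives exactly 0.5.
m      <- c(MB = 1.5, B = 1.2, AE = 1.0, W = 0.8, MW = 0.5)
target <- c(MB = 0.99, B = 0.90, AE = 0.50, W = 0.10, MW = 0.01)

# P(N(log(m), sd) > 0) = pnorm(log(m)/sd) = target => sd = log(m)/qnorm(target)
sd <- ifelse(m == 1, 0.2, log(m) / qnorm(target))
round(sd, 3)   # MB 0.174, B 0.142, AE 0.2 (arbitrary), W 0.174, MW 0.298
```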


Including the offsets computed in this way produces the results below. 


The Costa Rica-England game is now much tighter $-$ England are still predicted to have a higher chance of winning it, but the joint posterior predictive distribution of the goals scored looks quite symmetrical, indicating how close the game is predicted to be.



So, based on the results of the model including the offset, these are our predictions for the round of 16.



Thursday, 19 June 2014

The Oracle (4)

As promised, some considerations on our model performance so far. I've produced the graph below, which, for each of the first 16 games (ie the games it took for all 32 teams to be involved once), shows the predictive distribution of the results. The blue line (and the red label) indicates the result that was actually observed.




I think the first thing to mention is that, in comparison to simple predictions that are usually given as "the most likely result", we should acknowledge quite a large uncertainty in the predictive distributions. Of course, we know well that our model is not perfect and part of this may be due to its inability to fully capture the stochastic nature of the phenomenon. But I think much of this uncertainty is real $-$ after all, games often really are decided by episodes and luck.

For example, we would have done much better in terms of predicting the result of Switzerland-Ecuador if the final score had been 1-1, which it was until the very last minute of added time. Even an Ecuador win by 2-1 would have been slightly more likely according to our model. But seconds before Switzerland's winner, Ecuador had a massive chance to score themselves $-$ had they taken it, our model would have looked a bit better...

Also, on the plus side, I'd say that in most cases the actual result was not "extreme" in the predictive distribution. In other words, the model often predicted a result that was reasonably in line with the observed truth.

Of course, in some cases the model didn't really have a clue, considering the actual result. For example, we couldn't predict Spain's thrashing at the hands of the Dutch (the observed 1-5 was way out in the right tail of the predictive distribution). To a lesser extent, the model didn't really see Costa Rica's victory against Uruguay coming, or the size of Germany's win against Portugal. But I believe these were (more or less) very surprising results $-$ I don't think many people would have bet good money on them!

Maybe, as far as our model is concerned, the "current form" variable (we talked about this here) is not perfectly calibrated. Perhaps we start with too high a value for some of the teams (eg Spain or Uruguay, or even Brazil, to some degree), when in reality they were not so strong (or at least not that much stronger than the rest). For Spain especially, this is probably not too surprising: after all, in the last four World Cups the reigning champions have been eliminated straight after the first round on three occasions (France in 2002, Italy in 2010 and now Spain $-$ and Brazil didn't go past the quarter finals in 2006, which isn't great either).

If one were less demanding and were to measure the model's performance on how well it did in predicting wins/losses (see the last graph here) instead of the actual score, then I think the model would have done quite OK. The outcome of 9 out of 16 games was correctly predicted; in 4 cases, the outcome was the opposite of what was predicted (ie we predicted a win for A and actually B won $-$ but one of these is the infamous Switzerland-Ecuador, one is the unbelievable Spain defeat to the Netherlands and another was a relatively close game, Ghana-USA). 2 were really close games (England-Italy and Iran-Nigeria) that could have genuinely gone either way, but our uncertainty was clearly reflected in nearly Uniform distributions. In one case (Russia-South Korea), the model had predicted a relatively clear win, while a draw was observed.

Wednesday, 18 June 2014

The Oracle (3)

Yesterday was the end of round 1 of the group stage $-$ this means that all 32 teams have played at least once (in fact, Brazil and Mexico have now played twice). So, it was time for us to update our model including a new measure of "current form" for each of the teams.

We based this on the previous value (which, as explained here, was estimated using a synthesis of published odds against each of the teams, suitably rescaled to produce a score in [0;1], with values close to 0 indicating a weak/out-of-form team). These were then modified accounting for the result of the first game played at the World Cup.

In particular, we computed the increase (or decrease) in the original score, weighting it by the estimated probability of actually getting the observed result $-$ I think this makes sense: a strong team beating a weak opponent will probably improve their current form (eg because this will naturally boost their confidence and get them closer to qualifying for the next round). But this increase will probably be marginal, in comparison to a weak team surprisingly beating one of the favourites.

By weighting the updated scores in this way we should get a better quantification of the current strength of each team. Also, by using some form of average that includes the starting value, we are not overly moved by just one result. So, for example, Spain decrease their "form score" from 0.93 to 0.68 by virtue of their not very likely (and yet observed!) thorough defeat against the Netherlands; but without accounting for the very high original score, the updated one would have been 0.42 $-$ quite a bit lower.
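To make the arithmetic explicit, here is the update for the Spain example, assuming (consistently with the numbers just quoted) that the "average" is a simple unweighted mean of the starting value and the result-based update:

```r
# Sketch of the form update for the Spain example. The computation of
# `raw_update` itself (the change weighted by the predicted probability of
# the observed result) is not reproduced here; the value is the one quoted.
form_old   <- 0.93                  # Spain's pre-tournament score
raw_update <- 0.42                  # score implied by the 1-5 defeat alone
form_new   <- (form_old + raw_update) / 2
form_new                            # 0.675, ie the ~0.68 reported above
```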

So, we can re-run the model with these two extra features:
  • the observed results for games 1-17 (including Brazil-Mexico);
  • the updated form score for each of the teams
and use the new estimated values to predict the results of the next round of games (18-32). The graphs below show contours of the estimated joint predictive distributions of the goals scored. I've also included the 45 degree line $-$ that is where the draws occur, so if the distribution is effectively bisected by the line, a draw is the most likely outcome. If, on the other hand, the line does not divide the distribution into two roughly equal halves, then one of the two teams has a higher chance of winning (as they are more likely to score more goals $-$ see for example the first plot, for Netherlands-Australia).
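Incidentally, this kind of plot is easy to reproduce; a minimal sketch, with Poisson draws standing in for the model's actual posterior predictive samples:

```r
# Sketch: contour plot of a joint predictive distribution of the goals
# scored, with the 45 degree "draw" line. The Poisson draws below are just
# an illustrative stand-in for the model's posterior predictive samples.
library(MASS)                          # for kde2d()
set.seed(14)
g_A <- rpois(5000, 2.0)                # goals for team A (illustrative)
g_B <- rpois(5000, 1.2)                # goals for team B (illustrative)
dens <- kde2d(jitter(g_A), jitter(g_B), n = 50)
contour(dens, xlab = "Goals scored (team A)", ylab = "Goals scored (team B)")
abline(0, 1, lty = 2)                  # draws lie along this line
```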



These analyses translate into the estimation of the (model-based) chances of winning/drawing/losing for the two opponents in each game. The situation seems a bit more clear-cut than it was for some of the games in round 1 $-$ England are again involved in one of the closest games, although this time they seem to have an edge over Uruguay.




Finally, it probably now starts to make a bit more sense to think in terms of the estimated chance of progressing to the next stage of the competition. Assuming that the current form won't change from round 2 to round 3 (which, again, is not necessarily the most appropriate assumption), these are the estimated probabilities of progressing to the knockout stage.




As is obvious, these are quite different from the ones displayed here, which were calculated based on "current form" before the World Cup had even started. More importantly, teams have now won/lost games, which is of course accounted for in this analysis.

In Group B, the Netherlands are now big favourites to go through, with Chile slightly better off than Spain (but of course, things may change later today when these two meet). Italy are now hot favourites in Group D, where formerly most-likely-to-go-through Uruguay are left with a small chance of qualifying, after losing to lowly rated Costa Rica. Portugal have also taken a big hit in terms of their chances of going through (that's Group G), after being heavily defeated by Germany, who are now clear favourites.
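Going back to how the qualification probabilities are actually obtained: we simulate the remaining group games and tally, across simulations, how often each team finishes in the top two of their group. A much-simplified sketch (points only, with random tie-breaks instead of goal difference and the other FIFA criteria; `sim_goals` is a hypothetical stand-in for the model's posterior predictive draws):

```r
# Much-simplified sketch of the qualification probabilities: `points` is a
# named vector of current points, `fixtures` a data frame with character
# columns home/away, and `sim_goals(home, away)` returns a simulated pair
# of goals. Ties in points are broken at random, whereas the real
# computation uses goal difference etc.
qualify_prob <- function(points, fixtures, sim_goals, n_sims = 10000) {
  top2 <- matrix(0, n_sims, length(points),
                 dimnames = list(NULL, names(points)))
  for (s in 1:n_sims) {
    pts <- points
    for (f in seq_len(nrow(fixtures))) {
      g <- sim_goals(fixtures$home[f], fixtures$away[f])  # c(home, away)
      pts[fixtures$home[f]] <- pts[fixtures$home[f]] +
        (if (g[1] > g[2]) 3 else if (g[1] == g[2]) 1 else 0)
      pts[fixtures$away[f]] <- pts[fixtures$away[f]] +
        (if (g[2] > g[1]) 3 else if (g[1] == g[2]) 1 else 0)
    }
    ord <- order(pts, runif(length(pts)), decreasing = TRUE)  # random tie-break
    top2[s, names(pts)[ord[1:2]]] <- 1
  }
  colMeans(top2)   # estimated probability of reaching the knockout stage
}
```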


We'll keep updating the model and reporting (perhaps in more detail) on how well we've done in predicting the results of round 1 (I haven't had much time to do this yet, but I will in the next few days).