Friday, 5 May 2017

Flash forward sampling

Slowly but surely, I've managed to think a bit more about the elections model. Here, I've described how I included some prior information in my model to try and "discount" the evidence provided by the polls, to obtain estimates that may be more reasonable and less affected by the short-term shocks that may (over)influence people's opinions.

However, I wasn't entirely happy with the strategy I had used $-$ the informative priors I had set on the parameters $\alpha_p$ and $\beta_p$ induced rather precise distributions. In addition, the analysis wasn't making the most of the inferential machinery I had constructed, because it estimated the number of seats only for the average vote-share profile. In fact, I can do better than that and fully propagate the uncertainty in the vote shares, to obtain an entire posterior distribution of the seat configuration.

So, first off, I think I've refined my priors and I did so by running the model simply through "forward sampling" $-$ in other words, by not including any of the polls in my analysis, to better understand what my choice of priors implied. By selecting the means and standard deviations for the vectors $\alpha$ and $\beta$, I effectively imply the following prior expectation in terms of the vote share.
The red dots represent the "historical" averages over the past 3 general elections, which I used as a reference point. You could fiddle a bit more with the parameters of the distributions for $\alpha_p$ and $\beta_p$, but I am reasonably happy with the implications of the current choice $-$ I'm expecting the Conservatives to do much better than the historical figure; Labour is expected to be around where they normally are, but there is a chance they'll do worse than "usual" and on average they're also doing worse than in the 2015 election. The Lib Dems are predicted with relatively large uncertainty and still under their historical average $-$ I think this is reasonable and many pundits are also aligned with this. Similarly, the prior effectively gives a very low weight to UKIP $-$ and this is in line with the general consensus (I think) as well as the result of last night's local elections.
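To make the idea of forward sampling concrete, here is a minimal sketch in Python. It is not the model from this post: it assumes a hypothetical multinomial-logit link, uses a single vector of party effects (called `alpha` here, with the $\beta_p$ terms left out), and the prior means and standard deviations are made-up placeholders for three unnamed parties. The point is only that, with no polls involved, you draw the parameters from their priors and look at the vote shares they imply.

```python
import math
import random


def forward_sample_shares(prior_mean, prior_sd, n_draws=1000, seed=1):
    """Draw party effects from Normal priors and map them to vote shares.

    Assumption: shares are a softmax of the party effects; this link is
    hypothetical and not taken from the post.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # One draw of the party effects from their (independent) priors
        alpha = [rng.gauss(m, s) for m, s in zip(prior_mean, prior_sd)]
        # Softmax, subtracting the maximum for numerical stability
        top = max(alpha)
        weights = [math.exp(a - top) for a in alpha]
        total = sum(weights)
        draws.append([w / total for w in weights])
    return draws


# Hypothetical prior settings (placeholders, not the values used in the post)
draws = forward_sample_shares([0.5, 0.0, -1.0], [0.2, 0.2, 0.5])

# Prior expectation of each party's vote share implied by these settings
prior_mean_share = [sum(d[p] for d in draws) / len(draws) for p in range(3)]
```

Comparing `prior_mean_share` (and the spread of the draws) against a reference such as historical averages is the kind of check described above.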

Interestingly, I can map these results and propagate the uncertainty to estimate the distribution of seats in Parliament (still with no data from the polls included), to produce the following graph.
Again, I think this picture is even more convincing than the analysis of the probabilities and I feel relatively confident about it. (But of course, one could replicate the whole analysis and try different specifications, which I have done to some degree.)

So it's now time to include the data that are pouring in from the polls. In particular, I now have information collected over the past two weeks or so and I think that in a fast-moving election such as this, where opinions may be changed by a large number of "facts" and stories, it's useful to "discount" the older data. There are many ways of doing this, more or less formally $-$ I'm using a rather quick and dirty strategy, by applying a simple discount rate defined as a function of how long before today each poll was taken.

Each observed poll gets rescaled as $$y^{j*}_{ip}= \frac{y^j_{ip}}{(1+\delta)^t}, $$
where $y^j_{ip}$ is the number of voting intentions for party $p$ in poll $i$ among voters of type $j$ (=1 for Leavers and =2 for Remainers), $y^{j*}_{ip}$ is its discounted version, $t$ is the time elapsed between the poll and today, and $\delta$ is an arbitrarily defined discount rate. I've tested a few values (ranging from 0.03 to 0.1) and the results do not vary dramatically $-$ the larger the discount rate, the more heavily older polls are discounted, which tends to reduce the number of seats associated with the Conservatives by between 1 and 4. This is because in the very first few polls, the Tories' advantage was bigger than in the most recent ones.
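The rescaling is simple enough to sketch in a few lines of Python. The function name, the poll counts and the time unit below are assumptions made for illustration; only the formula $y/(1+\delta)^t$ and the value $\delta=0.1$ come from the post.

```python
def discount_count(y, t, delta=0.1):
    """Rescale a raw poll count y, observed t time units before today.

    Implements y / (1 + delta) ** t; the unit of t is an assumption here.
    """
    if t < 0:
        raise ValueError("t must be non-negative (time elapsed before today)")
    return y / (1.0 + delta) ** t


# Hypothetical polls: (time units before today, raw count for one party)
polls = [(0, 400), (7, 400), (14, 400)]

# The same raw count carries less weight the older the poll is
discounted = [discount_count(y, t) for t, y in polls]
```

A poll taken today is left untouched, while the same count from an older poll is shrunk, so it contributes less evidence to the model.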

With a discount rate $\delta=0.1$, the results estimated in terms of seats won are as in the following graph.
So, Conservatives with a median estimated number of seats of 379 (and a 95% interval estimate of 369-391, way above the line of 325 seats that are needed for a majority), Labour with 175 (163-185), Lib Dems with 25 (17-31), SNP with 49 (46-54), Green with 1 and Plaid Cymru with 3 (0-4).
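Summaries like these come straight out of the simulated seat counts. As a rough sketch (the function name is hypothetical and the draws below are a placeholder sequence, not output from my model), a median, a 95% interval and the probability of reaching the majority line can be computed like this:

```python
import statistics


def summarise_seats(draws, majority=325):
    """Median, 95% interval and P(seats >= majority) from simulated seat counts."""
    s = sorted(draws)
    n = len(s)

    def quantile(p):
        # Linear interpolation between the two nearest order statistics
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    return {
        "median": statistics.median(s),
        "lower": quantile(0.025),
        "upper": quantile(0.975),
        # Share of simulations at or above the majority line quoted in the post
        "p_majority": sum(d >= majority for d in s) / n,
    }


# Placeholder draws for one party, for illustration only
example_draws = list(range(301, 402))
summary = summarise_seats(example_draws)
```

Because every simulation produces a full seat configuration, the same function can be applied party by party.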

I think this analysis is interesting because it is fairly easy to assess the uncertainty propagated through the model up to the actual quantity of interest (the seats won). Other pundits are being a lot less favourable to the Lib Dems, but I'm fairly happy with how my model works, especially after considering the prior analysis.

Plenty more fun to come $-$ well, depending on your definition of fun...


  1. Can you tell me your Prediction in each seat

  2. I've posted something on that just now - see if that is what you were after...