Monday, 14 August 2017

When simple becomes complicated...

A while ago, Anna and I published an editorial in Global & Regional Health Technology Assessment. In the paper, we discuss one of my favourite topics $-$ how models for health technology assessment and cost-effectiveness analysis should increasingly move away from using spreadsheets (basically, Excel) and towards proper statistical software.

The main arguments that historically have been used to support spreadsheet-based modelling are those of "simplicity and transparency" $-$ which really grinds my gears. In the paper we also argue that, maybe, as statisticians we should invest in efforts towards designing our models using user interfaces, or GUIs $-$ the obvious example is web-apps. This would expand and extend work done, eg in SAVI, or BCEAweb or bmetaweb, just to name a few (that I'm more familiar with...).

Friday, 28 July 2017

Picky people (2)

I've complained here about the fonts for some parts of the computer code in our book. Eva (our publisher) has picked up on this and has been brilliant and very quick in trying to fix the issue. I think they will update the fonts so that at least in the ebook version all will look nice!

Friday, 7 July 2017

Conflict of interest

I am fully aware that this post is seriously affected by a conflict of interest, because what I'm about to discuss (in positive terms!) is work by Anthony, who's doing a very good job on his PhD (which I co-supervise).

But, I thought I'd do like our former PM (BTW: see this; I really liked the series) and sort out conflicts of interest by effectively ignoring them (to be fair, this seems to be a popular strategy, so let's not be too harsh on Silvio...).

Anyway, Anthony has written an editorial, which has received some traction in the mainstream media (for example here, here or here). Not much that I disagree with in Anthony's piece, except that I am really sceptical of any bake & eat situation $-$ the only exception is when I actually make pizza from scratch...

Tuesday, 20 June 2017

Picky people

Our book on Bayesian cost-effectiveness analysis using BCEA is out (I think as of last week). This has been a long process (I've talked about this here, here and here). 

Today I've come back to the office and have opened the package with my copies. The book looks nice $-$ I am only a bit disappointed about a couple of formatting things, specifically the way in which computer code got badly formatted in chapter 4.
We had originally used a specific font, but for some reason in that chapter all computer code is formatted in Times New Roman. I think we did check the proofs and I don't recall seeing this (which, to be fair, doesn't necessarily mean that we didn't miss it while checking...).

Not a biggie. But it bothers me, a bit. Well, OK: a lot. But then again, I am a(n annoyingly) picky person...

Monday, 19 June 2017

Homecoming (of sorts...)

I spent last week in Florence for our Summer School. Of course, it was a homecoming for me and I really enjoyed being back in Florence $-$ although it was really hot. I would say I'm not used to that level of heat anymore, if it weren't for the fact that I caught my brother (who still lives there) huffing and complaining about it several times!...

I think it was a very good week $-$ we had capped the number of participants at 27; everybody showed up and I think had a good time. I think I can speak for myself as well as for Chris, Nicky, Mark and Anna and say that we certainly enjoyed being around people who were so committed and interested! We did joke at several points that we didn't even have to ask the questions $-$ they were starting the discussion almost without us prompting it...

The location was also very good and helped make sure everybody was enjoying it. The Centro Studi in Fiesole is an amazing place $-$ not so close to Florence that people always disappear after the lectures, but not too far either. So there was always somebody there even for dinner and a chat in the beautiful garden, although some people would venture down the hill (notably, many did so by walking!). We also went to Florence a couple of times (the picture is one of my favourite spots of the city, which I obviously brought everybody to...).

Friday, 9 June 2017


So: for once I woke up this morning feeling quite tired after the late night, but also rather upbeat after an election. The final results of the general election are out and have produced quite some shock.

Throughout yesterday, it looked as though the final polls were returning an improved majority for the Conservative party $-$ this would have been consistent with the "shy Tory" effect. Even Yougov had presented their latest poll suggesting a seven-point lead and an improved Tory majority. So I guess many people were unprepared for the exit polls, which suggested a very different figure...

First off, I think that the actual results have vindicated Yougov's model (rather than the poll), based on a hierarchical model informed by individual-level data from over 50,000 responses on voting intention as well as several other covariates. They weren't spot on, but quite close.

Also, the exit polls (based on a sample of over 30,000) were remarkably good. To be fair, however, I think that exit polls are different from the pre-election polls: rather than asking about "voting intentions", they ask about the actual vote that people have just cast.

And now, time for the post-mortem. My final prediction, using all the polls available up to June 8th, was as follows:

                mean       sd 2.5% median 97.5%     OBSERVED
Conservative 346.827 3.411262  339    347   354          318
Labour       224.128 3.414861  218    224   233          261
UKIP           0.000 0.000000    0      0     0            0
Lib Dem       10.833 2.325622    7     11    15           12
SNP           49.085 1.842599   45     49    51           35
Green          0.000 0.000000    0      0     0            1
PCY            1.127 1.013853    0      2     3            4

Not all bad, but not quite spot on either and to be fair, less spot on than Yougov's (as I said, I was hoping they were closer to the truth than my model, so not too many complaints there!...).
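
As a rough check on the table above, here is a minimal Python sketch (not the code used for the model itself) that computes the error of each posterior mean and whether the observed number of seats falls inside the 95% interval. The names `predictions` and `compare` are purely illustrative; the numbers are copied from the table.

```python
# Seat predictions copied from the table above:
# (posterior mean, 2.5% quantile, 97.5% quantile, observed seats)
predictions = {
    "Conservative": (346.827, 339, 354, 318),
    "Labour": (224.128, 218, 233, 261),
    "UKIP": (0.000, 0, 0, 0),
    "Lib Dem": (10.833, 7, 15, 12),
    "SNP": (49.085, 45, 51, 35),
    "Green": (0.000, 0, 0, 1),
    "PCY": (1.127, 0, 3, 4),
}


def compare(table):
    """Return {party: (error, covered)}.

    error   = posterior mean minus observed seats
    covered = True if the observed seats lie within the 95% interval
    """
    out = {}
    for party, (mean, low, high, observed) in table.items():
        out[party] = (round(mean - observed, 3), low <= observed <= high)
    return out


results = compare(predictions)
for party, (error, covered) in results.items():
    print(f"{party:<13}{error:>9.3f}  inside 95% interval: {covered}")
```

On these figures, only the UKIP and Lib Dem results end up inside the intervals, which is a reminder that the intervals were much too narrow for the other parties.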

I've thought a bit about the discrepancies and I think a couple of issues stand out:

  1. I (together with several other predictions and in fact even Yougov) have overestimated the vote and, more importantly, the number of seats won by the SNP. I think in my case, the main issue had to do with the polls I have used to build my model. As it happened, the battleground in Scotland was rather different from the rest of the country, I think. But what was feeding into my model were the data from national polls. I had tried to bump up my prior for the SNP to counter this effect. But most likely this has exaggerated the result, producing an estimate that was too optimistic.
  2. Interestingly, the error for the SNP is 14 seats; 12 of these, I think, have (rather surprisingly) gone to the Tories. So, basically, I've got the Tory vote wrong by (347-318+12)=41 seats $-$ which, if you actually allocated them to Labour, would have brought my prediction to 224+41=265.
  3. Post-hoc adjustments aside, it is obvious that my model had overestimated the result for the Tories, while underestimating Labour's performance. In this case, I think the problem was that the structure I had used was mainly based on the distinction between leave and remain areas at last year's referendum. And of course, these were highly related to the vote that in 2015 had gone to UKIP. Now: like virtually everybody, I have correctly predicted that UKIP would get "zip, nada, zilch" seats. In my case, this was done by combining the poor performance in the polls with a strongly informative prior (which, incidentally, was not strong enough: combined with the polls, it still led me to overestimate UKIP's vote share). However, I think that the aggregate data observed in the polls had consistently tended to indicate that in leave areas the Tories would have had massive gains. What actually happened was in fact that the former UKIP vote has split nearly evenly between the two major parties. So, in strong leave areas, the Tories have gained marginally more than Labour, but that was not enough to swing and win the marginal Labour seats. Conversely, in remain areas, Labour has done really well (as the polls were suggesting) and this has in many cases produced a change in colours in some Conservative marginal seats.
  4. I missed the Greens' success in Brighton. This was, I think, down to being a bit lazy and not bothering to tell the model that in Caroline Lucas' seat the Lib Dems had not fielded a candidate. This in turn meant that the model was predicting a big surge in the vote for the Lib Dems (because Brighton Pavilion is a strong remain area), which would eat into the Greens' majority. And so my model was predicting a change to Labour, which never happened (again, I'm quite pleased to have got it wrong here, because I really like Ms Lucas!).
  5. My model had correctly guessed that the Conservatives would regain Richmond Park, but that the Lib Dems would get back Twickenham and Labour would hold Copeland. In comparison to Electoralcalculus's prediction, I've done very well in predicting the number of seats for the Lib Dems. I am not sure about the details of their model, but I am guessing that they had some strong prior to (over)discount the polls, which has led to a substantial underestimation. In contrast, I think that my prior for the Lib Dems was spot on.
  6. Back to Yougov's model, I think that the main, huge difference has been the fact that they could rely on a very large amount of individual-level data. The published polls would only provide aggregated information, which almost invariably would only cross-tabulate one variable at a time (ie voting intention in Leave vs Remain, or in London vs other areas, etc $-$ but not both). Being able to analyse the individual-level data (combined of course with a sound modelling structure!) has allowed Yougov to get some of the true underlying trends right, which models based on the aggregated polls simply couldn't, I think.
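
The back-of-the-envelope adjustment in point 2 can be retraced in a few lines of Python. This is only a sketch with illustrative variable names: the predicted seats are the posterior medians from the table, and the 12 seats going from the SNP to the Tories are taken at face value from the text.

```python
# Predicted (posterior median) and observed seats, as quoted in the post.
tory_pred, tory_obs = 347, 318
labour_pred, labour_obs = 224, 261
snp_pred, snp_obs = 49, 35

# Assumption taken from the text: 12 of the seats lost by the SNP
# went to the Tories.
snp_to_tory = 12

snp_error = snp_pred - snp_obs                     # SNP seats overestimated
tory_error = (tory_pred - tory_obs) + snp_to_tory  # Tory error net of SNP gains
labour_adjusted = labour_pred + tory_error         # reallocate that error to Labour
labour_gap = labour_adjusted - labour_obs          # distance from the actual result

print(snp_error, tory_error, labour_adjusted, labour_gap)
```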
It's been a fun process $-$ and all in all, I'm enjoying the outcome...

Wednesday, 7 June 2017


Today I've taken a break from the general election modelling $-$ well, not really... Of course I've checked whether there were new polls available and have updated the model! 

But: nothing much changes, so for today, I'll actually concentrate on something else. I was invited to give a talk at the Imperial/King's College Researchers' Society Workshop $-$ I think this is something they organise routinely.

They asked me to talk about "Blogging and Science Communication" and I decided to have some fun with this. My talk is here. I've given examples of weird stuff associated with this blog $-$ not that I had to look very hard to find many of them...

And I did have fun giving the talk! Of course, the posts about the election did feature, so eventually I got to talk about them too...

Tuesday, 6 June 2017

The Inbetweeners

When it was first shown, I really liked "The Inbetweeners" $-$ it was at times quite rude and cheap, but it did make me laugh, despite the fact that, as often happens, all the main characters did look a bit older than the age they were trying to portray...

Anyway, as is increasingly often the case, this post has very little to do with its title and (surprise!) it's again about the model for the UK general election.

There has been lots of talk (including in Andrew Gelman's blog) in the past few days about Yougov's new model, which is based on Gelman's MRP (Multilevel Regression and Post-stratification). I think the model is quite cool and it obviously is very rigorous $-$ it considers a very big poll (with over 50,000 responses), assumes some form of exchangeability to pool information across different individual respondents' characteristics (including geographical area) and then reproportions the estimated vote shares (in a similar way to what my model does) to produce an overall prediction of the final outcome.
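
To give a flavour of the "reproportioning" step, here is a minimal Python sketch of post-stratification for a single constituency. It is not Yougov's (or my) actual code: the cells, population counts and vote shares are all made up, and the multilevel regression that would produce the cell-level estimates is not shown.

```python
# Hypothetical post-stratification cells for ONE constituency. Each cell is a
# combination of respondent characteristics, with a population count and a
# model-based estimate of the vote shares. All numbers are invented.
cells = [
    {"population": 12000, "shares": {"Conservative": 0.30, "Labour": 0.50, "Other": 0.20}},
    {"population": 20000, "shares": {"Conservative": 0.45, "Labour": 0.35, "Other": 0.20}},
    {"population": 8000, "shares": {"Conservative": 0.60, "Labour": 0.25, "Other": 0.15}},
]


def poststratify(cells):
    """Population-weighted average of the cell-level vote shares."""
    total = sum(cell["population"] for cell in cells)
    parties = cells[0]["shares"].keys()
    return {
        party: sum(cell["population"] * cell["shares"][party] for cell in cells) / total
        for party in parties
    }


constituency_shares = poststratify(cells)
for party, share in constituency_shares.items():
    print(f"{party:<13}{share:.3f}")
```

The point of the weighting is that a cell counts for as much as its share of the constituency's population, not for as much as its share of the poll respondents.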

Much of the hype (particularly in the British mainstream media), however, has been related to the fact that Yougov's model produces a result that is very different from most of the other poll analyses, ie a much worse performance for the Tories, who are estimated to win only 304 seats (with a 95% credible interval of 265-342). That's even fewer than at the last general election. Labour are estimated to get 266 (230-300) seats and so there have been hints of a hung parliament, come Friday.

Electoralcalculus (EC) has a short article on their home page to explain the differences in their assessment, which (more in line with my model) still gives the Tories a majority, with 361 seats (to Labour's 216).

As for my model, the very latest estimate is the following:

                mean        sd 2.5% median   97.5%
Conservative 347.870 3.2338147  341    347 355.000
Labour       222.620 3.1742205  216    223 230.000
UKIP           0.000 0.0000000    0      0   0.000
Lib Dem       11.709 2.3103369    7     12  16.000
SNP           48.699 2.0781525   44     49  51.000
Green          0.000 0.0000000    0      0   0.000
PCY            1.102 0.9892293    0      1   2.025
Other          0.000 0.0000000    0      0   0.000

so somewhere in between Yougov and EC (very partisan comment: man how I wish Yougov got it right!).

One of the points that EC explicitly models (although I'm not sure exactly how $-$ the details of their model are not immediately evident, I think) is the poll bias against the Tories. They counter this by (I think) arbitrarily redistributing 1.1% of the vote shares from Labour to the Tories. This probably explains why their model is a bit more favourable to the Conservatives, while being driven by the data in the polls, which seem to suggest Labour are catching up.

I think Yougov's model is very extensive and possibly does get it right $-$ after all, speaking only for my own model, Brexit is one of the factors and possibly can act as proxy for many others (age, education, etc). But surely there'll be more than that to make up people's minds? Only a few more days before we find out...