Gianluca Baio's blog: July 2012

Tuesday 31 July 2012

Non-counterfactual, anyone?

Some (probably poorly organised) thoughts on the JSM. 1) it is really a huge conference: most of the talks are in the Convention Centre, but there are so many sessions that some are actually in the nearby Hilton and so you have to go back and forth to catch the talks you're interested in, which is not great. Also, the schedule is so dense that sometimes it is just physically impossible to catch two consecutive talks that are in different locations.

2) I can't remember exactly how much, but the registration fee was certainly in the order of hundreds of dollars and it had to be paid when submitting the abstract, several months before the conference; and they were giving no refund if your abstract wasn't accepted, either (although, given how many talks there actually are, probably no abstract was rejected). So, I think it's really not great that they don't provide any food at all (there is a canteen, but you have to pay on top of the registration). And no coffee breaks or notepads either.

3) Causal inference is really big and there are at least 3-5 sessions a day on this topic. I've been to a few of them and in general it was all good stuff. But what has struck me is that essentially all (and I mean: all) the presenters dealt with it using potential outcomes. I know that's the gold standard, but what I have never got (well, since I first read about all this back in 2003, that is) is how people can be so fully committed to counterfactuals as the only way to frame causal inference.

Recently, I re-read chapters 9-10 of Gelman-Hill (which, by the way, is a great book). I quite like how they make the distinction between standard regression methods (aimed at comparisons between two different individuals, who happen to possess the exact same characteristics apart from a treatment or exposure of interest) and regression models for causal inference, which concern a comparison of the effects of the interventions for the same individual. However, they too describe this in terms of counterfactuals, ie what would have happened to unit $i$ if, counter to the observed facts, they were given treatment $t=0$ instead of $t=1$.
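In symbols (the notation here is an illustrative assumption, not necessarily the book's own), write $y_i^1$ and $y_i^0$ for the outcomes that unit $i$ would show under $t=1$ and $t=0$ respectively. The individual causal effect, labelled $\delta_i$ here purely for convenience, and the single outcome that is actually observed are then

$$\delta_i = y_i^1 - y_i^0, \qquad y_i = t_i\, y_i^1 + (1-t_i)\, y_i^0 .$$

Only one of $y_i^1$ and $y_i^0$ is ever observed for a given unit, which is where the "counter to the observed facts" reading comes from.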

I personally still think that it makes more sense to see causal effects in terms of what will happen to unit $i$ if in the future they get treatment $t=0$ instead of $t=1$. Hopefully we'll be able to do some applied work in a non-counterfactual framework for the RDD stuff, if we get our grant.

Sunday 29 July 2012

You're one of my kind

As my friend Mark said the other day, the best place to spot Italian tourists in London is just outside Lillywhites in Piccadilly, and you can tell them from the other million tourists because of their insane gesticulating and yelling at each other, and their Invicta backpacks.

But, of course, if you're Italian too, then you don't even need to see what backpack they have or hear what language they speak; for some reason, you can almost always tell whether somebody is your nationality, just by looking at them. I'm experiencing the same with all the statisticians here in San Diego. Yesterday the flight from Atlanta was packed with "us" and it was funny to see people scribbling variance formulae, or working on their laptops running simulations.

But not just that: while I was taking a walk around San Diego's old quarter in search of a nice place to have dinner before finally succumbing to sleep, I came across at least a dozen of them. And each time, there was that weird eye-contact, implicitly acknowledging each other, something like "I know you're one of us too". In fact, when I finally sat down to get a sandwich the two blokes at the next table were chatting away discussing generalised linear models $-$ not your typical chat up phrases, but they seemed to have become quite friendly with the waitress, so who knows? Maybe GLMs are the next "howyouddoin?"...

Today the conference is officially open, although the morning is dedicated to registration and general mingling. This thing is huge: there are 2000 sessions (most sound really interesting, of course the most interesting $-$ including a one-hour talk by Judea Pearl $-$ are at the same time as my talk).

Saturday 28 July 2012

One paninO, two paninI

The first part of my trip to San Diego has not been too bad at all and certainly better than I was expecting. The flight to Atlanta has been quite smooth and uneventful. The food was even ok-ish and the highlights have been the following.

1) I was very pleased that they have served a panino. That has nearly made my day, since I fight a personal battle against (possibly Italian-sounding) places selling "paninis". It's an Italian word, meaning sandwich, and as such the singular is panino, while the plural is panini. Paninis is just wrong. This one wasn't the best panino ever, but at least they got the name right.

2) Unfortunately, they blew it by serving a copetta. Maybe they meant a coppetta (double p, double t: small cup)? A "copetta" is just as good as two paninis.

3) This time I didn't get all emotional on the plane (coming back from Vietnam I nearly cried at a Family Guy episode $-$ OK it was a touching one, but still...).

4) Clearing the passport check has also been really quick and painless. There's no free WiFi in the airport, but a dodgy looking guy suggested I check the food court area. I did and found a good spot just outside TGI Fridays, which I'm now shamelessly using. In fact, a few more people have even joined me!

Friday 27 July 2012

Moaning, packing and proof-reading

As usual, I'm putting off packing my bag to the very last minute. But I am going to have to go and do it in about 10 minutes. I know that once I actually leave (tomorrow morning, that is $-$ long story short, see here) then I am going to enjoy the trip to San Diego and the conference. But right now I'd rather not go.

But that's enough moaning. On the bright(?) side, the editor has sent me the proofs of my book. She has very politely let me know that I have to get back to them by next Friday. Apparently, the book is a HYBRID (no idea what that means! I suppose I'll have to find that out), so all I need to do is go through the proofreader's corrections and say if I'm happy with them.

I've briefly gone through the book and I have to say I'm impressed $-$ not so many corrections. I'll have to figure out what some of them mean, but most should be straightforward.

[... Time passing while I'm packing ...]

OK. Done packing (quite quickly, considering I also had one eye on the Opening Ceremony, which so far I think looks pretty awesome, for what I could see). I'll be back from the other side of the pond!

Wednesday 25 July 2012

Stay classy, unauthorised migrant

I still haven't decided whether I should shave and keep only my moustache, in preparation for my trip to the city of Ron Burgundy (I probably won't).

I'm really curious about San Diego as I've heard mixed opinions about it and I'm really intrigued by its close proximity to the border. My talk at the JSM is on Sunday (which is only the second day of the conference), so hopefully I'll have some time to check it out.

I'll talk about a model we've developed to estimate some selected characteristics of a difficult-to-define population (eg one including unauthorised migrants). Because it is virtually impossible to get a complete sampling frame (since we normally don't have a list of the people who don't want to be listed...), simple random sampling is not very effective and so we need an alternative method to get reasonable estimates.

The basic idea is to characterise the sampling units in terms of a set of $K$ aggregation places (we called them "centres") with which they are associated, in the sense that they often(-ish) visit them. In the case of unauthorised migrants, examples include ethnic shops, restaurants, bars, etc $-$ places where you imagine you may find the sampling units when they are out and about.

The sample is then obtained by randomly selecting $n_k$ subjects from each centre $k=1,\ldots,K$. The required information (eg age, sex, marital status, etc) is collected from the survey, whether the respondent is an authorised migrant or not (I'll sweep missing data issues under the rug, for now). The estimates of the relevant characteristics based on this sample are a biased representation of the population (if only because we cannot be sure that we've picked up all the centres).

But in addition to the main variables, we ask each subject to give information on whether they are related to any of the other $K-1$ centres. Using the information about the profile of each respondent (in terms of the centres with which they are associated), we can suitably re-proportion the sample to obtain reasonable results. Crucially, the weights used to re-proportion the sample depend on the (subjective) importance score associated with each of the $K$ centres in the analysis.
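To make the re-proportioning idea a bit more concrete, here is a minimal Python sketch. It assumes one plausible form for the weights (inversely proportional to the total importance of the centres a respondent is associated with), which is not necessarily the form used in the actual model; the function names, importance scores and data are all made up for illustration.

```python
def centre_weights(profiles, importance):
    """Return normalised weights, one per respondent.

    profiles:   list of 0/1 lists; profiles[i][k] == 1 if respondent i
                reports being associated with centre k
    importance: list of K positive importance scores (hypothetical values)
    """
    raw = []
    for profile in profiles:
        # Total importance of the centres this respondent is linked to.
        exposure = sum(u * f for u, f in zip(profile, importance))
        if exposure <= 0:
            raise ValueError("each respondent must be linked to at least one centre")
        # Respondents reachable through many (or important) centres are
        # over-represented in the sample, so they get a smaller weight.
        raw.append(1.0 / exposure)
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values, weights):
    """Weighted average of a characteristic (weights are assumed to sum to 1)."""
    return sum(v * w for v, w in zip(values, weights))


# Made-up example: K = 3 centres, 4 respondents.
importance = [0.5, 0.3, 0.2]
profiles = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
ages = [30, 40, 25, 50]

weights = centre_weights(profiles, importance)
reweighted_age = weighted_mean(ages, weights)
```

In this toy example the respondent who only visits the least important centre gets the largest weight, and the one who visits all three gets the smallest.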

The method has been applied to real data, but earlier today I was running some simulations in which I pretended to know the overall population (both in terms of the individual profiles and other demographic characteristics of interest, which of course in real data is not possible) to check how good the estimates were, using different sample sizes.

I haven't finished yet (and by the way: that's more like it. To get a presentation done with so much time to spare is really not me!), but it all seems to be working. I'll post something if I manage to produce some nice graph.

Tuesday 17 July 2012

The BCEA Supremacy

If you're reading this, you may think that I've not really done anything other than being on the blog today. That is not entirely true and between bouts of blogging I have actually done some work.

One of the things I've accomplished today is to finish (surprisingly early!) my presentation for Friday's talk at the European Conference on Health Economics, in Zurich.

Up until now, my only contacts with Zurich have been through Jason Bourne. Digression alert (1): I read the book after I watched the movie. This is quite boorish of me (you know what I mean if you've watched this $-$ and if you haven't, then you should!). Digression alert (2): as it turns out, there are several factual errors in the movie, many of them in the part set in Zurich; for example, there's a long scene in the "US Embassy", which in fact doesn't exist, given that the Swiss capital is Bern.

So, all in all, I don't really know anything about Zurich. I hear it's nice, but my prior is really flat and I'll check it out, albeit quickly.

I will talk about BCEA (which I've also discussed here). The slides I have prepared are here. There isn't much time (only 12 minutes plus discussion), so I'll have to rush through some of them, but I think that should be enough to at least give a very quick idea of what the package can do.

The point I'll try to stress is that it is not able to produce or run the analysis model, but only helps in post-processing the output. It would be cool if it became the de facto standard for this kind of analysis (that's why I'm unashamedly pushing it so much, including in the book).

Inspire a generation (to hate the Olympics?)

My friend Zoë posted this picture on Facebook.

I think that is just amazing $-$ she also points out that you'd get fined £100 if you went on the bus lane to the left and £130 if you moved to the right into the Olympic lane. Obviously, you're just supposed to decompose and disappear in the air once you get to the point where the three lanes merge.

I know it's always very easy to criticise the organisers (or the people in charge, in general) especially for something as big as the Olympic Games. But it seems to me that these people are really trying veeery hard to screw things up.

I won't even mention the security fiasco, except to say that I find the way that the media (and the government) have handled this quite strange: maybe I'm being really naive, but doesn't widely advertising on the 10 o'clock news that the screening isn't really working give an encouraging message to potential terrorists?

But I will point this out: apparently, it took some athletes 4 hours(!) to get from Heathrow to the Olympic Village (Google Maps suggests, probably a bit optimistically, 45 minutes for this trip) because the bus drivers are not from London and have not been trained properly to get around the city.

Finally, I will also get this out of my system, so I can stop nagging everybody with it (I do apologise for all the times I've already told this story and promise I'll do my best to stop doing it!). Next Saturday I'll go to San Diego for the JSM (and I'll post about this trip, later). But on the same day the cycling road race is on, which happens to go through the streets around where I live. Which means that I can't cross the bridge that connects me to Heathrow, which means I'll have to leave the night before and sleep in one of the hotels at the airport (although Heathrow is just 25/30 minutes away from my house).

I think that alone deserves at least a bronze medal! On the plus side, there seems to be a bit less traffic on my way to and from work...

Sunday 15 July 2012

Cool, cool maps

The final event in our week as socialites was a visit to a photography exhibition at the Saatchi Gallery. There was some interesting stuff as well as some that I could not really get my head around and thought was utterly meaningless (but I know that's probably just me showing my un-artistic side).

The best were a few pieces from the Japanese photographer Sohei Nishino (I believe he's only 31, although he looks like a slightly older Victorian gentleman $-$ at least judging from the pictures on his website).

They are called Diorama maps and basically are extremely accurate city maps made by arranging in a huge puzzle several photos of the city taken while he was walking around.

This doesn't really do justice to the real thing, which is truly amazing! In the exhibition they had the Diorama maps of Manhattan, Paris and Tokyo, which are all awesome (by the way, the exhibition is on until next Sunday and definitely worth catching).

After that, we went to have lunch at some friends' and, later, for a nice, long walk. Of course, I didn't go to the toilet before we left the house and, of course, just in the middle of our walk I was bursting for a pee. Marta, as usual, pointed out that I should have gone earlier, while I, as usual, protested that I (like all men) only go when I have to go, which of course means when there are no toilets around. Luckily, we managed to find a pub just in time.

Should the NHS fund men pregnancy?

Our week has been particularly active, socially. We've been out most nights enjoying the lukewarm, if a bit rainy, British autumn (I mean: summer; or do I?). Yesterday we went to the lovely Royal Court Theatre to see Birthday.

At first sight, the plot does not seem very interesting: it is about the troubles of a couple going through the final hours of pregnancy. But the nice twist is that, in the unspecified near future in which the story is set, thanks to some revolutionary medical procedure, men can get pregnant and so it is him (the brilliant Stephen Mangan) that has to deliver the baby.

The jokes are about the role exchange; so he sounds like a whining lady ("I asked you to do one thing, and you forgot to do it!"), while she has the clueless attitude of the man ("I didn't pack your magazines because I didn't think we'd be here so long").

It was interesting that the whole thing was based on alleged cost-effectiveness considerations: the idea is that men would always undergo caesarian section, which means fewer complications and the opportunity, at least in theory, to plan every delivery ahead, thus leading to lower costs. Also, because the procedure is supposed to be (relatively) pain free, no cost of drugs and painkillers should occur.

As it turns out, the analysis doesn't seem to have accounted for a series of more or less serious side effects and/or complications (including elusive surgeons that are constantly busy elsewhere, only to turn up en masse after 10 hours), which all happen to the delivering mother/father. So, at least for this individual (and for some funny reason in particular for his bottom), the quality of life outcome is really poor.

All in all, I would like to sit in the NICE committee that have to consider this intervention for public funding...

Wednesday 11 July 2012

Vorsprung durch technik bias (2)

I believe this requires a little follow up. Somewhere else, Lorenzo pointed out that VW Golf drivers are probably worse than Audi's.

It did occur to me that, when I was in Italy (and, evidently, this is still the conventional wisdom), we used to say that the Golfisti were really bad and in particular they do not want to be overtaken, ever.

So this morning on my way to work I tried to have a closer look at the different cars' behaviour. Maybe all Audi drivers in South West London read this blog (OK: maybe not all of them $-$ just those I met this morning, perhaps?) and made an effort to prove me wrong, but I have to say that, of the ones I saw today, none behaved badly. In addition, I tried to keep an eye on Golfs too, but again no bad driving there. In fact, everybody was extremely relaxed and law-abiding this morning!

To explain the phenomenon, Francesco suggested the theory that, when they don't ride a Vespa, Italian ex-pats drive an Audi. The implications of this, I think, are that:

Italians are bad drivers;
Vespa riders (thus including yours truly) are also bad.

I can't honestly claim that there isn't some truth to both and in particular, I'm no faultless rider. However, given that: a) in order to legally ride a motorbike in England you have to pass 2000 official tests; and b) I did surprisingly well in mine and aced nearly all of them (well, by my standards, that is), I'd say that this theory has indeed some merits, but cannot explain the incredibly statistically robust empirical evidence against Audi drivers (which by the way does not seem to be affected by sex in my recollection).

Finally, Stefano took a completely different route and, inspired by the title of the post, reminded me of this: awesome!

Tuesday 10 July 2012

Vorsprung durch technik bias

Unlike the ground-breaking height experiment, this one is complete speculation (and also: no disrespect to anybody, irrespective of the car you drive).

But: while riding my Vespa around London, I've been noticing that, in general (and I'll put this as mildly and unassumingly as I possibly can), Audi drivers are terrible.

What I mean is that if I think of all the drivers I encounter in my daily one-hour ride to and from work, the ones that I remember speeding, cutting everybody else to get in front at a traffic light and unnecessarily moving to the right while I'm filtering to make it more difficult for me seem to be driving an Audi $-$ I've not got enough data to work at the actual model level, so for the moment I'll concentrate on makes.

I have to admit that I thought of this on my way back earlier today, after seeing one of them nearly causing a multiple crash and then speeding through an amber-turning-red traffic light. Thus, of course I know that every possible known bias effect is present here.

As soon as I realised that (which thankfully was just $\varepsilon \rightarrow 0$ seconds later), I tried to think carefully to see if I could remember any other instance of repeated bad driving by a particular brand of cars and, for the life of me, I could not think of any!

The power of recall bias...

Saturday 7 July 2012

Cakes, tea, heights and pick up trucks

Earlier today we went to Petersham Nurseries for a slice of cake and a tea. Lovely, as usual.

Also, for some reason we started discussing how wide a pick up truck is. I said maybe something like 2 metres, but wasn't sure. Also, I said we could compare my height to the width of parked cars to get a rough idea (although there was no pick up parked around).

Since I didn't want to lie on the dusty ground, Marta said that a person's height is more or less the same as their arm-to-arm length and that I could just stand in front of a parked car with my arms spread out.

I had never heard of this, so when we got home we did a little experiment to see if it worked. Marta's parents are visiting us, so we have a veeeery small sample of size 4 to go on. As it turned out, this worked pretty well and the arm-to-arm length was basically the same as the height (up to a couple of centimetres) in each of the 4 paired measurements (one arm-to-arm and one height for each of us).

Maybe I'll use this in one of my classes...

Wednesday 4 July 2012

(Semi)automatic indexing in LaTeX

Perhaps this may be of some help for LaTeX users. After I finished writing the book (by the way: proofs expected in August), I had to deal with the problem of creating an index, which the editor definitely wanted. You may think this is the least of the problems once you've actually written the whole thing, but, as it turns out, it is not quite like that...

Of course, I used LaTeX to write the book, which means that, theoretically, you can use the command \makeindex to generate the index. Unfortunately, this requires that a suitable tag \index is included every time that a particular word (or, more generally, concept) is found in the document.

In other words, this meant:
a) creating a list of concepts that I thought should be indexed;
b) inserting the tag \index{concept} next to every single instance of that concept.

I started to do that but after 10 minutes I realised this would be too tedious. I briefly lost the will to live and played around with the idea of calling the whole writing-a-book off. Fortunately, I soon came to my senses and started to look for a cleverer way. A nice solution was here. It still requires some tweaking, but I think it gave me quite a good result with a relatively small amount of work.

The whole thing is based on a TCL script which adds suitable text in the LaTeX document. In order to do so, you need to prepare the whole procedure.

First, you need to create a text file containing all the concepts to be indexed. The syntax is quite intuitive and basically you need to declare the concepts and synonyms. In particular, you need to follow the LaTeX convention whereby you can nest concepts within each other. For example, I wanted to nest the concept "prior" within the concept "probability"; you can do so by using the notation
probability!prior
In addition, you can link combinations of words to a given concept. For example, I wanted every instance of the words "prior probability" found in the text to be indexed as
Probability
    Prior
and to do so I only needed to include in the file the syntax
prior probability -> probability!prior

This is relatively straightforward, except that you have to think carefully about what you want in your index, and the list of concepts/words can be quite long. I created a spreadsheet with all the main concepts and tried to think of all the others that I wanted to nest within them and then translated this into a suitable text file.

Next you need to use the TCL programme that does all the tagging, ie puts the tag \index{concept} in the text, according to the list you've specified in the previous step. Because the marked-up file(s) can be quite messy and thus difficult to read, I thought it would be more efficient to work on copies of the original LaTeX files. Thus I ended up with an original copy (with no tags) and an "indexed" one, including all the tags, which can be compiled to produce the final document with the index.
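To give an idea of what the tagging step does, here is a minimal Python sketch. It is not the TCL programme mentioned above, just an illustration of the same idea under simple assumptions: the concept list has already been read into a dictionary (the name `concept_map` and the function `tag_index_terms` are made up), keys are lower-case phrases, and every whole-word match gets an \index tag appended.

```python
import re


def tag_index_terms(text, concept_map):
    """Append \\index{key} after every whole-word match of each phrase.

    concept_map maps lower-case phrases to index keys, e.g.
    {"prior probability": "probability!prior"}.
    """
    # Longest phrases first, so a long phrase wins over a shorter one it contains.
    phrases = sorted(concept_map, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        flags=re.IGNORECASE,
    )

    def add_tag(match):
        # Keep the matched words as written and add the tag right after them.
        key = concept_map[match.group(1).lower()]
        return match.group(1) + r"\index{" + key + "}"

    return pattern.sub(add_tag, text)


concept_map = {"prior probability": "probability!prior"}
source = "The prior probability encodes what we know before seeing data."
tagged = tag_index_terms(source, concept_map)
print(tagged)
```

A real run would also need to skip things like math environments, labels and existing tags, which is presumably where the exceptions and manual fixes come from.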

At this point, theoretically, you only need to run the \makeindex command to produce the complete file. I say theoretically because this procedure is not 100% effective and there are still some problems. In my case, I had several exceptions that I needed to deal with to avoid problems when compiling the LaTeX file (for example, an extra "\" would occasionally be inserted in the text, which leads to errors in the compilation).

The errors can be long-ish to fix, but not to identify since LaTeX will tell you where they are when compiling the document. If you can do a bit of Linux programming (but you probably can do this on Macs and Windows machines too, perhaps using DOS-like commands) they are not too difficult to correct; I did most of it using the command
sed -i,
which allows you to modify a given string within a text file.

All in all, I think it was very helpful and saved me some time. I still had to do some work, but I think not as much as I would have, had I decided to create the index from scratch in the original LaTeX files.

Pregnancy rates

Paola posted something about Richie Cotton's blog in which he discusses mathematical modelling of the monthly pregnancy rates.

The problem is to get some idea of how long it takes to get pregnant and, if I understand correctly, what Richie has done is to base his model on some point estimate of the monthly fecundity rates (MFR). As is reasonable, he has taken them to vary by women's age.

Then, he used a negative binomial model for the probability of getting pregnant after trying for a given number of months. As far as I can see, he produces an estimate of the probability of pregnancy after $0,1,\ldots,60$ months for different age groups. These estimates are based on the negative binomial model, under which, as he realises, when a positive probability $p$ is assumed, the probability of pregnancy will eventually (if a long enough "follow up" is considered) tend to 1. Here's his result.
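The "tends to 1" behaviour is easy to check numerically. The sketch below is in Python rather than R and is not Richie's code; it assumes independent monthly attempts with a constant success probability $p$ (so the waiting time to the first success is geometric, ie the negative binomial with a single success), and the value used for the MFR is made up.

```python
def prob_pregnant_within(months, p):
    """P(at least one success in `months` independent monthly attempts)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Complement of "no success in any of the months".
    return 1.0 - (1.0 - p) ** months


mfr = 0.2  # hypothetical monthly fecundity rate, for illustration only
curve = [prob_pregnant_within(n, mfr) for n in range(0, 61)]
```

For any $p>0$ the term $(1-p)^n$ shrinks to 0 as $n$ grows, so the curve starts at 0 and creeps up towards 1, whatever the age group.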

I like the graph and the underlying R code, but I think that there are several limitations to this model (NB I think he's done this in the "spirit" of blogging, ie something like a toy example to show how a real-life problem can be discussed in statistical or mathematical terms. Thus I'm not criticising him $-$ just thinking about it with my Bayesian statistician hat on).

First, from what I can gather (Disclaimer: because I should go packing instead of being on the blog, I have not read all the comments to his original post, so some of these points may have already been picked up), the curves above consider a fixed MFR for each "cohort". In other words, all the points in red are computed using a negative binomial model where the parameter $r$ (the months before a pregnancy occurs) varies from 0 to 60 months, but the parameter $p$ (representing the underlying MFR) doesn't. This is not correct, I think, because albeit perhaps minimally, the MFR should vary with every year of age. Thus, a woman aged 26 probably has a (slightly) lower rate than one of 25. This may or may not be negligible.

Moreover, as pointed out in Richie's blog by some other people and Richie himself, it takes two to play this game, so only considering the female MFR is just one side of the story and does not account for interaction and possible problems on either side.

From my point of view, the strongest limitation of the model is that point estimates are considered, with no account of uncertainty whatsoever. As we saw in the telomere paper, there is a very large variability in the success rates, even within age groups. Thus, for a woman aged 25 the rate may vary between .05 and .4 (I'm making these numbers up, but I think the range could be reasonable). This will obviously have implications in the overall estimation of the success probability.
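Here is a small Python sketch of what accounting for that uncertainty could look like. It assumes, purely for illustration, that the rate $p$ follows a Beta$(a,b)$ distribution (the values of $a$ and $b$ are made up), in which case the expected probability of pregnancy within $n$ months has the closed form $1 - B(a, b+n)/B(a,b)$, and compares it with simply plugging in the mean rate.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_within_averaged(months, a, b):
    """E[1 - (1 - p)^months] when p ~ Beta(a, b)."""
    # E[(1 - p)^n] = B(a, b + n) / B(a, b) for a Beta(a, b) distribution on p.
    return 1.0 - exp(log_beta(a, b + months) - log_beta(a, b))


def prob_within_plugin(months, p):
    """Same quantity, ignoring uncertainty and using a single value of p."""
    return 1.0 - (1.0 - p) ** months


a, b = 2.0, 8.0          # hypothetical Beta parameters; mean rate is 0.2
mean_p = a / (a + b)

averaged_12 = prob_within_averaged(12, a, b)
plugin_12 = prob_within_plugin(12, mean_p)
```

With these made-up numbers the averaged probability after 12 months comes out lower than the plug-in one: the women with low rates pull the average down more than those with high rates push it up.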

Off the top of my head, informative mixture-priors could be applied to account for underlying sub-populations within the overall population of couples (for example the case of a woman with one blocked fallopian tube, or similar cases).

Nice post, though $-$ the kind of things we really like!

Under the Tuscan sun

I have to admit I've never read the book, but I remember watching the ads for what I thought was an awful movie, full of cheap stereotypes, when I was in Boston a few years ago. Our week under the Tuscan sun has been a good week, though.

The weather has been great, in fact even too hot for my recently converted English-like taste. We've had a good mixture of time on the beach (we found a nice little spot with not many people, which was nice, but I still managed to get mildly sun-burnt), and in the gorgeous small towns on the hills right by the sea, which was actually the bit I liked the most, by a fair distance. There is a place in Castagneto Carducci (which I now consider as the ideal town for a holiday: just close enough to the beach, but not so close that you "have" to go every day for most of the day) where the pizza is so big that you really struggle to finish it. But because it's also sooo good, you really have to give it a serious go.

Yesterday we wandered around Florence like American tourists, shopping for shoes and ice cream (incidentally: if there are two people, one young-ish, handsome man and one incredibly attractive, if a bit bossy young woman, who is doing all the shopping?), with the advantage of knowing the back roads to the main sights and going through all the narrow, emptier little streets of the centre. It is funny to realise how I now look at my home town in such a different way than when I was living here, and I kind of wish I had the tourist attitude more, back then.

In a couple of hours we'll get a train to Bologna and fly back home from there. Hopefully, there'll be no drama on the train; given that the incredibly attractive young woman above is fuelled by recently watching several episodes of Game of Thrones, I'm not sure of what could happen...
