Friday, 24 May 2013

Simpson gets married (and divorced?)

While I was waiting for my coffee this morning, I flipped through the newspapers at my local coffee place, when my eye was caught by this article in The Times (I think you need a subscription to see the full article $-$ which I don't have, on principle. But you can access the actual report even if you don't subscribe to Murdoch's flagship newspaper).

The news is that "Most children from broken homes had unwed parents" and the evidence comes from a recent report published by The Marriage Foundation.

The report, based on data from Understanding Society (US), the ESRC longitudinal survey of British households, says that:
1) 45% of young teenagers (13-15 years old) are not living with both parents;
2) Half of all family breakdown takes place during the first two years;
3) Amongst parents who remain together, 93% are married.

Now: I can see that there might be something going on and that married couples may well remain intact longer (although one should bear in mind that, as reported here, "Based on marriage, divorce and mortality statistics for 2010, it is estimated that the percentage of marriages ending in divorce [in the UK] is 42%").

But surely this average 93% figure may well suffer from Simpson's paradox, ie it is likely to be highly sensitive to conditioning on other covariates (which are not considered here $-$ NB: that's not to say that US doesn't record them; just that the report didn't bother considering them).

For example, teenagers who become pregnant at 16 and are not in a stable relationship may well be at high risk of breaking up within two years of the child being born. And, if they do get married as a result of the unwanted pregnancy, they probably contribute disproportionately to the number of divorces. But this doesn't necessarily apply to thirty-somethings who have been in a stable relationship for a few years before deciding to have kids, while for some reason not getting married! Because these various categories are unlikely to be uniformly distributed in the underlying population, one should weight by their relative frequency when giving a population summary.
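To see how the conditioning could flip the picture, here is a toy numerical illustration (all the numbers are invented, purely to show the mechanism):

```python
# Toy illustration of Simpson's paradox (all numbers invented).
# Each entry is (number of couples, number still intact).
young = {"married": (100, 20), "unmarried": (1000, 250)}
older = {"married": (1000, 900), "unmarried": (100, 95)}

def rate(n, k):
    """Proportion of couples still intact."""
    return k / n

# Within each subgroup, unmarried couples actually do *better*...
assert rate(*young["unmarried"]) > rate(*young["married"])  # 0.25 vs 0.20
assert rate(*older["unmarried"]) > rate(*older["married"])  # 0.95 vs 0.90

# ...but in the aggregate, married couples look far better, simply
# because marriage is concentrated in the (more stable) older group.
agg_married = rate(100 + 1000, 20 + 900)    # about 0.84
agg_unmarried = rate(1000 + 100, 250 + 95)  # about 0.31
assert agg_married > agg_unmarried
```

Within each subgroup the unmarried couples do better, but because marriage is concentrated in the older (more stable) group, the aggregate figure points the other way $-$ which is exactly why an unadjusted 93% is not very informative.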

Another interesting passage from the report says: "... provides further evidence that the trend away from marriage is the driving force behind family breakdown. Out of 47% of children born to unmarried parents today, the model predicts that just 11% will reach their 16th birthday with both parents intact and unmarried. The rest will either marry or split up".

So, it seems to me, the evidence provided is rather weak: what if all of "the rest" actually got married? Should we then blame marriage for breaking up those families? (Of course, that's an extreme case, which is unlikely to happen. But the report conveniently fails to give the distribution of "the rest" and, it seems to me, draws some unwarranted inferences from the data).

Thursday, 23 May 2013

Bayes Pharma 2013 (4)

The conference is officially over and the breaking news is that I have managed to succeed where the lot in the picture next to this failed miserably. Somehow they lost the bid to bring the World Cup to London (beaten by Qatar!).

[Royal] We, on the other hand, won by a landslide the right to organise the next Bayes Pharma! $-$ of course, I was actually ambushed by Emmanuel and Bruno, who very cleverly pointed out that it would be nice to take the conference to London, if only there were someone on the scientific committee who could organise it...

I blame Julien for all this! ;-)

Wednesday, 22 May 2013

Bayes Pharma 2013 (3)

Another very interesting day. The highlight of the morning was, in my opinion, David Ohlssen's talk. David is with Novartis US and discussed a few issues related to subgroup analysis, pointing out the potential strengths of applying a Bayesian approach.

In the afternoon, a very good session on missing data. Nicky's and Alexina's talks were very good, with plenty of interesting insights $-$ I'd like to incorporate some of them in the lectures for Social Statistics next year (although the students are usually not familiar with Bayesian statistics, so that is probably a bit complex to do). I think all the talks will soon be available on the Bayes Pharma website.

Finally, following the advice of a very nice local young lady, we went for dinner at a sort of North African/Middle Eastern restaurant, which was really nice.

Tuesday, 21 May 2013

Bayes Pharma 2013 (2)

The first day of the conference was quite good, I thought. I was pleased with the audience's response to my talk, and the other talks were also very good. The level is even higher than last year.

There are about 75 people attending (15 more than last year) and, considering this is only the 4th edition of the conference, I think that's quite a big success, and we're happy about it!

Also, today was dedicated to the social events. We had dinner in a nice restaurant, but before that, to work up an appetite, we burnt some calories doing a bike tour of the city (to be precise: 288 calories, although I actually forgot to start the tracker on my phone at the beginning of the tour, so I'd say it was more like 350).

That too was quite nice, and quite typical as well, with weather conditions perfectly in keeping with the tradition of the Ardennes classics...

Monday, 20 May 2013

Bayes Pharma 2013 (1)

Earlier today, I arrived in Rotterdam for the Bayes Pharma conference. As I said in a previous post, I think we have quite an exciting line-up. In fact, the finalised programme is packed with interesting talks!

My first impressions of the conference (well, the conference hasn't started yet, but I mean the whole "experience") are:
1. I should really stop arriving at the airport 2 minutes before they close the gate. And it's not like I don't try $-$ no matter how early I leave, I always manage to nearly miss my flight.
2. Schiphol is the best airport ever. It's really easy to navigate and well organised.
3. Dutch people are, for some reason, much more tanned than you'd expect them to be. Surely this has nothing to do with the weather (which is crappier than in London).
4. Rotterdam looks nicer than I remembered. I came here for another conference back in 2002 and I wasn't really impressed. But today (despite the incessant rain) I got a much better vibe. Before the conference dinner, we'll do a bike tour and now I'm really into it.

Monday, 13 May 2013

Handouts & Beamer

While preparing the slides for my talk at Bayes2013 next week, I stumbled on a slight LaTeX problem. Creating the talk with overlays is easy and the results are really good. But often you need to create handouts, in which case overlays are not too helpful.

This is not new (although I've just found out, so it's new to me and I thought I'd share it), but you can sort this out by using
\documentclass[10pt,handout]{beamer}
(the option handout effectively removes all the overlays).

\usepackage{handoutWithNotes,pgf,pgfpages}
\pgfpagesuselayout{4 on 1 with notes}[a4paper,border shrink=5mm]
formats the handouts so that 4 slides are shown on a single A4 page, with some space for notes to the right.

This works OK to remove the effect of the \pause command. But if overlays are created using, for example, the command \only, then the above method doesn't really work with the handouts. The problem can be fixed by adding a specific option for the handouts, something like
\only<1|handout:1>{What's to be shown only on the first overlay}
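Putting the pieces together, a minimal example combining the commands above might look something like this (a sketch, not a tested document $-$ it only uses the standard Beamer overlay syntax):

```latex
\documentclass[10pt,handout]{beamer}
% Remove the handout option to get the full overlays back
\usepackage{handoutWithNotes,pgf,pgfpages}
\pgfpagesuselayout{4 on 1 with notes}[a4paper,border shrink=5mm]
\begin{document}
\begin{frame}{Overlays vs handouts}
First point \pause
Second point (merged with the first in handout mode)

\only<1|handout:1>{Shown on the first overlay and kept in the handout}
\only<2|handout:0>{Shown on the second overlay, suppressed in the handout}
\end{frame}
\end{document}
```

The handout:0 specification is the useful trick: it suppresses material from the handout altogether, rather than printing it on its own page.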

Some more discussion on this issue, here.

Tuesday, 7 May 2013

Alternative diseases

Yesterday we took XY for his first gelato in Richmond (to be fair, he'd already tried some ice cream, but that wasn't quite the same experience...). In the last couple of years, the place has become very popular (and a bit overpriced), so there was a big queue outside the shop. As we were waiting to get in, Marta quickly glanced at the alternative medicine shop in front of the gelateria and as quickly demanded my attention.

And then we spent the next 10 minutes wondering how they would treat the last in the list of the men's problems.

Tuesday, 30 April 2013

What the BBC isn't telling you

Yesterday Gareth pointed me to this article on the BBC website. The underlying story has to do with Meredith Kercher's murder and the subsequent trial involving mainly her flat-mate Amanda Knox, in Perugia (Italy).

As often in these gruesome cases, the evidence is pretty difficult to handle (eg because of contamination). Thus, even in the era of CSI, it is extremely difficult to prove guilt beyond reasonable doubt based on DNA evidence.

In Amanda Knox's trial, the appeal court decided to dismiss DNA evidence altogether, on the grounds that the available sample (which was used to convict Knox and her alleged accomplice Sollecito, in the first appeal) was unreliable. The appeal judge then ruled that it was pointless to produce further samples.

But the BBC article reports a contribution by Coralie Colmez, who points out some inconsistencies in the use of the available evidence. Her main point is that "the mathematics tells you that if you have an unreliable test, you do it again and you can be a lot more certain of that answer. And it's not intuitive but a simple calculation will tell you that it's true".

I suspect that her contribution may have been cut (too deeply) to fit the BBC article; but it seems to me that, in doing so, the reporter has not done a good job. The article says: "You do a first test and obtain nine heads and one tail... The probability that the coin is fair given this outcome is about 8%, [and the probability] that it is biased, about 92%. Pretty convincing, but not enough to convict your coin of being biased beyond a reasonable doubt, Colmez says".

Now, what the BBC is not telling you is that there is a statistical model (and in fact a relatively complex one) behind this conclusion. I think this is what is going on in the background. Assume that $y=9$ heads are observed out of $n=10$ trials. We can reasonably model this as $y \mid \pi,n \sim \mbox{Binomial}(\pi,n)$, where $\pi$ is the probability that the coin gives a head. Also, assume that there are two possible data generating processes at play. The first one ($D=1$) is such that $\pi=0.5$ with probability $1$, which implies that the coin is unbiased. In the second one ($D=2$) we assume no particular knowledge about the true chance of observing a head, and thus model $\pi \sim \mbox{Uniform}(0,1)$.

Effectively, we are implying a mixture prior

$$p(\pi) = w_1 p_1(\pi) + w_2 p_2(\pi)$$
where $w_1=\Pr(D=1)$, $w_2=(1-w_1)=\Pr(D=2)$, $p_1(\pi)$ is a point mass at $\pi=0.5$ and $p_2(\pi)$ is the $\mbox{Uniform}(0,1)$ density,
and the objective is to estimate the probability that the coin is biased, given the observed evidence: $\Pr(\pi\neq 0.5 \mid y,n)$, or to be more precise $\Pr(D=2 \mid y,n)$.
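In fact, for this simple model the posterior probability can be computed in closed form, which gives a useful sanity check. Under $D=1$ the marginal likelihood of the data is $\binom{10}{9}0.5^{10}$, while under $D=2$ it is $\binom{10}{9}B(10,2)=\frac{1}{11}$ (integrating the binomial likelihood against the uniform prior). A quick sketch, assuming equal prior weights $w_1=w_2=0.5$:

```python
from math import comb

y, n = 9, 10     # observed heads, number of tosses
w1 = w2 = 0.5    # prior weights on the two data generating processes

# Marginal likelihood under D=1 (fair coin, pi = 0.5)
m1 = comb(n, y) * 0.5**n  # = 10/1024

# Marginal likelihood under D=2 (pi ~ Uniform(0,1)):
# the integral C(n,y) * B(y+1, n-y+1) simplifies to 1/(n+1)
m2 = 1 / (n + 1)  # = 1/11

# Posterior probability that the coin is biased (Bayes' theorem)
p_biased = w2 * m2 / (w1 * m1 + w2 * m2)
print(round(p_biased, 3))  # 0.903
```

The exact value (about 90%) agrees, within Monte Carlo error, with the simulation-based estimate reported below, and roughly with the article's 92% figure.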

Of course, there is nothing special about these two "hypotheses" and one could even argue that they are not necessarily the best possibilities! But let's assume that they are OK. You can use simple MCMC to obtain this (I think there is very similar code in The BUGS Book, dealing with pretty much the same example). Here's how I've coded it

model {
    y ~ dbin(pi[D],n)   # sampling distribution
    D ~ dcat(w[])       # data generating process (1 = fair, 2 = biased)
    pi[1] <- 0.5        # fair coin
    pi[2] ~ dunif(0,1)  # biased coin: uniform prior on pi
    p.biased <- D - 1   # indicator that the coin is biased
}

Assuming you save this to a file BBCcoin.txt, and that you have R, JAGS and the R2jags package installed on your computer, you can run this model directly from R, using the following code.

library(R2jags)
working.dir <- paste(getwd(),"/",sep="")

# Observed data
y <- 9   # number of heads
n <- 10  # number of trials

# Prior for the data generating process
w <- numeric()
w[1] <- .5     # probability that the coin is unbiased
w[2] <- 1-w[1] # probability that the coin is biased

filein <- "BBCcoin.txt"
dataJags <- list(y=y,n=n,w=w)
params <- c("p.biased","pi")
inits <- function(){
    list(pi=c(NA,runif(1)),D=1+rbinom(1,1,.5))
}

# MCMC settings
n.iter <- 100000
n.burnin <- 9500
n.thin <- floor((n.iter-n.burnin)/500)

coin <- jags(dataJags, inits, params, model.file=filein,
    n.chains=2, n.iter=n.iter, n.burnin=n.burnin, n.thin=n.thin,
    DIC=TRUE, working.directory=working.dir, progress.bar="text")
print(coin,digits=3,intervals=c(0.025, 0.975))

This gives the following results

Inference for Bugs model at "BBCcoin.txt", fit using jags,
2 chains, each with 1e+05 iterations (first 9500 discarded),
n.thin = 181 n.sims = 1000 iterations saved
mu.vect sd.vect  2.5% 97.5%  Rhat n.eff
p.biased   0.916   0.278 0.000 1.000 1.002  1000
pi[1]      0.500   0.000 0.500 0.500 1.000     1
pi[2]      0.802   0.160 0.308 0.974 1.002  1000
deviance   3.386   2.148 1.898 9.258 1.000  1000

For each parameter, n.eff is a crude measure of
effective sample size, and Rhat is the
potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 2.3 and DIC = 5.7
DIC is an estimate of expected predictive error
(lower deviance is better).

which is consistent with the article's claim that the probability of bias is about 92% (or, equivalently, that the posterior probability that $D=1$, ie a fair coin, is the "true" data generating process is only about 8%).

So, quite a few model details omitted, and they are not so obvious, I think!