Thesis off-cuts: the reproducibility crisis in cosmology

As we frequently hear1, we’re now in the precision era of cosmology. What this really means is that we’re in the era of measuring things very well, and we’re getting so good at it because we keep building ever larger and more powerful telescopes. I remember attending the STFC Introductory Summer School on Astronomy a few months before I started my PhD, where I was astonished to learn that the Square Kilometre Array (SKA) is expected to produce a staggering one petabyte of data every night once fully operational (though this estimate may have been revised since). How on earth are we meant to deal with that? And what repercussions does generating such a vast volume of data have for the field of cosmology?

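In the course of my PhD, I made extensive use of Markov chain Monte Carlo (MCMC) methods for parameter inference. The idea is to approximate the probability distribution of a parameter of interest (in lay terms, the value of a parameter and its error bars, according to some data) by repeated random sampling: the sampler moves through parameter space in a chain of random steps, each depending only on its current position (the “Markov chain” part), and the points it visits build up a picture of the distribution. We rely on the notion that from a large enough set of samples we can safely infer the properties of the full distribution.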
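As a minimal sketch of that last step, assuming the samples have already been drawn, the Python below turns a list of samples into a central value and “error bars” by quoting the median and the distances to the 16th and 84th percentiles (a common convention that matches ±1σ for a Gaussian). Independent Gaussian draws stand in for a real MCMC chain here, and the parameter name A, the value 0.7 and the width 0.05 are made up for illustration.

```python
import random
import statistics


def summarise(samples):
    """Return (median, lower error, upper error) from a list of samples.

    The error bars are the distances from the median to the 16th and 84th
    percentiles, which enclose the central 68% of the samples.
    """
    # 99 cut points: cuts[k - 1] is the k-th percentile
    cuts = statistics.quantiles(samples, n=100)
    p16, p50, p84 = cuts[15], cuts[49], cuts[83]
    return p50, p50 - p16, p84 - p50


# Stand-in for an MCMC chain: independent draws for a hypothetical parameter A.
rng = random.Random(42)
samples = [rng.gauss(0.7, 0.05) for _ in range(100_000)]

median, err_lo, err_hi = summarise(samples)
print(f"A = {median:.3f} -{err_lo:.3f} +{err_hi:.3f}")
```

In a real analysis the samples would be the post-burn-in points of the chain for A; for a skewed posterior the two error bars would differ.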

I used this method to study dark energy models2. For example, I could come up with a model described by certain parameters, say A and B. When A = 0 and B = 1, my model reduces to the standard dark energy model, the cosmological constant Λ. It’s no good just guessing whether A = 0 and B = 1: it’s better to use observational data to try to find out their true values.

This is where MCMC sampling comes in. We make use of Bayes’ theorem (derived by the Rev. Thomas Bayes in the mid-1700s, but largely unused in statistics until the mid-1900s). It states that the posterior distribution, which is the distribution we’re interested in finding, is the product of the likelihood (what comes from the data) and the prior (our prior knowledge), normalised by the evidence. The prior is where existing knowledge enters: for example, if I know my model only works as dark energy when A is positive, I can impose that through the prior. The evidence is not usually considered in an MCMC analysis: it is a single number that does not depend on the parameters, so it drops out when the sampler compares the posterior at different points.
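In symbols, writing θ for the model parameters (A and B above) and D for the data, Bayes’ theorem reads as in the first line below; the prior P(θ) is where a condition such as “A is positive” is imposed, by setting it to zero elsewhere. The second line is the ratio a Metropolis–Hastings sampler with a symmetric proposal evaluates when deciding whether to move from θ to a proposed point θ′, and the evidence P(D) has cancelled from it.

```latex
\begin{align}
P(\theta \mid D) &= \frac{P(D \mid \theta)\, P(\theta)}{P(D)},
  \qquad P(D) = \int P(D \mid \theta)\, P(\theta)\, \mathrm{d}\theta, \\
\frac{P(\theta' \mid D)}{P(\theta \mid D)}
  &= \frac{P(D \mid \theta')\, P(\theta')}{P(D \mid \theta)\, P(\theta)}.
\end{align}
```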

So far, so good. I will implement my dark energy model in my favourite Boltzmann solver (a code that solves the Einstein–Boltzmann equations to predict observables, such as the CMB anisotropy and matter power spectra, for a given set of parameter values), link that with an MCMC sampler (which compares those predictions with the data and maps out the posterior distributions of the parameters, given the data and priors) and submit my job to the cluster queue.
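To make that pipeline concrete, here is a deliberately tiny sketch in Python. Everything in it is hypothetical: `toy_theory` is a straight line standing in for the Boltzmann solver, the mock data are generated at A = 0, B = 1, and the sampler is a plain random-walk Metropolis–Hastings written from scratch rather than any particular package. The prior encodes the condition that A is non-negative.

```python
import math
import random


# Hypothetical stand-in for a Boltzmann solver: maps parameters (A, B) to a
# predicted observable at a few redshifts. A real analysis would call a code
# such as CAMB or CLASS here; this straight line is purely illustrative.
def toy_theory(A, B, redshifts):
    return [B + A * z for z in redshifts]


# Mock data generated at the Lambda-like point A = 0, B = 1, plus Gaussian noise.
REDSHIFTS = [0.1 * i for i in range(1, 21)]
SIGMA = 0.05
_noise = random.Random(1)
DATA = [y + _noise.gauss(0.0, SIGMA) for y in toy_theory(0.0, 1.0, REDSHIFTS)]


def log_prior(A, B):
    # Flat prior encoding "A must be non-negative", plus broad bounds.
    if 0.0 <= A <= 1.0 and 0.0 <= B <= 2.0:
        return 0.0
    return -math.inf


def log_likelihood(A, B):
    # Gaussian likelihood: -chi^2 / 2 between prediction and data.
    model = toy_theory(A, B, REDSHIFTS)
    return -0.5 * sum(((d - m) / SIGMA) ** 2 for d, m in zip(DATA, model))


def log_posterior(A, B):
    # Unnormalised: the evidence is never computed.
    lp = log_prior(A, B)
    if math.isinf(lp):
        return lp
    return lp + log_likelihood(A, B)


def metropolis_hastings(log_post, start, step, n_steps, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = tuple(start)
    current_lp = log_post(*current)
    chain, accepted = [], 0
    for _ in range(n_steps):
        proposal = tuple(x + rng.gauss(0.0, s) for x, s in zip(current, step))
        proposal_lp = log_post(*proposal)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(current)
    return chain, accepted / n_steps


chain, acceptance = metropolis_hastings(log_posterior, (0.1, 1.0), (0.02, 0.02), 20_000)
kept = chain[5_000:]  # discard burn-in
mean_A = sum(a for a, _ in kept) / len(kept)
mean_B = sum(b for _, b in kept) / len(kept)
print(f"acceptance = {acceptance:.2f}, mean A = {mean_A:.3f}, mean B = {mean_B:.3f}")
```

Here each call to `toy_theory` is instantaneous; in a real analysis it is a full Boltzmann-solver run that can take seconds, repeated tens of thousands of times.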

The problem is what happens next: the twiddling of thumbs and the checking of convergence that can go on for days or weeks. This is because MCMC analyses of the type I’ve described take a very long time, even on a high-performance computing cluster. To be frank, I wouldn’t wish running an MCMC analysis of an interacting dark energy model (especially one with lots of parameters that are unconstrained by the data) on my worst enemy.
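The post doesn’t say which convergence test was used; one widely used check, sketched below under that assumption, is the Gelman–Rubin statistic R-hat, which compares the variance within several independent chains with the variance between them. Values near 1 suggest the chains agree; a common rule of thumb is to keep running until R-hat falls below about 1.1, with stricter thresholds such as 1.01 also in use. The “chains” in the example are synthetic independent draws, not real MCMC output.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one parameter.

    `chains` is a list of m chains, each a list of n samples (burn-in removed).
    """
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance (scaled by n) and mean within-chain variance.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(3)
# Four chains sampling the same distribution: R-hat should be close to 1.
agreeing = [[rng.gauss(0.0, 1.0) for _ in range(2_000)] for _ in range(4)]
# Two pairs of chains stuck in different places: R-hat well above 1.
disagreeing = [[rng.gauss(mu, 1.0) for _ in range(2_000)] for mu in (0.0, 0.0, 3.0, 3.0)]

print(f"agreeing:    R-hat = {gelman_rubin(agreeing):.3f}")
print(f"disagreeing: R-hat = {gelman_rubin(disagreeing):.3f}")
```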

Beyond making the first two years of my PhD extremely boring, the time it takes to run an MCMC analysis points to a more troubling issue. Scientific experiments are supposed to be reproducible; that’s one of the basic principles of the scientific method. Reproducibility suffers when an experiment takes a very long time to run, or when the materials or resources needed to carry it out are not widely available to others. It’s my belief, therefore, that an over-reliance on slow computational tools and methods is not good for the general well-being of the field3.

What can be done, then? I’d personally like to devote more time to pen-and-paper analyses of dark energy models, to better understand their general behaviour, rather than plugging them straight into an MCMC sampler and cranking the handle4. I am only coming to this realisation now because I have suffered from a lack of mathematical confidence for some years5, which has meant I’ve done my level best to avoid doing calculations by hand during my PhD. I’m far more comfortable writing code but, as I realised when trying to justify the methodology I followed in my thesis, my comfort zone should not take precedence over best scientific practice.

Another promising avenue would be to work on MCMC methods themselves. Dark energy models of the type I’ve studied can produce degeneracies in parameter space (combinations of parameters that the data cannot tell apart), or have parameters which are very poorly constrained. Both of these contribute to slow convergence (or even outright failure) of MCMC sampling. Perhaps, then, we should try to move the field away from MCMC (or at least, Metropolis–Hastings MCMC) and towards methods better suited to difficult distributions, such as nested sampling (though in my experience, that can also be very slow) or ensemble slice sampling6.
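A quick way to see why degeneracies slow things down, sketched here under simple assumptions: run the same random-walk Metropolis sampler on two toy 2D Gaussians with identical unit-variance marginals, one uncorrelated and one with correlation 0.99 standing in for a degeneracy between two parameters, and compare the integrated autocorrelation time of one coordinate, estimated with batch means. The autocorrelation time is roughly the number of steps the chain needs per effectively independent sample, so a larger value means slower convergence. The targets, step size and batch size are illustrative choices, and this does not implement nested or ensemble slice sampling.

```python
import math
import random


def log_density(x, y, rho):
    # Unnormalised log-density of a 2D Gaussian with unit variances and correlation rho.
    return -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))


def random_walk_chain(rho, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis on the toy Gaussian; returns the x value at every step."""
    rng = random.Random(seed)
    x = y = 0.0
    lp = log_density(x, y, rho)
    xs = []
    for _ in range(n_steps):
        xp, yp = x + rng.gauss(0.0, step), y + rng.gauss(0.0, step)
        lpp = log_density(xp, yp, rho)
        # Symmetric proposal: accept with probability min(1, density ratio).
        if math.log(1.0 - rng.random()) < lpp - lp:
            x, y, lp = xp, yp, lpp
        xs.append(x)
    return xs


def autocorr_time(samples, batch_size=2_000):
    """Batch-means estimate of the integrated autocorrelation time.

    For batches much longer than the autocorrelation time tau, the variance of
    the batch means is about tau * var / batch_size, so
    tau ~ batch_size * var(batch means) / var.
    """
    n_batches = len(samples) // batch_size
    usable = samples[: n_batches * batch_size]
    mean = math.fsum(usable) / len(usable)
    var = math.fsum((s - mean) ** 2 for s in usable) / len(usable)
    batch_means = [
        math.fsum(usable[i * batch_size:(i + 1) * batch_size]) / batch_size
        for i in range(n_batches)
    ]
    var_means = math.fsum((m - mean) ** 2 for m in batch_means) / (n_batches - 1)
    return batch_size * var_means / var


tau_independent = autocorr_time(random_walk_chain(0.0, 200_000, seed=1))
tau_degenerate = autocorr_time(random_walk_chain(0.99, 200_000, seed=1))
print(f"correlation 0.00: tau ~ {tau_independent:.0f} steps")
print(f"correlation 0.99: tau ~ {tau_degenerate:.0f} steps")
```

The degenerate target forces a fixed-step sampler to choose between small steps (slow progress along the long, narrow ridge) and large ones (mostly rejected); samplers that adapt their moves to the shape of the distribution aim to avoid exactly this.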

And what about the role of large collaborations in all this? I’ve already mentioned the huge amount of data the SKA is going to collect; other projects on the horizon, such as JWST and Euclid, will do the same. Something I want to learn more about is how these collaborations decide how much of their data is made public, and when. Do collaboration members have exclusive or preferential access? How does this in turn affect reproducibility? And what efforts are made to ensure that researchers from developing countries (whose governments are unlikely to be able to buy into the experiments) can make use of the data too? This question of widening access is of course extremely difficult to address, as economics naturally takes us into a very politicised realm.

The food for thought, then: in the future I want to think more carefully about the tools I’m applying to a problem, and consider whether what I am doing could easily be replicated by someone else (and whether I have explained it well enough in the paper for them to understand and follow!). I also want to gain a deeper understanding of the tools themselves, to see if they can be improved in any significant way. So far in my career I’ve managed to steer well clear of collaboration policies and politics, but I would certainly like to learn more about how these consortia manage and distribute their data.


  1. Or at least, as I frequently write in the conclusions of my papers.
  2. I’ve written more about my work on alternative dark energy models in a couple of other posts: https://nataliebhogg.com/2019/04/13/constraints-on-the-interacting-vacuum-geodesic-cdm-scenario/ and https://nataliebhogg.com/2020/02/25/paper-day-madrid-week-seven/ and see here for a layperson’s introduction to dark energy: https://nataliebhogg.com/2020/10/18/coincidence-problem-anthropic-principle/. I’ve never written about MCMC or Bayesian statistics in detail — a post for another day.
  3. And the answer is not to simply throw more computing power at the problem! I’ve written a bit about this before (https://nataliebhogg.com/2020/09/29/tribalism-in-science/), but in terms of carbon emissions, reliance on high performance computing is not at all good for the environment.
  4. There’s a reason that turning a crank was used as a punishment in Victorian prisons: it’s mind-numbingly dull, and patently useless (the crank is also where the slang term “screw”, meaning prison officer, comes from).
  5. Perhaps brought on by my A Level maths teacher who one day told me that I’d never get higher than a B in her subject. As it turns out, she was exactly right, but I’m hoping my PhD in cosmology will be sufficient compensation.
  6. See here: https://arxiv.org/abs/2002.06212 and here: https://github.com/minaskar/zeus.
