Economy Class: Rankings, Goodhart's law and the REF

There was an old lady who swallowed a fly

I don't know why she swallowed a fly - perhaps she'll die!

I really dislike rankings. I feel they are a way to make subjective data seem objective and hide a lot of information that we care about. There is a whole chapter dedicated to why they are bad in my book.

One such ranking, the Research Excellence Framework, AKA REF2021, has just been released (it was meant to come out last year but got delayed because of the pandemic, hence 2021). It basically assesses the “quality” of university research. To do this, academic panels assess the research papers submitted and award each one a score from 4* (quality that is “world-leading”) to 1* (quality that is “recognised nationally”). Quite what this means is anyone’s guess (especially as the papers being assessed have already been published in academic journals, which already have a sort of quality ranking). But once you have done all that, you can calculate a mean score for each subject across universities.

Quite often the results cluster around 4* and 3*, so the averages you get usually differ only in the decimal places: 3.12, 3.5 and so on. And as some departments are so small, a marginal decision (say someone judging a paper as a 4 rather than a 3) can move your average in the first or second decimal place. This might not seem like much, but when it comes to rankings where everyone's average is basically the same, you can easily jump up 10 places. I think due to the sheer amount of work involved in this, people are probably quite reluctant to admit that a lot of the variation in these rankings is random.
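To put some made-up numbers on this, here is a rough sketch in Python. Nothing in it is real REF data: the ten-paper department, the twenty rival departments and their averages spaced 0.01 apart are all assumptions chosen purely to illustrate the arithmetic.

```python
def mean_score(scores):
    """Mean star rating across a department's submitted papers."""
    return sum(scores) / len(scores)


def rank_of(name, averages):
    """1-based position of `name` when departments are sorted best-first."""
    ordered = sorted(averages, key=averages.get, reverse=True)
    return ordered.index(name) + 1


# Hypothetical small department: ten papers, six judged 3* and four judged 4*.
before = [3] * 6 + [4] * 4
# Same department, but one borderline paper is judged 4* rather than 3*.
after = [3] * 5 + [4] * 5

# Invented league table: 20 rival departments with averages from 3.305 to 3.495,
# spaced 0.01 apart (i.e. "basically the same").
rivals = {f"dept_{i:02d}": (3305 + 10 * i) / 1000 for i in range(20)}

rank_before = rank_of("ours", {**rivals, "ours": mean_score(before)})
rank_after = rank_of("ours", {**rivals, "ours": mean_score(after)})

print(mean_score(before), rank_before)  # 3.4 11
print(mean_score(after), rank_after)    # 3.5 1
```

With only ten papers, one changed judgement moves the mean by 1/10 = 0.1, which in this invented table is enough to climb ten places. The bigger the department, the smaller the shift, which is why the small ones bounce around the most.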

But I think the biggest issue with the REF is that it is a perfect example of Goodhart’s Law (another chapter in my book). It states that when a measure becomes a target, it ceases to be a good measure. And universities certainly try to target the REF: it is government policy and in the job description of all VCs across the land.

What this means is that there is a large incentive to “game” the system, to try and get as high a REF score as possible. How can you game the system? Well, you could potentially leave out some individuals who you don’t think will score so highly, or perhaps hire a few big hitters on a temporary basis. All of these things were noted when the last REF was done in 2014, so how did we respond? Well, we created more rules.

Football is a simple game: two teams run around for a bit and try to put the ball in the opposition's net. But when the game was first invented, the problem of goal hanging made for poor games. So the offside rule came into being. The first FA laws in 1863 included one, and from 1866 a player was offside unless at least three opposing players were between him and the goal line. In 1925 this dropped to two players, and in 1990 it changed again, so that a player level with the second-to-last opponent counted as onside. There have been multiple changes since, and the reason is that each time a new rule is made, players try to find a way to game it. Whether it is the offside trap or interfering with the goalkeeper's view, no matter what rule you create, people will have an incentive to try and game it.

To that end we now have VAR to enforce the offside rule. But to many people, VAR really takes the joy out of a last-minute winner, which is kind of ironic, as the whole point of introducing the offside rule back in 1863 was to make the game more enjoyable to watch!

This is perhaps the biggest problem with the REF. No matter what rules we impose, if we incentivise doing well in it, people will try and game it. It is like the old lady who swallowed a fly. Rather than taking the L, she decided that in order to fix this problem she would swallow a spider, which created a further problem because now she has to find something that will deal with the spider, so she swallows a bird. You end up getting so lost in trying to fix problems that you lose sight of the original problem you were trying to fix. So what is the REF for anyway?

Society doesn’t particularly care about relative rankings of universities or the REF per se. What it does care about, if at all, is getting universities to produce research: creating new vaccines, understanding the universe and whatever I am currently working on. If every university increased its research "quality" by 1000%, that would be a much better outcome for society (although you wouldn't be able to tell from looking at rankings, another reason why they suck).

What the REF has certainly done is increase the amount of research output by incentivising it. But does that result in scientific progress? I am very sceptical that focusing on research outputs is a good way of going about it. Spending a lot of time and resources to create something that isn't a particularly good measure of research quality doesn't seem like a good deal to me.

Thursday, 12 May 2022
