Economy Class: April 2021

An edited extract of our book was published on Sunday and caused statisticians everywhere to lose their minds. One particular tweet has gone viral.

The issue was the headline, which implied that Bayes' theorem was an "obscure math theorem". It produced so much heat because the meaning of the phrase is debatable, like "the dress".

What exactly we infer from this phrase is, coincidentally, also related to Bayes' theorem. Does it imply that the theorem is obscure for a math theorem? Or that math theorems are not very well known anyway? Or perhaps "obscure" means hard to understand*?

Bayes' theorem is certainly well known for a theorem: it is taught in many introductory stats classes. But then again, what percentage of the population has taken a stats class? Probably not a huge one. And how low does that percentage have to be for something to be called "obscure" anyway?

Perhaps you think the meaning is clear so let's change it around. Let's say the headline was Bayes' theorem is a "well-known math theorem". Do you honestly think no one would reply: "Well-known! I haven't heard of it before!!!"

These phrases are lexically ambiguous, which is something I have written about before. If you hear someone say that the "priest married my sister", you may be unsure whether the priest conducted the ceremony or is now their sister's husband/wife. Usually, you can work this out from the context (e.g. if the priest was Catholic then it's extremely unlikely to be the latter). This is how Bayesian reasoning works: we update our beliefs given new information, and we do it all the time without thinking.
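To make the priest example concrete, here is a minimal sketch of that update in Python. Every number in it is made up purely for illustration (nothing here is measured), and the names `posterior`, `prior_spouse` and so on are my own labels rather than anything standard.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' theorem for a yes/no hypothesis: returns P(hypothesis | evidence)."""
    # Overall chance of seeing the evidence, whichever hypothesis is true.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


# Hypothesis: the priest is the sister's spouse (rather than the officiant).
# All three numbers below are assumptions chosen for the example.
prior_spouse = 0.2               # belief before hearing any context
p_catholic_if_spouse = 0.01      # Catholic priests very rarely marry
p_catholic_if_officiant = 0.5    # plenty of officiating priests are Catholic

updated = posterior(prior_spouse, p_catholic_if_spouse, p_catholic_if_officiant)
print(f"Before: {prior_spouse:.3f}, after: {updated:.3f}")
```

With these made-up inputs, learning that the priest is Catholic pushes the "spouse" reading from fairly plausible to very unlikely. Different starting numbers would give a different answer, which is rather the point.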

However, sometimes this reasoning can go wrong and lead to arguments like the one above. If you know that Bayes' theorem is well known for a theorem, it may affect how you think most people would interpret the phrase "obscure math theorem". Knowing this information biases you: it makes it much harder to imagine how others (who do not know this information) would read the phrase.

This is also related to something called the curse of knowledge (another thing I have written about before). This happens in conversation when you mistakenly believe that the other person knows what you know. For example, some people pointed out that P(A|B) = (P(B|A)P(A))/P(B) is really simple as it's just multiplication and division. But unless you know that the "|" means "conditional on" and the "P" means "probability", it's completely incomprehensible. It isn't simple unless you know what the symbols mean. It is like saying "this is simple" in a language the reader can't read: これは簡単です.**

The irony of all this is that journalists (including Tom) often get annoyed at readers who think that they write the headline of an article. Journalists know that it is usually the editor that writes the headline as they have direct experience of this. However, how common is this knowledge amongst the general population?

That said, I can also understand people's anxiety that the headline may be misleading. I do think it highlights a broader issue: the disconnect between those who write the headline and those who write the article.

Overall though, the fact that the Observer wanted to commission a piece about Bayes' theorem in the first place is fantastic. Let's build positively on this rather than arguing about it into obscurity.

*Some people were concerned that the phrasing of the headline made the theorem sound more difficult to understand than it is, which could put people off. Another way of looking at it is that, for many, maths seems difficult to understand and acknowledging this may make people feel more confident. I have no idea which way is the right way to look at this.

**This is the Japanese for "this is simple".

Everyone has been accused of being patronising at some point in their lives. Patronising, by the way, is when you...

Ok, so this is a terrible joke but it does point to a problem: when do you explain something to someone?

The curse of knowledge happens in conversations when you mistakenly believe that the other person knows what you know. It is more likely to happen when jargon is involved, but it is not always easy to spot when you are doing it.

The curse of knowledge, however, is always present. It is happening right now as I am writing this. I am making an assumption about your knowledge. To do this, I have a likely reader in mind: someone who I think knows the definition of patronising and can read English, for example. The problem is that out of the (many millions of) people who read my blog, it is likely that someone will not understand the definition of patronising. What is even more likely is that I have a slightly different definition to you.

In any conversation, the chances of a misunderstanding in this way are a result of two things: me not explaining something and you not asking me to explain what I mean. The former can happen because I assume you know how I am defining something or I don't want to appear patronising. The latter happens because you are afraid to ask and don't want to look stupid.

Alternatively - and the one I believe is the cause of most misunderstandings - you have a different definition to me and we both assume our definitions are the same. This happens all the time with debates about "capitalism" or "socialism" and is particularly pernicious with misleading words.

So why do we get annoyed by someone explaining something to us that we already know? What we are accusing them of is thinking it is highly unlikely that we know.

The person explaining, however, could in fact think it is quite likely that you do know, but they just want to make sure. This all implies, though, that there is some level of certainty about the other person's knowledge above which we won't bother explaining. For example, I won't explain X if I am 80% sure the other person knows. Of course, we don't actually think in probabilities, and this tolerance level will change depending on the situation. I would want to be 99.99% sure the other person knows which colour wire to cut if I were talking someone through a bomb defusal, even if afterwards the person accuses me of being patronising (it's a cross I am willing to bear).

This level is quite important because the higher we set it, the more likely we are to make the mistake of explaining to someone something that they already know. The flip side is that it becomes less likely that we do not explain something and the person ends up not knowing. This is akin to false positives and false negatives from hypothesis testing in statistics. What level we set may be arbitrary but it has real trade-offs: decreasing the chances of false positives increases the chances of false negatives and vice versa. It is also a big part of how science works and something I think people should know more about.*
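The trade-off can be sketched with a toy calculation. The assumptions here are mine and are deliberately idealised: I explain whenever my confidence that you already know falls below a threshold, and my confidence is perfectly calibrated (if I am 85% sure, you really do know 85% of the time). The function name and the list of confidences are hypothetical.

```python
def expected_mistakes(confidences, threshold):
    """Return (unneeded explanations, missed explanations) expected overall.

    Each confidence is my estimate of the chance that a listener already
    knows the thing in question, assumed to be well calibrated.
    """
    # I explain when my confidence is below the threshold; that is wasted
    # effort whenever the listener did in fact already know.
    unneeded = sum(p for p in confidences if p < threshold)
    # I stay quiet when my confidence reaches the threshold; that is a miss
    # whenever the listener did not in fact know.
    missed = sum(1 - p for p in confidences if p >= threshold)
    return unneeded, missed


# Five hypothetical listeners and how sure I am that each one already knows.
confidences = [0.30, 0.60, 0.85, 0.95, 0.999]

for threshold in (0.5, 0.8, 0.9999):
    unneeded, missed = expected_mistakes(confidences, threshold)
    print(f"threshold {threshold}: unneeded {unneeded:.3f}, missed {missed:.3f}")
```

Raising the threshold moves mistakes from the "missed" column to the "unneeded" column. It never removes both at once, which is why the bomb-defusal setting of 99.99% comes at the price of explaining to nearly everyone.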

But what about situations where someone just wants to helpfully explain and make fewer false positives? If the person gets annoyed at you explaining it to them, is their annoyance really justified? If you are not being actively condescending by rubbing it in and saying they "should really know the answer", what's the harm?

Well, I think there is some harm caused by this. Invariably you will make judgement calls about what to explain and when: you can't explain everything to everyone (it would take forever), and a lot of this reasoning is subconscious. Consider, for example, mansplaining. Men may think that they do not treat men and women differently when it comes to explaining things, but it is extremely difficult for a person to know for sure.

I don't think there is a simple solution to this. However, I do think rather than explaining the "correct" definition per se, it is probably better to offer personal definitions. For example, saying "my understanding" of something is very different from saying "this is what X means". At the same time, we should be politely asking people how they are defining something more often. Hopefully, this will let us avoid the curse of knowledge without being overly patronising.

*You may have heard of p=0.05 before, which is often the arbitrary level set in hypothesis testing. There is nothing special about p=0.05: we can set it lower at p=0.01 and we would get fewer false positives and more false negatives. If we set it higher, say at p=0.10, the opposite occurs. But it is not just a simple probability; it has quite a specific meaning which is often misinterpreted. We explain hypothesis testing and why it often goes wrong in our new book.
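For the curious, the footnote's trade-off can be computed for one deliberately simple case. This is a sketch under assumptions I have chosen for illustration: a one-sided z-test, and a true effect that sits 2.5 standard errors away from zero. The function name and the effect size are hypothetical; only `statistics.NormalDist` comes from the Python standard library.

```python
from statistics import NormalDist


def error_rates(alpha, effect_in_standard_errors):
    """Return (false positive rate, false negative rate) for a one-sided z-test."""
    standard_normal = NormalDist()
    # The test statistic must exceed this value to count as "significant".
    critical_value = standard_normal.inv_cdf(1 - alpha)
    # With no real effect, we wrongly cross the line a fraction alpha of the time.
    false_positive_rate = alpha
    # With a real effect, the statistic is centred on the effect, and we
    # miss it whenever it still lands below the critical value.
    false_negative_rate = standard_normal.cdf(critical_value - effect_in_standard_errors)
    return false_positive_rate, false_negative_rate


ASSUMED_EFFECT = 2.5  # made-up effect size, in standard errors

for alpha in (0.01, 0.05, 0.10):
    fp, fn = error_rates(alpha, ASSUMED_EFFECT)
    print(f"level {alpha:.2f}: false positives {fp:.0%}, false negatives {fn:.0%}")
```

Under these assumptions, tightening the level from 0.10 to 0.01 cuts the false positives but lets noticeably more real effects slip through. A different effect size would change the numbers but not the direction.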

Using Bayes' theorem to figure out whether Bayes' theorem is an "obscure math theorem".

The curse of knowledge: are you being "helpful" or just patronising?