Bayes, Stereotyping, and Rare Events

26 July 2013, 0227 EDT

Sadly, many people do not realize that even if the majority of those who engage in behavior X belong to category Y, it does not follow that the majority of people in category Y engage in X.  This point is often made, rightly, with respect to race and violent crime, or religion and terror.  But most treatments I’ve seen either imply that anyone who doesn’t understand is a moron, or scare away the target audience by throwing in a pile of math without explaining it.  This post won’t insult anyone for being unfamiliar with Bayes’ Theorem, nor will you find much algebra herein.  I’m just going to try to explain, with a relative minimum of technical detail, why we can’t conclude that most members of Y engage in behavior X just because most people who engage in X are members of Y.

Let’s start with a simple example, one free of political baggage.

Suppose that some drug test is known to yield positives 98% of the time when the subject is in fact a user of the drug in question, and negatives 98% of the time when they are not.  That is, the risk of a false positive is quite low, at only 2%.  (As is the risk of a false negative, though we’re not concerned about that here.)  What is the probability that an individual who tests positive is in fact a user of the drug?

That’s a trick question.  We can’t answer it without knowing the baseline rate of usage in the population.  And therein lies the rub — if the drug is commonly used, then your gut intuition probably isn’t too far off the mark.  But if it’s not, you may be surprised by the answer.

The following graph plots the conditional probability that an individual who tested positive for use of the drug is in fact a drug user as a function of the baseline rate of usage in the population, calculated via Bayes’ Theorem.

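[Figure: probability that an individual who tests positive is in fact a drug user, plotted against the baseline rate of usage in the population]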

If 15% of the population uses the drug, the probability that an individual who tests positive in fact uses the drug is about 90%, and it only climbs as usage becomes more common.  Thus, if you were to erroneously conflate the likelihood of a true positive with the actual probability that the person is a user of the drug, that wouldn’t be too egregious a mistake.  But if the baseline rate of usage is only 1%, a positive result only implies a 33% chance that the person uses the drug.  And that’s with a very accurate test!  (You can rest assured that the methods used by the FBI and the NSA to sniff out terrorists leave room for more than a 2% chance of a false positive.)
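For anyone who wants to check these figures, here is a minimal Python sketch of the calculation behind the graph.  The function name and the particular base rates in the loop are my own choices for illustration; the two 98% figures come from the drug test described above.

```python
def posterior_given_positive(base_rate, sensitivity=0.98, specificity=0.98):
    """Probability that someone who tests positive is really a user.

    base_rate:   share of the population that uses the drug
    sensitivity: chance a user tests positive (98% in the example)
    specificity: chance a non-user tests negative (98% in the example)
    """
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    # Bayes' Theorem: true positives as a share of all positives.
    return true_positives / (true_positives + false_positives)


# Illustrative base rates, from very rare usage to very common usage.
for base_rate in (0.001, 0.01, 0.05, 0.15, 0.50):
    p = posterior_given_positive(base_rate)
    print(f"base rate {base_rate:.1%} -> P(user | positive) = {p:.1%}")
```

The only thing that changes from one row to the next is the base rate; the test is exactly as accurate each time.  That is the whole point: the rarer the behavior, the larger the share of positives that are false alarms.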

How can that be?  Suppose we randomly grab 100 people off the streets and force them all to take our drug test (which would be totally feasible, ethical, and legal).  With a baseline rate of 1%, we’d expect 1 of them to be a user, and they’ll almost certainly test positive.  But odds are two other people will as well, because the test gives the wrong answer 2% of the time and there are 99 non-users for it to be wrong about.  So we’ll have three positive results, but only one drug user.  (That wasn’t so bad, was it?  Anyway, I promise those are the last numbers you’ll see in this post.)

What does this tell us?  Put simply, the rarer X is, the more likely you are to be mistaken if you assume that a member of group Y is likely to commit, or to have committed, act X simply because most such acts are committed by members of group Y.  The reason you should not assume that a person is a terrorist just because they’re Muslim, then, is not just that this is politically incorrect and likely to offend delicate liberal sensibilities.  It’s that it’s almost certainly incorrect, full stop.

The following Venn diagram illustrates the point (please forgive my utter lack of artistic talent).

[Figure: Venn diagram]

The big circle on the left represents adherents of Islam.  The one on the right represents Christianity.  (For ease of exposition, I’m ignoring all other faiths, as well as variation within these categories.  I’ve also made the circles the same size, implying that there are equal numbers of Christians and Muslims, whereas there are considerably more Christians in the world than Muslims.)

The small circle in the middle represents those who have conducted, or are actively planning to conduct, acts of terrorism.  For the sake of argument, I’ve assumed that the majority of terrorist acts are committed by Muslims (hence the better part of the circle appearing in light green), while only a small fraction are committed by Christians (hence the tiny sliver of the middle circle appearing in light yellow).  (If you’re having trouble with the notion of Christian terrorism, feel free to assume that the majority of these are committed by people who just so happen to be Christian but who were in fact motivated to kill by political concerns, such as separatism or nationalism or radical environmentalism or whatever.  And if you don’t think the same can be said of terrorist attacks committed by people who just so happen to be Muslim, fine.  The point I’m trying to make doesn’t depend on any particular assumptions about whether terrorist attacks committed by people of a certain religion are caused by their religious beliefs.)

What the diagram shows is a hypothetical world in which most people have never committed, and have no intention of committing, terrorist attacks.  The probability that any given person is a terrorist is negligible, regardless of their faith, even though we have assumed that most terrorist attacks are committed by Muslims.

Of course, this diagram does not itself tell us anything about the likelihood that any given Muslim is planning a terrorist attack.  It merely illustrates the point that it’s possible for most terrorists to be Muslim without most Muslims being terrorists.  So it’s worth noting that various surveys indicate that the average Muslim is less likely than the average American or average Israeli to view violence against civilians as justifiable.

If you’re having trouble reconciling those survey results with the fact that most acts referred to as terrorist attacks by Western media are perpetrated by Muslims, you might want to go back and review the drug test example.  And be thankful that terror attacks are rare enough for the risk of invalid inference to be so high.