“Statistics is the New Grammar”

21 April 2010, 1930 EDT

[Cross-posted at Signal/Noise]

In the latest issue of WIRED, Clive Thompson pens a great piece that echoes a sentiment I’ve touched on before: in a data-driven world, it is critical that all citizens have at least a basic literacy in statistics (really, research methodology broadly, but I’ll take what I can get).

Now and in the future, we will have unprecedented access to voluminous amounts of data. The analysis of this data and the conclusions drawn from it will have a major impact on public policy, business, and personal decisions. The net effect of this could go either way–it can usher in a period of unprecedented efficiency, novelty, and positive decision making or it can precipitate deleterious actions. Data does not speak for itself. How we analyze and interpret that data matters a great deal, which puts a premium on statistical literacy for everyone–not just PhDs and policy wonks.

Thompson notes a number of statistical fallacies that many people, including members of the media, fall prey to. A spectacular one that we see all the time, particularly with large, macro-level events, is using a single event to prove or disprove a general trend. Regardless of which side of the climate change debate you are on, a single snowstorm or record-breaking heat wave does not rise to the level of hypothesis-nullifying or -verifying evidence.

There are oodles of other examples of how our inability to grasp statistics–and the mother of it all, probability–makes us believe stupid things. Gamblers think their number is more likely to come up this time because it didn’t come up last time. Political polls are touted by the media even when their samples are laughably skewed.
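If you want to see the gambler's fallacy for yourself, here is a minimal sketch in Python. It is my own toy illustration, not anything from Thompson's piece: it assumes a fair two-outcome wheel (red or black, no green zero), and the function name, spin count, and streak length are arbitrary choices made for the example. It compares how often red comes up overall with how often it comes up right after a run of blacks.

```python
import random


def red_rate_after_black_streak(spins=200_000, streak=3, seed=0):
    """Simulate fair red/black spins and compare two frequencies of red.

    Assumption: each spin is independent and red has probability 0.5.
    Returns (overall red rate, red rate right after `streak` blacks in a row).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outcomes = [rng.random() < 0.5 for _ in range(spins)]  # True means red

    overall = sum(outcomes) / spins

    # Keep only the spins whose previous `streak` spins were all black.
    after_streak = [
        outcomes[i]
        for i in range(streak, spins)
        if not any(outcomes[i - streak:i])
    ]
    conditional = sum(after_streak) / len(after_streak)
    return overall, conditional


overall, conditional = red_rate_after_black_streak()
print(f"red overall:            {overall:.3f}")
print(f"red after 3 blacks:     {conditional:.3f}")
```

Under these assumptions, both rates should land close to one half: the wheel has no memory, so a run of blacks does not make red "due."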

Take correlation and causation. The cartoon below nicely illustrates the common fallacy that the correlation of two events is enough to prove that one causes the other:

In thinking about this, I remembered an argument I had with a number of colleagues in grad school over why they had to be at least somewhat literate in quantitative analysis and game theory when they never intended to use such methods. Given that we will only see more data and more data-based (no pun intended) arguments, policies, and decisions, we need, at a minimum, to be able to understand how the results were reached and whether the studies are flawed. Patrick is probably the last person to apply quantitative methods to social scientific problems, but he can certainly speak the language with the best of them.

Bottom line: the importance of statistical literacy will only increase. Statistics will come to permeate our lives, more so than ever before. We had better be able to speak the language.