Challenges to Qualitative Research in the Age of Big Data

by PM

17 February 2012, 1322 EST

[Comic: Technically, “because I didn’t have observational data.” Working with experimental data requires only calculating means and reading a table. Also, this may be the most condescending comic strip about statistics ever produced.]

The excellent Silbey at the Edge of the American West is stunned by the torrents of data that future historians will be able to deal with. He predicts that the petabytes of data being captured by government organizations such as the Air Force will be a major boon for historians of the future —

(and I can’t be the only person who says “Of the future!” in a sort of breathless “better-living-through-chemistry” voice)

 — but also predicts that this torrent of data means that it will take vastly longer for historians to sort through the historical record.

He is wrong. It means precisely the opposite. It means that history is on the verge of becoming a quantified academic discipline, for two reasons. The first is that statistics is, quite literally, the art of discerning patterns within data. The second is that the history academics practice in the coming age of Big Data will not be the same discipline that contemporary historians are creating.

The sensations Silbey is feeling have already been captured by an earlier historian, Henry Adams, who wrote of his visit to the Great Exposition of Paris:

He [Adams] cared little about his experiments and less about his statesmen, who seemed to him quite as ignorant as himself and, as a rule, no more honest; but he insisted on a relation of sequence. And if he could not reach it by one method, he would try as many methods as science knew. Satisfied that the sequence of men led to nothing and that the sequence of their society could lead no further, while the mere sequence of time was artificial, and the sequence of thought was chaos, he turned at last to the sequence of force; and thus it happened that, after ten years’ pursuit, he found himself lying in the Gallery of Machines at the Great Exposition of 1900, his historical neck broken by the sudden irruption of forces totally new.

Because it is strictly impossible for the human brain to cope with data on that scale, the age of big data will force us to turn to the tools we’ve devised to solve exactly that problem. And those tools are statistics.

It will not be human brains that directly run through each of the petabytes of data the US Air Force collects; it will be statistical software routines. And the historical record that the modal historian of the future confronts will be one mediated by statistical distributions, simply because such distributions allow historians to confront data that arrives in vast torrents with tools appropriate to the problem.
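To make that concrete, here is a minimal sketch of the kind of routine I mean: a single streaming pass over a (hypothetical) file of numeric records, using Welford’s online algorithm to maintain a running mean and variance in constant memory, so that it never matters how many petabytes the file runs to. The file name and record format are invented for illustration.

```python
import math


def stream_summary(path):
    """Count, mean, and standard deviation of one numeric value per line,
    computed in a single pass without holding the data in memory."""
    n, mean, m2 = 0, 0.0, 0.0
    with open(path) as f:
        for line in f:
            x = float(line)
            n += 1
            delta = x - mean
            mean += delta / n          # update the running mean
            m2 += delta * (x - mean)   # Welford's variance accumulator
    sd = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return n, mean, sd


# Hypothetical example: summarize a huge file of sortie durations.
n, mean, sd = stream_summary("sortie_durations.txt")
print(f"{n} records: mean={mean:.2f}, sd={sd:.2f}")
```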

[Figure: onset of menarche plotted against year for Norway. In all seriousness, this is the sort of data that should be analyzed by historians but which many are content to abandon to the economists by default. Yet learning how to analyze demographic data is not all that hard, and the returns are immense. And no amount of reading documents, without quantifying them, could produce this sort of information.]
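Lest “learning how to analyze demographic data” sound daunting, a trend like the one in that figure can be fit in a few lines. The numbers below are invented placeholders rather than the actual Norwegian series, but the code is essentially the whole method:

```python
import numpy as np

# Hypothetical (year, mean age at menarche) observations; the real
# Norwegian series would be read from a data file instead.
years = np.array([1860, 1880, 1900, 1920, 1940, 1960])
ages = np.array([15.6, 15.2, 14.7, 14.1, 13.7, 13.3])

# Ordinary least-squares line through the points.
slope, intercept = np.polyfit(years, ages, deg=1)
print(f"Estimated trend: {slope * 10:+.2f} years of age per decade")
```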

This will, in one sense, be a real gift to scholarship. Although I’m not an expert in Hitler historiography, for instance, I would place a very real bet with the universe that the statistical analysis in King et al. (2008), “Ordinary Economic Voting Behavior in the Extraordinary Election of Adolf Hitler,” tells us something important about why Hitler came to power that simply cannot be deduced from the documentary record alone. The same could be said for an example closer to (my) home, Chay and Munshi (2011), “Slavery’s Legacy: Black Mobilization in the Antebellum South,” which identifies previously unexplored channels through which variations in slavery affected the post-war ability of blacks to mobilize politically.

In a certain sense, then, what I’m describing is a return of one facet of the Annales school on steroids. You want an exploration of the daily rhythms of life? Then you want quantification. Plain and simple.

By this point, most readers of the Duck have probably reached the limits of their tolerance for such statistical imperialism. And since I am a member in good standing of the Qualitative and Multi-Method Research section of APSA (which I know is probably not much better for many Duck readers!), and have, moreover, just returned from spending weeks in the archives, let me say that I do not think the elimination of narrativist approaches is desirable or possible. First, without qualitative knowledge, quantitative approaches are hopelessly naive. Second, there are some problems that can only practically be investigated with qualitative data.

But if narrativist approaches will not be eliminated, they may nevertheless lose large swathes of their habitat as the invasive species of Big Data historians emerges. Social history should be fundamentally transformed; so too should mass-level political history, or what’s left of it, since the availability of public opinion data, convincing theories of voter choice, and cheap analysis means that investigating the course of campaigns using documents alone is pretty much professional malpractice.
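How cheap is “cheap analysis”? Roughly this cheap. Assuming a hypothetical CSV of individual survey responses, a single pandas call produces the vote-choice-by-party-identification table that would once have taken a month in the archives:

```python
import pandas as pd

# Hypothetical survey file with one row per respondent.
survey = pd.read_csv("campaign_survey.csv")

# Share of each party-ID group voting for each candidate.
print(pd.crosstab(survey["party_id"], survey["vote_choice"], normalize="index"))
```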

The dilemma for historians is no different from the challenge that qualitative researchers in other fields have faced for some time. The first symptom, I predict, will be the retronyming of “qualitative” historians, in much the same way that the emergence of mobile phones created the retronym “landline.” The next symptom will be that academic conferences will in fact be dominated by the pedantic jerks who only want to talk about the relative merits of different approaches to handling heteroscedasticity. But the wrong reaction to these and other growing pains would be a kneejerk refusal to consider the benefits of quantitative methods.
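For what it’s worth, even the heteroscedasticity debates are less forbidding than the conference-panel version suggests. A sketch on simulated data, using statsmodels’ heteroscedasticity-consistent (“HC1”) covariance option; all names and numbers here are illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = 2.0 * x + rng.normal(0.0, x)  # noise grows with x: heteroscedastic errors

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()              # classical standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroscedasticity-robust SEs

print("classical SEs:", classical.bse)
print("robust SEs:   ", robust.bse)
```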