Symposium — In Defense of Simplistic Hypothesis Testing

7 September 2013, 1000 EDT

Editor’s Note: This is a guest post by Dan Reiter. It is the fourth installment in our “End of IR Theory” companion symposium for the special issue of the European Journal of International Relations. SAGE has temporarily ungated all of the articles in that issue. This post responds to John J. Mearsheimer and Stephen M. Walt’s article (PDF). Their post appeared earlier today.

Other entries in the symposium–when available–may be reached via the “EJIR Special Issue Symposium” tag.

Thanks to John Mearsheimer and Stephen Walt for writing such an important and provocative article. I agree with many of their central assumptions, especially the importance of building rigorous theories and of executing appropriate, sound empirical tests of them. I also agree that at this juncture we need more theoretical development, especially (in my view) in emerging areas such as neuroscience and conflict, gender and conflict, and networks.

Here, I lay out a few of my many reactions to their article.

Though I agree with Mearsheimer and Walt that empirical work is most powerful when it is well executed and well grounded in theory, I fear their observation that some empirical work, what they call simplistic hypothesis testing, suffers from flaws invites a dangerous inference: that the existence of flawed empirical work should push us away from empirical work altogether. Given their bedrock assumption that science requires empirical testing as well as theory building, the appropriate reaction to flawed empirical tests is not to do less empirical testing, but to do better empirical testing. If data are flawed, fix the flaws. If data measure a theoretical concept poorly, collect better data or improve the measure. If a model is specified poorly, improve the specification.

The intrinsic value of empirical testing aside, Mearsheimer and Walt underestimate the two major contributions that hypothesis testing, even simple hypothesis testing, makes to theory innovation and development.

First, empirical work, even atheoretical empirical work, sometimes pushes theory forward by making controversial claims. The democratic peace literature is a good example of this dynamic. Arguably the first scholarly article on the democratic peace was a 1976 Jerusalem Journal of International Relations article by Melvin Small and David Singer (PDF). They noted, atheoretically, that democracies fight wars, but not against each other. That empirical observation led to a burst of important theoretical work fleshing out a positivist, liberal theory of international relations. Key theoretical works in this area included Michael Doyle’s early 1980s articles (e.g., PDF), formal work connecting domestic political institutions with conflict behavior (such as Bruce Bueno de Mesquita and David Lalman’s War and Reason), and Bruce Russett’s landmark books Grasping the Democratic Peace and Triangulating Peace, to name a few.

We may see a similar dynamic unfold in scholarship on gender and conflict. Articles in the 2000s by scholars such as Mary Caprioli demonstrated significant correlations between gender and conflict behavior. The 2012 book Sex and World Peace by Caprioli and her coauthors establishes these empirical relationships even more persuasively. In years to come, these increasingly undeniable findings will push scholars to develop existing theory on gender and conflict further, for example by sorting out the cultural and biological connections between gender and political behavior.

Second, empirical work helps build existing theory. Many empirical papers draw out and test new hypotheses from established theories, developing those theories by pushing them in previously unanticipated directions or by forcing their modification in the face of disconfirming findings. The relationship between theory and testing is synergistic rather than sequential: theory suggests empirical tests, empirics point to theoretical modifications, those modifications are tested, and so on.

One Mearsheimer/Walt argument is that because much empirical work is flawed, it makes few contributions. By flawed, they mean that many empirical tests contain at least one element that needs improvement, such as failing to test alternative explanations or using measures that match a theoretical concept poorly. This is the wrong way to think about the role “flawed” (or, less pejoratively, “imperfect”) work plays in advancing knowledge. An ambitious research agenda has many pieces that need close evaluation (data, measurement, research design, theoretical specification, statistical models, and so on), and the evaluation of each piece may itself require several papers. Each paper often tackles only one piece at a time, zeroing in narrowly and deeply to get one element right, or at least to improve it significantly.

Mearsheimer and Walt might view such imperfect work as carrying little scholarly value. However, this is the kind of work most people produce, in part because of strict page limits at journals. More importantly, this kind of work does advance knowledge. A community of scholars uses a body of imperfect work collectively to advance on all fronts, each paper working on one part of a complex problem. Scholars read each other’s work, employ each other’s advances, and collectively progress. Pointing at flaws in individual papers and inferring that little progress is being made misses the big picture of overall progress. Mearsheimer and Walt may doubt that the entire enterprise has produced much, but I am more optimistic. The interested reader might consult Russell Leng’s 1999 Conflict Management and Peace Science article, an insightful older paper that offers encouragement, favorably comparing knowledge accumulation in quantitative international relations with knowledge accumulation in the study of heart disease.

The democratic peace demonstrates this dynamic of papers collectively advancing knowledge, each tackling a problem or two at a time. Democratic peace papers over the last thirty years or so have challenged old data, like Polity, and developed new data. They identified methodological flaws and developed solutions, like the 1998 Neal Beck et al. paper on time-series cross-section analysis with a binary dependent variable (PDF). They assessed alternative explanations for the observed democratic peace. They explored implications of democratic peace theory, such as that democracies are more likely to employ mediation, win the wars they start, fight shorter, cheaper wars, and so on. In recent years, survey-experimental work has drilled deeper into the assumptions of the democratic peace, developing a better understanding of the effects of casualties, combat success, financial costs, and stakes on voter support for war. Yes, seen in isolation any one of these papers has flaws or shortcomings, but each has something constructive to offer, and those constructive elements in combination have advanced the entire agenda.
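For readers who have not seen this kind of work up close, here is a minimal sketch in Python of what a “simplistic hypothesis test” and one of these incremental repairs might look like. Everything in it is invented for illustration: the simulated dyad-year data, the variable names (dispute, joint_dem, peace_years, and so on), and the particular specification. It is not a reconstruction of any published analysis; it simply contrasts a naive logit with a Beck/Katz/Tucker-style correction for temporal dependence, assuming statsmodels and patsy are available.

```python
# Hypothetical sketch: a naive logit test of the democratic peace on
# simulated dyad-year data, then the same test with a Beck/Katz/Tucker-style
# fix for temporal dependence (a smooth function of peace-years).
# All variables and data here are invented for illustration.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000

# Simulated dyad-year data: dispute = 1 if a militarized dispute
# began between the pair of states in that year.
df = pd.DataFrame({
    "dispute": rng.binomial(1, 0.05, n),
    "joint_dem": rng.binomial(1, 0.3, n),       # both states democratic
    "contiguity": rng.binomial(1, 0.2, n),      # states share a border
    "capability_ratio": rng.lognormal(0.0, 1.0, n),
    "peace_years": rng.integers(0, 40, n),      # years since last dispute
})

# Naive specification: ignores the serial dependence of dyad-years.
naive = smf.logit(
    "dispute ~ joint_dem + contiguity + np.log(capability_ratio)",
    data=df,
).fit(disp=False)

# Corrected specification: add peace-years as a cubic spline (patsy's bs()),
# treating the binary series like grouped duration data.
btscs = smf.logit(
    "dispute ~ joint_dem + contiguity + np.log(capability_ratio)"
    " + bs(peace_years, df=4)",
    data=df,
).fit(disp=False)

print(naive.summary())
print(btscs.summary())
```

Because the data here are pure noise, the coefficients are null by construction; the point is only the shape of the exercise, a single narrow improvement (here, handling temporal dependence) layered onto an otherwise simple test, which is exactly the piecemeal progress described above.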

Theoretical work progresses in a similar way. Scholars publish flawed or limited work, and other scholars then explore those flaws and limitations in successive scholarship. Kenneth Waltz’s landmark Theory of International Politics was tremendously important, though flawed and incomplete. Follow-on work like Walt’s The Origins of Alliances and Mearsheimer’s The Tragedy of Great Power Politics critiqued, repaired, and extended Waltz’s work, developing richer realist theory.

Mearsheimer and Walt are troubled by the publication of work that doesn’t accumulate neatly, such as Jason Lyall’s collection of articles on insurgent violence. The drift of their critique, I think, is that, faced with an array of apparently inconsistent empirical patterns, Lyall should have held off on publishing any of these papers until he had built a theoretical framework that explained the full body of findings.

I would disagree with such a suggestion. Each of Lyall’s articles presented new data in an empirically underdeveloped area, employing sophisticated research designs. Our understanding of insurgent violence advances faster if the articles get published, in all their inconsistent glory, and we are all provoked by the diverging results to craft better theory to account for them. Put differently, we will figure out the puzzle of why the results diverge faster if we all can think about it (that is, if the work is published piecemeal before Lyall reaches a theoretical solution) than if Lyall alone thinks about it (that is, if he holds off publication until he arrives at a grand solution).

I also disagree with the proposition that a focus on simple hypothesis testing pushes scholars away from addressing real-world problems. To the contrary, much of the quantitative empirical work of recent years has provided insights into very real policy questions: Does economic aid sap support for insurgents? Do economic sanctions work? Do air strikes on nuclear programs work? Does international peacekeeping work? Does peaceful nuclear assistance increase the risk of nuclear proliferation? Do human rights treaties improve states’ human rights records? Will stability and democracy follow foreign-imposed regime change? The difference between contemporary research and research conducted 25 years ago is not that work today is less policy-focused, but that it focuses on different policy problems, with notably less attention to great power war.

One last point. I am less troubled than Mearsheimer and Walt by the eagerness of graduate students to publish research, even simple hypothesis testing. Conducting advanced political science research is a craft, one with an ever-rising level of sophistication. Part of becoming a scholar is learning by doing, and the best way for a graduate student to undergo that process is to try to publish a piece of research: thinking through a theoretical framework, crafting hypotheses, analyzing data, framing results, formatting the manuscript for submission, picking the right journal, and dealing with reviews. Whether the student succeeds or fails in that effort, he or she will learn from the process and perform better the next time around. That is the essence of the scientific process: the continuing correction of flaws, both in our imperfect work and in our imperfect selves.