Editors, we need to talk about robustness checks

3 April 2019, 0836 EDT

It’s happened to all of us (or at least those of us who do quantitative work). You get back a manuscript from a journal and it’s an R&R. Your excitement quickly fades when you start reading the comments. One reviewer gives a grocery list of additional tests they’d like to see: alternate control variables, different estimators, excluded observations. Another complains that the long list of robustness checks already in the manuscript obscures the important findings. Sometimes both of these reviewers are the same person.

And it gets even more complicated if the article ends up rejected and you send it to another journal. Now that list of robustness checks, some of which were of questionable value, expands under a new set of reviewers’ comments. And those reviewers irritated by voluminous appendices get even more annoyed by all the tests included with little clear justification (“another reviewer told me to add this” not being an acceptable footnote).

This common situation is fodder for (kind of) humorous stories and a source of limitless frustration for scholars, especially junior ones whose careers depend on getting another article past cranky reviewers. But it can also have serious consequences for the quality of our scholarship.

The problem is that both reviewers are right: robustness checks are important and distracting.

It is rare for one model to be a completely sufficient test of a hypothesis. There are often various controls that could be included, various estimators that could be used, and various Stata options to add to the do-file. In fact, if a paper I’m reviewing includes just one or two models with no discussion of alternate specifications, I get suspicious that the authors are hiding something.
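To make that concrete: a typical battery of checks re-runs the core model with an added control and with a different estimator, then compares the coefficient of interest across specifications. The sketch below is purely illustrative, not anyone’s actual workflow; the post mentions Stata do-files, but for a self-contained example it uses Python with statsmodels, simulated data, and invented variable names.

```python
# Hypothetical illustration of a small battery of robustness checks.
# The data are simulated and the variable names (x, gdp, polity) are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "x": rng.normal(size=n),        # explanatory variable of interest
    "gdp": rng.normal(size=n),      # baseline control
    "polity": rng.normal(size=n),   # alternate control
})
df["y"] = 0.5 * df["x"] + 0.3 * df["gdp"] + rng.normal(size=n)
df["y_binary"] = (df["y"] > df["y"].median()).astype(int)

# Baseline specification: OLS with the core control, robust standard errors
baseline = smf.ols("y ~ x + gdp", data=df).fit(cov_type="HC1")

# Robustness check 1: add an alternate control variable
alt_controls = smf.ols("y ~ x + gdp + polity", data=df).fit(cov_type="HC1")

# Robustness check 2: alternate estimator (logit on a binary outcome)
alt_estimator = smf.logit("y_binary ~ x + gdp", data=df).fit(disp=False)

# Compare the coefficient of interest across specifications
for name, model in [("baseline", baseline),
                    ("alt controls", alt_controls),
                    ("logit", alt_estimator)]:
    print(f"{name:>12}: coef on x = {model.params['x']:.3f}")
```

Even this toy version makes the reviewer’s dilemma visible: each additional specification adds a little reassurance and a little more clutter.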

It may be possible to have a simple, crisp, clearly identified research design that requires no robustness checks. But this would not be the case for much international relations research that relies on admittedly messy data. One solution is to avoid relying on such data, but that limits the questions we can ask. Instead, we admit the limits of any models we run by…including lots of robustness checks.

At the same time, the second reviewer has a point: long lists of robustness checks are distracting. Readers’ eyes glaze over as they scan the alternate specifications, and the discussion takes up precious space in an article that could be devoted to the results or the broader implications of a study. One option is to move all robustness checks to appendices. These are usually included as separate files during the review process and can then be made available online after publication. Of course, we all know that a significant number of reviewers fail to notice the appendix, so the more an author relies on it to deal with potential counterarguments, the greater the risk of rejection.

To be honest, I’m not really sure what the answer to this is. But for the sake of frustrated authors and reviewers alike, we need some sort of standard.

I’d suggest that editors take the lead. In the instructions for authors, usually available on the journal website, the editor can specify their standards for robustness checks: do you want them discussed in the text, referenced in an appendix included with the article, or left in a separate reviewers’ appendix? And some journals provide instructions to reviewers (which, really, all should do). These could easily include guidance on robustness checks: the journal has decided to limit their use, the journal asks authors to keep them in an appendix (so be sure to read the appendix), the journal has decided to prioritize robustness checks over qualitative discussion of the findings, and so on.

This won’t prevent every problem with peer review, but it will reduce the extent to which our professional futures depend on the match between our writing style and a reviewer’s preferences.