Seventy-three-point-six percent of all statistics are false. Or are they? We don’t have any idea. We do know that statistics – whether in politics or political science – provide powerful tools of persuasion. But even scholars can find it difficult to tell when statistical findings are accurate, misleading, or outright fabricated.
Since good statistics first and foremost depend on quality data, political scientists have spent decades calling for greater data transparency and access. In recent years, many prominent journals have developed and refined requirements in these areas; in 2012, the American Political Science Association published an ethics guide that included sections on the gathering and handling of data.
Nevertheless, quantitative political science still suffers from a credibility crisis; as we increasingly promote our research in the public arena, the need to “get the data right” becomes ever more acute.
Scholars working in “conflict studies” face particularly tricky concerns when it comes to ensuring the quality of the data that they use. They work with concepts, such as “civil war” and “peace,” that are difficult to define – and even harder to measure. They often deal with sensitive data that concerns issues of (literal) life and death.
It’s not surprising, then, that scholars publish the occasional special issue or guide offering advice about how to collect and handle conflict data. But it’s been – to the best of our knowledge – nearly a decade since the last collective effort to assess the state of data collection, transparency, and accessibility in the field.
A lot has changed in those years. Scholars collect and use increasingly granular data in order to ask questions about “where” and “when” conflict takes place, about the “motivations, relative capacity and constraints” of “who” does the fighting, and about “how” they “pursue their objectives.” They increasingly rely on machine-learning approaches to produce datasets – especially as donors, administrators, and entrepreneurs embrace “big data” as a silver bullet for national and global problems.
The Mershon Center for International Security Studies at The Ohio State University brought together ten experts for a two-day workshop and a public webinar. Their task: to take a fresh look at standards for data accuracy and transparency in conflict studies.
The discussions were detailed and ranged widely. At the end of the day, the participants came up with five key insights and recommendations.
1. Increase Transparency
Researchers must document their data collection “pipelines” and, whenever possible, make that information publicly available. This documentation should include crucial information, such as how they trained and evaluated their research assistants – especially in the context of projects in which research assistants are responsible for coding at least some of the data.
It should also include explicit discussion of the limitations of the data. This will help future users of datasets to recognize and anticipate errors in the dataset. The entire community should work to create a research culture in which researchers can expect to be rewarded – and don’t need to fear negative consequences – for being open about weaknesses in the datasets that they produce.
Project managers should also improve transparency by better documenting their sources. Researchers should also code for ambiguities in their data, including biases in source material. These kinds of steps are particularly important in light of growing concerns about misinformation and disinformation – both of which can find their way into datasets.
2. Standardize Replication Practices
Along with other fields across the social sciences, conflict studies now generally embraces open data and publicly available code. The problem is that journals have wildly varying standards, as do the repositories that they use. The field should standardize replication practices and embrace robust version control systems such as GitHub.
This will make it easier for scholars to “reopen” datasets and to collaborate on updating existing datasets. In turn, it will be easier to keep datasets up to date – in terms of both format and substance – even without the involvement of their original creators. We should also take concrete steps to recognize that work as an important contribution to the field.
3. Improve Data Literacy and Usage
Some people misuse data. Conflict data is no exception. Curators should take additional steps to reduce the chances that people will unwittingly misuse data.
These might include creating publicly available video tutorials or hosting workshops – perhaps at meetings of professional associations – that walk potential users through the dataset and its components.
Whatever form these steps take, the field should treat them as a valuable contribution to knowledge – one recognized in tenure and promotion letters, hiring decisions, and through awards.
4. Consider, Albeit with Appropriate Caution, Partnering with Computer Scientists
Given the promises of machine learning and other artificial-intelligence approaches to data collection and curation, conflict scholars may find it advantageous to partner with computer scientists and methodological gurus. We need to keep in mind that productive working relationships take time and energy to build, and they do not always pay off.
Our experts pointed to a number of obstacles to successful cross-disciplinary collaboration, including differences in publication incentives, research time horizons, and ethical standards.
In particular, conflict scholars should have realistic expectations about what computer scientists can deliver. They might find more success in working with political methodologists or computational social scientists, who not only have advanced technical expertise but also are more likely to share similar interests and concerns.
5. Reckon with Ethical Issues
The production and use of conflict datasets doesn’t normally require IRB approval, as it usually doesn’t include what IRBs view as research with human subjects. But the construction of conflict datasets does often involve important ethical considerations.
There are times when researchers can only get access to data if they guarantee the anonymity of their sources. Sometimes norms concerning data transparency would, if followed, jeopardize the safety of local researchers or informants. Some datasets might be useful to authoritarian governments interested in suppressing protests or rebel groups seeking to successfully intimidate noncombatants.
Participants at the workshop also stressed that scholars need to pay more attention to research-related trauma. For example, undergraduate students have experienced demonstrable psychological harm from reading, listening to, or watching graphic accounts of violence – and then coding that information as zeros and ones.
Such considerations highlight the importance of having ongoing, informed discussions about data ethics. Creating dedicated platforms for these discussions, as well as promulgating best practices for such conversations, would help improve the “ethical oversight” of data collection and curation.
Given ongoing advancements in technology and methodology, we expect that conflict scholars will face increasingly complex ethical and practical issues. We need to address these as a field. Otherwise conflict scholars will be left to navigate a growing patchwork of ad hoc and inconsistent standards.
We hope that the ideas generated at our workshop will spark further conversation, and thus contribute to collective efforts to set and reward best practices.