Editors, we need to talk about robustness checks

It’s happened to all of us (or at least those of us who do quantitative work). You get back a manuscript from a journal and it’s an R&R. Your excitement quickly fades when you start reading the comments. One reviewer gives a grocery list of additional tests they’d like to see: alternate control variables, different estimators, excluded observations. Another complains that the long list of robustness checks already in the manuscript obscures the important findings. Sometimes both of these reviewers are the same person.

And it gets even more complicated if the article ends up rejected and you send it to another journal. Now that list of robustness checks, some of which were of questionable value, expands under a new set of reviewers’ comments. And those reviewers irritated by voluminous appendices get even more annoyed by all the tests included with little clear justification (“another reviewer told me to add this” not being an acceptable footnote).


(Insert Research Method Here) Doesn’t Smell Like Roses

There’s an interesting debate going on over at openGlobalRights. Drawing on their recent Social Problems article, Neve Gordon and Nitza Berkovitch provocatively accuse quantitative human rights scholars of “concealing social wrongs” by using cross-national data that does not account for the disproportionately high voter disenfranchisement among African Americans. Todd Landman and Chad Clay, two scholars known for their use and production of quantitative human rights data, respond to Gordon and Berkovitch, saying that their piece ignores much quantitative human rights scholarship that is not at the cross-national level, fails to understand the coding decisions and methodology behind cross-national human rights data, and misses what we’ve learned from existing studies. It’s a great discussion and one I’m going to make sure my human rights students all read.

I’m going to take a slightly different approach here in responding to Gordon and Berkovitch, two scholars from whom, I should note, I have learned a lot. I think this particular piece, however, is completely disingenuous: there is nothing special about qualitative analysis that necessarily implies a researcher will observe, record, or code group differences in the protection of human rights within a country.



Tweets of the Week #2


Welcome to the second edition of “Tweets of the Week.” It was a busy seven days for news, and my Twitter feed provided much useful information, in micro form.

The Scottish independence referendum featured especially prominently in my feed. This was perhaps my favorite tweet about the final result:

Prior to the vote, my feed was filled with some great tweets about the #indyref. Here are a few of the shorter ones that I found especially helpful:


The Scottish referendum, of course, was not the only interesting issue in global politics this week. And, over the long haul, it almost assuredly wasn’t the most important either.

For example, the continuing spread of Ebola might be the biggest near-term threat to international security — depending upon how we define “security.”

No matter how depressed you might be about the prospect of new war in the Middle East, this tweet helps provide context:

But read this too, on ISIS/ISIL:

It also seems appropriate to be worried about Ukraine:

Finally, here’s a blast from the past that might be quite helpful in a class that is discussing renewed war in Iraq:


Data Dilemmas & Converging Logics

One of the recurring subjects among folks using data is: why does person X not share their data with me? Mostly because they are fearful and ignorant. Fearful? That their work will get scooped and/or their data might be found to be problematic. Ignorant? In that they don’t know they are obligated to share their data once they publish off of it, and that it is in their interest to share it. There is apparently a belief out there that data should be shared only after the big project is published, not after the initial work has been published. I will address this, as well as the converging logics of appropriateness and consequences, here.



New Subnational African Education and Infrastructure Dataset

Todd Smith, Anustubh Agnihotri, and I have put together a new resource of subnational education and infrastructure access indicators for Africa, released as part of the Climate Change and African Political Stability (CCAPS) program at the University of Texas. This dataset provides data on literacy rates, primary and secondary school attendance rates, access to improved water and sanitation, household access to electricity, and household ownership of radio and television. The new CCAPS dataset includes data for 38 countries, covering 471 of Africa’s 699 first-level administrative districts.


War Law, the “Public Conscience” and Autonomous Weapons

In the Guardian this morning, Christof Heyns very neatly articulates some of the legal arguments about allowing machines the ability to target human beings autonomously: whether they can distinguish between civilians and combatants, make qualitative judgments, or be held responsible for war crimes. But after going through this back and forth, Heyns then appears to reframe the debate entirely away from the law and into the realm of morality:

The overriding question of principle, however, is whether machines should be permitted to decide whether human beings live or die.

But this “question of principle” is actually a legal argument itself, as Human Rights Watch pointed out last November in its report Losing Humanity (p. 34): the entire idea of out-sourcing killing decisions to machines is morally offensive, frightening, even repulsive, to many people, regardless of utilitarian arguments to the contrary:


Learning to Fish Through Human Rights Data

I often encourage my students to distill complex analytical concepts into terse, plain English.

But some things can’t be boiled down to a tweet, as I discovered this week when attempting to explain Cingranelli-Richards data coding in response to Joshua Foust’s queries on my abusers’ peace post.

What I didn’t think to tell him in response to his original question was: here is how you can look it up for yourself.

So this post contains (I hope) a better answer to Josh’s question but also a brief primer on the CIRI dataset, what it contains and how to use it.

I should add that I’ve never used it for research myself, that I don’t work with large datasets, and that I’m not claiming the coding is perfect. But if you ever need to look up the answer to a question like “which states are similar to Singapore in both regime type and human rights record?”, CIRI provides a user-friendly resource for a little fact-checking.

Here’s what the dataset contains: CIRI consists of quantitative scores for government respect for 15 internationally recognized human rights in 195 countries, annually from 1981 to 2010. The rights coded include physical integrity rights (like freedom from torture, disappearance, or summary execution), empowerment rights (free speech, free assembly, freedom of religion, and the right to vote), and indicators for women’s and workers’ rights (here are the descriptions). Scores for each right in each country-year were derived by coders drawing on Amnesty International and State Department country reports for that year (codebook here).

CIRI is primarily designed to be used by students of human rights for large-N regression analysis, but here’s how journalists, bloggers, or anyone else can use CIRI to answer basic questions about countries’ human rights records at a glance:

1) Create a CIRI account.

2) Go to Download Data and click Create New Dataset.

3) Select just the variables and years you want.

4) Compare them in an Excel sheet.

5) Sort the Excel sheet columns according to the question you’re asking.

Here’s how I used it to answer Joshua’s question on countries like Singapore (and a better explanation of my answer): I created a personal spreadsheet for just the year 2010. As a measure of “human rights performance” I looked at just the physical integrity index already created by CIRI (which combines scores on torture, extrajudicial killing, political imprisonment and disappearances). The index ranges from 0 (no government respect for these four rights) to 8 (full government respect for these four rights).

For “freedom” I created my own index by also downloading the columns for freedom of association, freedom of speech, electoral rights, and independence of the judiciary. These are coded from 0 to 2, with 0 being the worst score.* So my CIRI dataset included the CIRI variables labeled PHYSINT, INJUD, ELECSD, ASSN, and SPEECH. I used Excel to create a column averaging these last four columns for each country, and then compared each country’s average score on “freedom” to its CIRI score on physical integrity rights.

How did I do this at a glance, without statistical analysis, fast enough to respond to a tweet? Easy. I just sorted the PHYSINT column by largest number first, so the best human rights performers are at the top and the worst are at the bottom. Among the countries that receive an “8” it’s very easy to tell who are the freest and least free: they vary on the other column between .25 and 2 (they can go as low as 0 but don’t for these high human rights performers). If you scroll down into the 7s (which is where Singapore sits) you can see the same distribution.
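For readers who prefer code to spreadsheets, the averaging and sorting steps can be sketched in Python with pandas. The rows below are made up for illustration (Ruritania is fictional); only the column labels PHYSINT, INJUD, ELECSD, ASSN, and SPEECH come from the CIRI variable names mentioned above:

```python
import pandas as pd

# Made-up rows for illustration; real data would come from a CSV
# downloaded through the CIRI site's Download Data page.
df = pd.DataFrame({
    "country": ["Singapore", "Sweden", "Ruritania"],
    "PHYSINT": [7, 8, 3],
    "INJUD":   [1, 2, 0],
    "ELECSD":  [0, 2, 1],
    "ASSN":    [0, 2, 0],
    "SPEECH":  [1, 2, 1],
})

# Average the four 0-2 indicators into a single "freedom" index
df["FREEDOM"] = df[["INJUD", "ELECSD", "ASSN", "SPEECH"]].mean(axis=1)

# Sort so the best physical integrity performers appear first
df = df.sort_values("PHYSINT", ascending=False)

# Flag countries scoring high on rights (7-8) but low on freedom (<= .75)
outliers = df[(df["PHYSINT"] >= 7) & (df["FREEDOM"] <= 0.75)]
print(outliers["country"].tolist())  # → ['Singapore']
```

The same boolean filter, applied to the full 2010 extract, reproduces the Singapore-like list discussed below.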

Now, Joshua’s question was which countries were similar to Singapore (a relatively good human rights record, but relatively unfree), and I listed the countries that score as well or better on human rights (7 or 8) but as bad or worse on freedom (.75, .5, .25, or 0). But his real question is to what extent they are outliers among these high performers. (The comparison should be to the other high-performing countries, not to all 192 countries in the dataset.) And to some extent Josh’s hunch is correct: 67 countries receive the 7 and 8 rankings for human rights, and only 6 score at .75 or worse on freedom (Singapore, Djibouti, Qatar, Bahrain, Seychelles, and Oman). However, that doesn’t mean that all the other high performers are at the top ranking for freedom; lots are in the middle ground. Only 29 out of these 67 have the highest democracy score. So if you have the highest human rights score, you have a 43% chance of being a full-fledged democracy and only an 8% chance of being an autocracy. But you also have a 52% chance of falling somewhere in the middle on the democracy scale.

Within the middle and lower grades on human rights performance, especially the middle performers receiving a grade of 4, 5 or 6, there is wide variation in the relative freedom score. So while even eyeballing this data you can see a relationship between rights and democracy, the correlation is certainly imperfect. States like Singapore and Qatar that score super high on one indicator and super low on the other are indeed outliers, as are states like South Africa with mid-high freedom scores but low human rights performance. What these cases show, though, is that we have to qualify our conflation of human rights and democracy and think more about how this relationship works.

But the key point here is: this data is at anyone’s fingertips who wants to look at it independently or play with it for their own purposes.

*This is NOT the measure of democracy used in the studies I wrote about, both of which use a different dataset, Polity IV, to measure democracy. I used the CIRI measures as a short-cut because a user can easily compare them to one another.


Data, data on the wall

OK, OK, OK, I know life is short and some of us need to get a life (I’m not in Seattle by the way), but this is a really cool app from Uppsala:

From the iTunes description: “Data on 300 armed conflicts, more than 200 summaries of peace agreements, data on casualties etc, without having access to the Internet.”

I actually do think this is really cool and I can see real benefits to having this type of data at one’s fingertips, but I do wonder how these dramatic changes in the ease of access to select types of data and data summaries (even from reputable places like Uppsala) will alter research strategies, research teaching methods, the research capabilities of future scholars, and ultimately research output.

With Kate’s excellent post directly below in mind, I wonder: is this a good thing or not so good? Thoughts?

Oh, one other thing, any interest in an app for Duck?


Crowdsourcing Data Coding

I just finished watching a video of CrowdFlower’s presentation at the TechCrunch50 conference. CrowdFlower is a platform that allows firms to crowdsource various tasks, such as populating a spreadsheet with email addresses or selecting stills with particular qualities from thousands of videos. The examples in the video involve very labor-intensive tasks, but ones that a firm is unlikely to need again or to feel are worth dedicating staff to.

As I was watching the video I thought about the potential to leverage such a platform for large-scale coding of qualitative data. In the social sciences we often need to code data on a massive scale, whether it is language from a speech, the tenor or sentiment of quotations (or of newspaper articles in media studies), the nature of cases (i.e., did country A make a threat to country B, did country B back down as a result, etc.), or the responses to an open-ended survey. Coding is an issue whether you are conducting qualitative or quantitative analysis, especially when you have captured large amounts of data. Often the data is not inherently numerical and needs to be translated so that quantitative analysis can be conducted. Likewise, with a qualitative approach one still needs to categorize various data points to allow for meaningful comparisons.

The interesting thing about a service like CrowdFlower is that it can leverage a global pool of workers who are ready and willing to conduct the coding at a reasonable price. Additionally, CrowdFlower uses various real-time methods to ensure the quality of the coding. This is achieved partly by scoring coders on their past performance, on how they fare on tasks that are “planted” by CrowdFlower (i.e., salting the work with tasks where the correct answer is known ahead of time), and on how much agreement there is between coders on various items.

The final method comes up quite a bit in social science research when you have to determine how to categorize a given piece of data. The level of agreement is crucial to confidently coding a particular case. I would imagine that a platform such as CrowdFlower could make that task easier and more robust by quickly tapping into a larger pool of coders.
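One standard way to quantify that level of agreement is Cohen’s kappa, which corrects raw agreement for the agreement two coders would reach by chance. Here is a minimal sketch; the ten “threat”/“none” labels are invented purely for illustration:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: share of items where the two coders match
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two coders labeling the same ten statements as a threat or not
a = ["threat", "threat", "none", "none", "threat",
     "none", "none", "threat", "none", "none"]
b = ["threat", "threat", "none", "threat", "threat",
     "none", "none", "none", "none", "none"]
print(round(cohens_kappa(a, b), 3))  # → 0.583
```

Here the coders agree on 8 of 10 items, but because both use “none” more often than “threat,” some of that agreement is expected by chance, so kappa lands well below the raw 80% agreement rate.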

Has anyone used a service like CrowdFlower in this way (i.e. coding data from qualitative research)? Would be interested in your perspective.

[Cross-posted at bill | petti]


Research and Data on September 11 Terrorist Attacks

It is an appropriately gloomy day here in Manhattan, as the city and the country remember the horror of September 11th, 2001 and attempt to continue to collectively heal. For me, part of that healing process has been trying to understand what happened and, more importantly, how to prevent it from ever happening again. Over the past eight years many others have been moved to investigate and analyze these events, which has led to a plethora of research on 9/11: some good, some not so good.

As someone who attempts to read everything that comes across my desk related to these attacks, I thought today an appropriate time to compile a short list of my favorite research and data on the terrorist attacks of September 11th.


  • Leaderless Jihad: Terror Networks in the Twenty-First Century, Marc Sageman – Much of the initial academic and popular research on the causes of terrorism in the aftermath of 9/11 rested on the colloquial wisdom that terrorists were poor, uneducated, and disaffected young men. Sageman was the first scholar to actually apply scientific rigor to the analysis of terrorist origins, and, using his own data on the Hamburg cell, the book continues to stand out as one of the best treatments of the formation and motivation of the 9/11 hijackers.
  • Responder Communication Networks in the World Trade Center Disaster: Implications for Modeling of Communication Within Emergency Settings, Journal of Mathematical Sociology, 31(2), 121-147, Carter T. Butts, Miruna Petrescu-Prahova, and B. Remy Cross – This is one of the most distinctive and interesting studies of the events of 9/11. Butts and his co-authors use data from emergency responder radio communication to build a dynamic collaboration network. This is a great paper for those interested in time-space relations under heavy stress and uncertainty.
  • The Internet Under Crisis Conditions: Learning from September 11, National Academies Press – I was fortunate enough to have attended the release conference for this research in Washington, DC. This remains the most comprehensive examination of global internet traffic and network response in the aftermath of the loss of a major node at the World Trade Center.
  • An economic perspective on transnational terrorism, European Journal of Political Economy, Volume 20, Issue 2, June 2004, Pages 301-316, Todd Sandler and Walter Enders – Sandler and Enders are two leading scholars on the relationship among politics, economics, and terrorism, and have written extensively on the topic. This article is one of the first to apply a game-theoretic model to the economics of terrorism in the aftermath of 9/11.


As always, I welcome any and all addenda to the list.

