The Duck of Minerva

The Duck Quacks at Twilight

Crowdsourcing Data Coding

September 16, 2009

I just finished watching a video of CrowdFlower’s presentation at the TechCrunch50 conference. CrowdFlower is a plaform that allows firms to crowdsource various tasks, such as populating a spreadsheet with email addresses or selecting stills from thousands of videos that have particular qualities. The examples in the video include very labor intensive tasks, but tasks that a firm is not likely to either need again or feels is worth dedicating staff to.

As I was watching the video I thought about the potential to leverage such a platform for large-scale coding of qualitative data. In the social sciences, often we find the need in large scale research for the massive coding of data, whether it is language from a speech, the tenor or sentiment of quotations (or newspaper articles in media studies), the nature of cases (i.e. did country A make a threat to country B, did country B back down as a result, etc.), or the responses from an open-ended survey. Coding is an issue whether you conducting qualitative or quantitative analysis–especially where you have captured large amounts of data. Often times the data is not inherently numerical and needs to be translated so that quantitative analysis can be conducted. Likewise, with a qualitative approach one still needs to categorize various data points to allow for meaningful comparisons.

The interesting thing about a service like Crowdflower is that it can leverage a ready group of workers globally who are ready and willing to conduct the coding at a reasonable price. Additionally, Crowdflower utilizes various real-time methods to ensure the quality of the coding. Partially this is achieved through the scoring of coders relative to their past performance, how they fair on tasks that are “planted” by Crowdflower (i.e. salting with tasks where the correct answer is known ahead of time), and how much agreement there is between coders on various items.

The final method comes up quite a bit in social science research when you have to determine how to categorize a given piece of data. The level of agreement is crucial to confidently coding a particular case. I would imagine that a platform such as CrowdFlower could make that task easier and more robust by quickly tapping into a larger pool of coders.

Has anyone used a service like CrowdFlower in this way (i.e. coding data from qualitative research)? Would be interested in your perspective.

[Cross-posted at bill | petti]

+ posts

Petti is Associate Director of Insights and Analytics at Alexion . Previously, he served as Lead Data Scientist in the Decision Sciences group at Maritz Motivation and a Global Data Strategist and Subject Matter Expert for Gallup.