Crowdsourcing Data Coding

I just finished watching a video of CrowdFlower’s presentation at the TechCrunch50 conference. CrowdFlower is a platform that allows firms to crowdsource various tasks, such as populating a spreadsheet with email addresses or selecting stills with particular qualities from thousands of videos. The examples in the video are very labor-intensive tasks, but ones that a firm is unlikely to need again or to feel are worth dedicating staff to.

As I was watching the video I thought about the potential to leverage such a platform for large-scale coding of qualitative data. In the social sciences we often need to code massive amounts of data in large-scale research, whether it is language from a speech, the tenor or sentiment of quotations (or of newspaper articles in media studies), the nature of cases (i.e., did country A make a threat to country B, did country B back down as a result, etc.), or the responses to an open-ended survey. Coding is an issue whether you are conducting qualitative or quantitative analysis, especially when you have captured large amounts of data. Oftentimes the data is not inherently numerical and needs to be translated so that quantitative analysis can be conducted. Likewise, with a qualitative approach one still needs to categorize various data points to allow for meaningful comparisons.

The interesting thing about a service like CrowdFlower is that it can leverage a ready pool of workers around the globe who are willing to conduct the coding at a reasonable price. Additionally, CrowdFlower uses various real-time methods to ensure the quality of the coding. Partially this is achieved by scoring coders on their past performance, on how they fare on tasks that are “planted” by CrowdFlower (i.e., salting the work with tasks where the correct answer is known ahead of time), and on how much agreement there is between coders on various items.

The final method comes up quite a bit in social science research when you have to determine how to categorize a given piece of data. The level of agreement is crucial to confidently coding a particular case. I would imagine that a platform such as CrowdFlower could make that task easier and more robust by quickly tapping into a larger pool of coders.
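As a sketch of what that final method looks like in practice, the snippet below uses hypothetical sentiment labels from three imaginary coders (the coder names, items, and labels are all invented for illustration): it computes average pairwise percent agreement and then resolves each item by majority vote, the simplest aggregation a platform or research team might apply before moving to a formal statistic such as Cohen's kappa or Krippendorff's alpha.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labels from three crowd coders for five quotations,
# coded as "positive", "negative", or "neutral" sentiment.
codings = {
    "coder_a": ["positive", "negative", "neutral", "negative", "positive"],
    "coder_b": ["positive", "negative", "negative", "negative", "positive"],
    "coder_c": ["positive", "neutral", "neutral", "negative", "positive"],
}

def percent_agreement(labels_1, labels_2):
    """Share of items on which two coders assign the same label."""
    matches = sum(a == b for a, b in zip(labels_1, labels_2))
    return matches / len(labels_1)

# Average pairwise agreement across all coder pairs.
pairs = list(combinations(codings.values(), 2))
avg_agreement = sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)

# Resolve each item by majority vote among the coders.
items = list(zip(*codings.values()))
majority = [Counter(labels).most_common(1)[0][0] for labels in items]

print(round(avg_agreement, 2))  # → 0.73
print(majority)
# → ['positive', 'negative', 'neutral', 'negative', 'positive']
```

Percent agreement is only a starting point, since it does not correct for agreement expected by chance, but it captures the intuition: items where coders converge can be coded confidently, while low-agreement items get flagged for review.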

Has anyone used a service like CrowdFlower in this way (i.e. coding data from qualitative research)? Would be interested in your perspective.

[Cross-posted at bill | petti]


Better Political Forecasts through Crowdsourcing

Dan Drezner links to a recent article by Philip Tetlock on the difficult business of political forecasting. His evaluation of this troubled pastime is accomplished through the review of three recent books that all claim to provide a better way to see the future of politics. His own research (Expert Political Judgment: How Good Is It? How Can We Know?, a fantastic book that you really should read) offers solid reasons to be skeptical of any pronouncements by ‘experts’ that they have some kind of proprietary knowledge about the future.

While I think his critique of the three books and of political forecasting in general is quite good, I find one of his suggestions for how to improve the practice lacking; namely, crowdsourcing. My issue lies not with the practice of crowdsourcing itself, but with the way Tetlock describes it.

After his review of the three books (and the requisite approaches to forecasting each represents), Tetlock provides a powerful suggestion for how to improve the prediction business–crowdsourcing political forecasts:

Aggregation helps. As financial journalist James Surowiecki stressed in his insightful book The Wisdom of Crowds, if you average the predictions of many pundits, that average will typically outperform the individual predictions of the pundits from whom the averages were derived. This might sound magical, but averaging works when two fairly easily satisfied conditions are met: (1) the experts are mostly wrong, but they are wrong in different ways that tend to cancel out when you average; (2) the experts are right about some things, but they are right in partly overlapping ways that are amplified by averaging. Averaging improves the signal-to-noise ratio in a very noisy world. If you doubt this, try this demonstration. Ask several dozen of your coworkers to estimate the value of a large jar of coins. When my classes do this exercise, the average guess is closer to the truth than 80 or 90 percent of the individual guesses. From this perspective, if you want to improve your odds, you are better-off betting not on George Friedman but rather on a basket of averaged-out predictions from a broad ideological portfolio of George Friedman–style pundits. Diversification helps.
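Tetlock's jar-of-coins demonstration is easy to reproduce in simulation. The sketch below assumes each guesser's estimate is the true value plus independent noise (the specific numbers, seed, and noise level are arbitrary choices for illustration, not drawn from Tetlock's classes); under those two conditions the errors largely cancel, and the crowd average beats most individual guesses.

```python
import random

random.seed(42)

# Hypothetical jar-of-coins demonstration: each guesser's estimate is
# the true value plus independent Gaussian noise. Averaging cancels
# much of that noise.
true_value = 500
guesses = [true_value + random.gauss(0, 150) for _ in range(50)]

crowd_average = sum(guesses) / len(guesses)
crowd_error = abs(crowd_average - true_value)

# Fraction of individual guesses the crowd average outperforms.
beaten = sum(abs(g - true_value) > crowd_error for g in guesses)
print(f"crowd average beats {beaten / len(guesses):.0%} of individuals")
```

Note what the simulation builds in: the errors are independent and centered on the truth. If every guesser shared the same bias, averaging would faithfully reproduce that bias, which is exactly the worry about pundits raised below.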

As Dan points out in his post, this suggestion potentially violates two of the necessary conditions of successful crowdsourcing: the independence of the experts and the diversity of their opinions. Dan says it best:

One of the accusations levied against the foreign policy community is that because they only talk to and read each other, they all generate the same blinkered analysis. I’m not sure that’s true, but it would be worth conducting this experiment to see whether a Village of Pundits does a better job than a single pundit.

I would actually go farther than Dan here. The problem with this approach isn’t simply that political scientists and pundits may conduct their analysis in an echo chamber (although that is definitely an issue), but that for the crowdsourcing of these issues to work properly you would want as diverse a crowd as possible; meaning, you would want to include individuals from outside of political science and the political pundit community.

Outside of an effective aggregation mechanism, James Surowiecki points to three necessary conditions for successful crowdsourcing:

  1. Diversity of opinion
  2. Independence of those opinions
  3. Decentralization (i.e. ability to lean on local knowledge)

Political scientists and pundits do not hold a monopoly on useful insights into the world of politics. Other actors have an interest in understanding and predicting what will happen politically, including financial analysts, corporations, journalists, and politicians and citizens around the globe. Each of these groups likely brings its own perspective and lens for analyzing political outcomes to the table, and from a crowdsourcing perspective that is precisely what one would want (diversity, independence, and decentralization). The answer isn’t simply to gather more opinions from political pundits, but to gather opinions from additional actors who represent an even greater diversity of views.

I agree with Dan that it would be worthwhile to set up some kind of experiment to determine the optimal composition of a political forecasting crowd. I smell a side project a brewin’….

[Cross-posted at bill | petti]


© 2021 Duck of Minerva
