This is a guest post co-written by Dan Nexon and Patrick Thaddeus Jackson. Standard disclaimers apply.
Cullen Hendrix’s guest post is a must read for anyone interested in citation metrics and international-relations scholarship. Among other things, Hendrix provides critical benchmark data for those interested in assessing performance using Google Scholar.
We love the post, but we worry about an issue that it raises. Hendrix opens with a powerful analogy: sabermetrics is to baseball as the h-index is to academia. We can build better international-relations departments. With science!
The main argument in favor of the h-index is that it predicts academic productivity. What’s the h-index?
A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np − h) papers have no more than h citations each.
Google Scholar includes books and anything else it crawls. So if someone has an h-index of 18, that means that they’ve ‘published’ 18 pieces with at least 18 citations each.
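For the concretely minded, the definition reduces to a few lines of code. Here is a minimal Python sketch of our own (an illustration, not part of any official metric toolkit):

```python
def h_index(citations):
    """Compute the h-index from a list of per-paper citation counts.

    The h-index is the largest h such that the scholar has at least
    h papers with at least h citations each.
    """
    counts = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:  # this paper still "supports" an h of this rank
            h = rank
        else:
            break
    return h

# e.g. h_index([25, 19, 18, 18, 3]) -> 4
```

Note that the measure is insensitive to outliers: a single paper with 10,000 citations moves the h-index no more than a paper with 19.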
What does the h-index do, exactly? It supposedly predicts both Np (the number of papers that a scholar is likely to publish over a career) and Nc (the number of citations a scholar will likely accrue). Or, at least, it does so better than alternative measures, such as a total count of publications, the mean number of citations, or various other ways of manipulating publication and citation data.
Thus, armed with h-indexes and other predictive measures, the field can objectively rank individuals and departments. Meanwhile, departments themselves can “field” better international-relations and political-science “teams.”
But how, exactly? Despite Dan’s best efforts, we have yet to convince either the American Political Science Association (APSA) or the International Studies Association (ISA) to transform themselves into Pokémon-style tournaments in which departments battle it out for ultimate dominance based on Google Scholar statistics, type, and abilities of their members.
And herein lies the basic difference between, on the one hand, academia, and, on the other hand, competitive sports and other games. The constitutive rules of the latter serve to sort winners from losers (note exception that proves the rule). In academia, the terminology of sports and games operates, at best, as metaphor.
Let’s return to the analogy with sabermetrics. Baseball, after all, is actually a game. It has clearly defined rules and a formal organization tasked with upholding those rules. The net result: when one steps into a ballpark, or puts on a pair of cleats and picks up a bat and ball, there exists little ambiguity about the point of the exercise. Playing baseball means trying to score more runs than one’s opponent, which leads to winning the game. Such certainty about constitutive rules makes it possible to ask which activities contribute most to that end. The rules define acceptable contributions, such as hitting the ball in such a way that fielders cannot catch it. We can both quantify those contributions and relate them to the ultimate outcome: scoring enough runs to win the game.
Sabermetrics, after all, amounted to an intervention into a game that already enjoyed existing measures of player performance and productivity. It sought to replace statistics like “runs batted in” and “hits with runners in scoring position” with measures like “on-base percentage plus slugging percentage.” Sabermetricians argued that their new measures assessed a player’s contribution to the goal of the game more accurately.
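For readers unfamiliar with the statistic, “on-base percentage plus slugging percentage” is usually abbreviated OPS. A quick Python sketch using the textbook formulas (our illustration; parameter names are ours):

```python
def ops(singles, doubles, triples, home_runs,
        walks, hit_by_pitch, at_bats, sac_flies):
    """On-base plus slugging (OPS): OBP + SLG, per the standard definitions."""
    hits = singles + doubles + triples + home_runs
    # On-base percentage: times reaching base / (AB + BB + HBP + SF)
    obp = (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)
    # Slugging percentage: total bases per at-bat
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    slg = total_bases / at_bats
    return obp + slg
```

The design choice is the point of the sabermetric argument: unlike RBI, every input here is under the batter’s own control, so the composite tracks individual contribution rather than lineup context.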
Eventually this leads us to notions like “win share” as a comprehensive alternative to traditional measures of player performance. But the critical point here is that while the measurements changed, the overall goal did not: the rules of the game remained the same, and the point of playing it, to score more runs than the other team and thus win the game, continued unchanged.
Back to the h-index. Even if a scholar’s h-index reliably predicts publications and citations over a career, it remains unclear that this provides a reliable indicator of that scholar’s contribution to the overall point of the academic exercise. Does the scholar who amasses the most citations over a career “win” academia (“she who dies with the most citations wins”)? Does a department composed of scholars with high h-index numbers “win” higher education? What does “winning” even mean?
This creates two problems: one conceptual and one practical.
The conceptual problem, as the discussion Hendrix kicks off already demonstrates, is that citations remain, at best, a proxy for impact. So even if one argued that a major part of the academic vocation involved generating knowledge, it is unclear that citations provide a good way to measure that. Samuel Huntington’s The Clash of Civilizations gets cited a lot. But how many of us would regard that as an enduring contribution to knowledge? Ken Waltz’s Theory of International Politics gets cited a lot, but how many of those citations are by people highly critical of Waltz’s whole approach?
One might argue that this renders both works “impactful.” After all, if everyone feels it necessary to engage with them, then they matter, right? Or one might argue that this makes both books major contributors to ‘knowledge production’ in their role as foils. But both arguments imply a procedural view of ‘knowledge production’: knowledge production resides in the process of being cited, not in whether the work itself makes an independent contribution to knowledge. And they raise difficult counterfactual questions: would the effort spent on debating The Clash of Civilizations have been better spent arguing about other things? Would scholarship have looked better, or worse, if we hadn’t been debating Theory of International Politics?
We might disagree on these questions, but such debate pushes us back into the realm of quote-unquote subjective estimations. It highlights that whatever citation counts measure, it is perhaps only tangentially related to knowledge production. And being cited constitutes only one part of the academic vocation in any case; what about the “impact” a teacher exercises by helping her students learn how to think critically about international affairs, even if those students never cite her published work because they don’t go into academia? It is not obvious who ‘wins’ in that case: the highly cited scholar or the inspiring teacher.
This leads to the practical issue, which we might call the performativity problem inherent to these kinds of efforts to systematically assess impact and productivity. Once we establish a certain statistic as a metric for productivity, we change the game and give the players—which means: us and our colleagues—a new set of goals to aim for and rules to adhere to. The effort to measure scholarly productivity thus becomes a self-fulfilling prophecy, as we literally write and talk a revised set of scholarly norms into existence.
Unlike in baseball, where a new statistic once proposed has to be assessed for its actual contribution to the commonly-understood point of the game, productivity metrics in academia constitute novel goals and shape subsequent scholarly efforts in accord with them. Witness academic journals striving to increase their impact factors, or UK departments striving to do better in the REF/RAE process, or departments engaging in efforts to increase their ranking in the TRIP survey. This is nothing new, of course. An entire industry dedicates itself to helping students improve their scores on various standardized tests. Consultants and evaluators take up the same role with respect to academic institutions and departments. The means becomes an end in itself, and the measurement becomes a goal.
Now, this problem is less pronounced in professional baseball, precisely because players’ efforts to increase their OPS actually do contribute to their team’s proclivity to score runs. Do our efforts to publish pieces more likely to be cited actually contribute to knowledge, or to the education of our students and the broader public? Unless we are absolutely sure that they do, we should be cautious in our claims about any metric for scholarly productivity.
Indeed, consider a well-rehearsed process that obtains when creating metrics for activities with difficult-to-measure and difficult-to-define outputs: given a choice between “intangibles” and “numbers,” people almost always choose numbers, because numbers create a veneer of objective, unbiased measurement. Of course, what undergirds those numbers often entails a host of less-than-“unbiased” forces. When it comes to citations and publications, such forces include so-called citation cartels and other aspects of disciplinary politics. Such factors are inevitable, of course, but metrics such as the h-index don’t so much substitute for them as risk hiding their operation.
Again, this would prove less of a concern in the absence of performativity dynamics. But those dynamics mean that the more we rely on such measures, the more they will function to allocate academic status and prestige in ways that render them real. If we use productivity measures to allocate funding, for example, then more ‘productive’ scholars and departments will enjoy more resources, and thus prove better able, all things being equal, to succeed on those measures.
Before we know it, we’ve transformed academia into something that looks a lot like baseball.
But academia isn’t baseball. And it shouldn’t be.