The collateral damage of performance metrics

12 August 2015, 1431 EDT

This is a guest post from Daniel Mügge, who is an associate professor of political science at the University of Amsterdam and the lead editor of the Review of International Political Economy.

In two recent posts, Cullen Hendrix, and Daniel Nexon and Patrick Thaddeus Jackson, have tabled important pros and cons of Google Scholar (GS) as a basis for measuring academic performance. And the flurry of reactions to their posts reveals just how central, touchy, and important an issue this is.

The debate so far concentrates on gauging the “quality” of an individual scholar, and on how different approaches are fraught with biases. But when we weigh the merits and demerits of something like GS, there is another level that has so far been ignored: the collateral damage that managerial tools of quality control do to the academic enterprise as a whole.

The problem is simple: to boost the productivity of planet academia, all its inhabitants are surrounded by an intricate net of incentives. The reactions are predictable (as economics has taught us): academics respond to these incentives. As intended, an expected pay-off supplants intrinsic motivation as the fuel for our work. Before you know it, every tweet we send, every op-ed we pen, every seminar we organize, is just a means to some higher end – normally advancing our “careers”, meaning higher-ranking posts at higher-ranking universities. The academic ideals with which we entered the ivory tower – science for the sake of science, or for the sake of society – get crushed on the way.

The point is not that we should not care about the quality of each other’s work, or of those whom we hire. And I too remain suspicious of biases in personal, subjective judgments, whether they concern tenure committees or article reviews. But algorithms as an alternative invite opportunistic behavior because they don’t just measure behavior but also steer it – the performativity Nexon and Jackson discuss. They not only move the goalposts; they alter them fundamentally. Why toil, research, write and publish? Simple. To boost your h-index.

Management of academics through targets and standardized metrics to boost competition goes back at least two decades. Stage one, the publication craze, forced young academics to inflate their publication lists. The result? Overpriced edited volumes proliferated, never mind that they would languish unread in library stacks. Scholars recycled data and ideas endlessly, chopping them into the smallest publishable units. They created new journals not because of higher demand for articles, but because of a skyrocketing supply of manuscripts. At the margin, the obsession with quantity also encouraged opportunistic co-authorship, with members of publication teams taking turns to “lead-author” the next article.

Academics’ responses to those incentive structures were as individually rational as they were collectively insane. The backlash against publication inflation ushered in stage two. Enter the impact factor (IF). Henceforth, the quality of the publication outlet would be decisive when gauging a paper’s worth. The pitfalls of IFs have been debated endlessly. Suffice it to reiterate those that relate directly to perverse incentives, mainly for journal editors. Once its IF determines the volume and quality of submissions a journal receives, both success and failure are self-reinforcing. The journal is either on the way up or on the way down.

Journal editors thus have incentives to rig the game: prioritizing mediocre scholarship by big names, stuffing new articles with cites to the journal, adapting publication schedules to boost the IF, etc. It is unclear how often editors succumb to these temptations, but it is undeniable that they are there. (Disclaimer: at the Review of International Political Economy, which I co-edit, we have consistently shunned such tricks.) Before you know it, academia perceives journals through the prism of a single figure: the IF – never mind that it is often driven by a handful of articles. These trends threaten to sap vital energy from journals. Want to climb the rankings? Stop publishing what you think others should read; instead, publish what you know they will cite.

In our present-day stage three, the focus shifts back to individual scholars. Articles have Altmetric scores slapped on them, measuring, for example, how much Twitter buzz your latest piece has triggered. Once again, the incentive is clear: to score highly, you have to build a faithful social media following. To please the metric, you start tweeting, build a profile on ResearchGate, and promote yourself on sundry other websites. Let’s be clear: there is nothing wrong with tweeting. But it is regrettable if scholars do it because they feel they have to, not because they find it useful for their work.

Academic management through incentives turns us into opportunistic cost-benefit analysts. People in the ivory tower tend to be smart: they are acutely aware of which kinds of activity generate career pay-offs. Soon, everything has a merit-metric attached to it. Organize a faculty seminar? Three impact points. (Five if an Ivy League professor attends.) Op-ed? Four. More than 1k Twitter followers? Two.

We are caught in a cycle with no end in sight: new carrots and sticks are dangled in front of us, new forms of opportunism appear, criticism ensues, and the incentives are refined again, ushering in a new round. Rules for tenure and grant applications change from one year to the next as quality standards are rejigged continuously. No wonder burnout among young academics is on the rise.

It is easy to see where this will end. When you hire a new colleague, simply feed standardized CVs into assessment software, and your laptop will spit out your best hire. No need to worry that substance didn’t matter, because your own department will be assessed through a similar algorithm that – claiming objectivity – ignores substance as well. I appreciate that modern metrics may be less biased with respect to individuals than old routines, which relied on old boys’ networks, prestige, etc. But I still have a nagging feeling that, as we put academia under the yoke of scientific quality measurement, something crucial gets lost.

So what is to be done? As with any indicator, it’s important to grasp the limitations of what scientific metrics measure, and to take them with a grain of salt. But that in itself won’t stop their corrosive effects on academia as a whole because we face a collective action problem. When everyone else bases their judgments on these metrics, individual dissent quickly looks foolish. It is therefore up to those in the higher echelons of the academic hierarchy to turn the tide, because they can afford to ignore conventions and lead by example.

Journal editors have a responsibility to protect their professional verdicts and integrity against the lure of IF-boosting tricks. More generally, academics have to hone their personal, substantive judgment of their colleagues and their work. That takes time, and it takes active resistance against ubiquitous biases, for example those based on gender or PhD-granting institution. Scholars – young ones in particular – will only put their heart into their work if they know that that is what the rest of academia rewards.