Methodology Lessons: DOE’s Natural-gas Overstatement

7 April 2010, 1150 EDT

[Cross-posted at Signal/Noise]

The Wall Street Journal reported yesterday that the U.S. Department of Energy is set to restate the data it collects on U.S. natural-gas production. The reason? The Department has learned that its methodology is seriously flawed:

The monthly gas-production data, known as the 914 report, is used by the industry and analysts as a guide for everything from making capital investments to predicting future natural-gas prices and stock recommendations. But the Energy Information Administration (EIA), the statistical unit of the Energy Department, has uncovered a fundamental problem in the way it collects the data from producers across the country—it surveys only large producers and extrapolates its findings across the industry. That means it doesn’t reflect swings in production from hundreds of smaller producers. The EIA plans to change its methodology this month, resulting in a “significant” downward revision.

The gap between the output the 914 report has been reporting and what is actually being produced has grown steadily larger. Many analysts have long suspected that the methodology underlying the reports was faulty, but the EIA has been slow to revise it. The overestimation of output has helped depress prices, which are now at their lowest in seven years. Any revision to the methodology will bring about a “correction” in energy markets, and particular states will surely see their reported output dip significantly.

So what can we learn from this, methodologically speaking? A few things:

  1. How you cast the die matters: The research methodology we employ for a given problem significantly shapes the results we see and, therefore, the conclusions we draw about the world. The problem with the DOE’s 914 report wasn’t simply a bad statistical model; it was unrepresentative data (i.e., relying only on the large producers). This isn’t an issue of noisy or bad data, but of systematic bias built into the methodology the EIA employed. The data itself is seemingly reliable; the problem lies with the validity of the results, caused by the decision to systematically exclude small producers, and potentially influential observations, from the model. (A toy simulation after this list works through exactly this failure mode.)
  2. Representativeness of data doesn’t necessarily increase with the volume of data: More than likely the thinking went that if the EIA collected data on the largest producers, its extrapolations about the wider market would be sound, or close enough, since the largest players tend to account for the bulk of production. As this case shows, however, that isn’t necessarily true. At some point the methodology may have been sound, but changes in the industry (technology, etc.) and the growing importance of smaller companies appear to have rendered it obsolete. Notice that the EIA’s results are probably statistically significant, but achieving significance really isn’t that difficult once your sample size gets large enough. What matters more is representativeness: is the sample you’ve captured representative of the larger population? Many assume that size and representativeness are tightly correlated; that is an assumption that should always be questioned and, more importantly, verified before relying on the conclusions of the research.
  3. Hypothesis-check your model’s output: The WSJ article notes that a number of independent analysts had long suspected a problem with the 914 reports after noticing discrepancies in related data. For example, the 914 report claimed that production increased 4% in 2009, despite a 60% decline in onshore gas rigs. If the 914 report is correct, would we expect to see such a sharp decline in rigs? Is this logically consistent? What else could have caused the 4% increase? The idea is to derive various hypotheses about the world assuming your conclusions are accurate and test them; try to determine, beyond your own data and model, whether your conclusions are plausible. Too often I’ve found that businesses fail to do this (possibly because of time constraints and less of a focus on rigor), but academics often fall into the same trap. (The second sketch below turns this kind of gut check into an explicit calculation.)
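
To make points 1 and 2 concrete, here is a minimal simulation sketch in Python with entirely made-up numbers. It is not the EIA’s actual estimator; it just shows the shape of the failure: a survey that measures only large producers and scales up by a factor calibrated in a baseline period will overstate total output once the ignored small producers swing downward, no matter how precisely the large producers are measured.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up industry: 50 large producers (surveyed) and 500 small ones (not surveyed).
large = rng.lognormal(mean=4.0, sigma=0.5, size=50)
small = rng.lognormal(mean=0.5, sigma=0.8, size=500)

# The extrapolation factor is calibrated in a baseline period, when the surveyed
# large producers account for most, but not all, of total output.
baseline_large_share = large.sum() / (large.sum() + small.sum())

# Small-producer output then falls 30%: a swing the large-producer-only
# survey never observes.
small_now = small * 0.7

true_total = large.sum() + small_now.sum()
estimated_total = large.sum() / baseline_large_share  # the survey's extrapolation

print(f"baseline large-producer share: {baseline_large_share:.2f}")
print(f"true total:      {true_total:,.0f}")
print(f"estimated total: {estimated_total:,.0f}")
print(f"overstatement:   {100 * (estimated_total / true_total - 1):+.1f}%")
```

However you tune the made-up numbers, the estimate tracks only the large producers; the bias comes from the sample design, not from noise, and collecting the large producers’ data more precisely wouldn’t fix it.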
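
And for point 3, the kind of gut check the analysts ran against the 914 report can be written down explicitly. The two figures below are the ones from the article (a 4% production increase against a 60% drop in onshore rigs); the 20% per-rig productivity gain is an invented placeholder, and this is deliberately a crude rig-driven view rather than a real production model (existing wells keep producing even as rigs drop). The point is only that making the check explicit turns a vague suspicion into a number you can argue about.

```python
# Back-of-the-envelope consistency check: is the reported output change
# plausible given what the rig count did?
reported_production_change = 0.04   # 914 report: production up 4% in 2009
onshore_rig_change = -0.60          # onshore gas rigs down 60% (from the article)

# Invented placeholder: assume per-rig productivity improved 20%. Under a crude
# "output scales with rigs" view, that still implies a steep decline.
assumed_productivity_gain = 0.20
implied_change = (1 + onshore_rig_change) * (1 + assumed_productivity_gain) - 1

print(f"reported change: {reported_production_change:+.0%}")
print(f"implied change under the crude rig-driven view: {implied_change:+.0%}")

# A gap this wide doesn't prove the report wrong, but it is exactly the kind of
# red flag that should send you back to the methodology.
```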