
The External Validity of Terrorism Studies on Israel/Palestine

The growing desire both to understand the rationality of suicide terrorism and to test theoretical concepts empirically has generated several interesting political economy studies of terrorism. A recent NBER working paper therefore caught my eye for several reasons. The article, entitled “The Economic Cost of Harboring Terrorism,” adds to this body of work by focusing on an area that has yet to be explored. Very often the question of interest in these studies is, “how do terrorist attacks affect the target economy?” In this paper the authors reverse the question and ponder, “how do terrorist attacks affect the economic conditions of the area whence the attack came?”

The question is a very good one, and the authors investigate it with a unique data set:

Our analysis overcomes these difficulties by relying on a detailed data set of suicide terror attacks and local economic conditions together with a unique empirical strategy. The available data set covers the universe of suicide Palestinian terrorists during the second Palestinian uprising, combined with quarterly data from the Palestinian Labor Force Survey on districts’ economic and demographic characteristics, and Israeli security measures (curfews and Israeli induced Palestinian fatalities).

The punchline…

…a successful attack causes an immediate increase of 5.3 percent in the unemployment rate of an average Palestinian district (relative to the average unemployment rate), and causes an increase of more than 20 percent in the likelihood that the district’s average wage falls in the quarter following an attack. Finally, a successful attack reduces the number of Palestinians working in Israel by 6.7 percent relative to its mean. Importantly, these economic effects persist for at least two quarters after the attack.

While I think this paper introduces a very important research paradigm, I have concerns about some of the technical assumptions built into the analysis, and about the overarching reliability of research focusing exclusively on terrorism in the Israel/Palestine conflict. With respect to the technical assumptions, one line in the paper struck me as very problematic: “Our empirical strategy exploits the inherent randomness in the success or failure of suicide terror attacks as a source of exogenous variation to investigate the effects of terrorism on the perpetrators’ economic conditions.”
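To make the identification logic concrete: if success really were as-good-as-random, the effect of a successful attack could be estimated by a simple comparison of district outcomes following successful versus failed attacks. Here is a minimal synthetic sketch of that logic — all numbers are invented for illustration, and this is not the paper's actual data or estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration only. If success is a "coin flip," the difference
# in mean outcomes between successful and failed attacks estimates the
# causal effect of a successful attack.
n = 200                                  # hypothetical attack events
success = rng.integers(0, 2, n) == 1     # as-good-as-random success indicator
# synthetic change in district unemployment rate (percentage points),
# with a built-in true effect of +0.5 for successful attacks
d_unemp = 0.5 * success + rng.normal(0.0, 1.0, n)

effect = d_unemp[success].mean() - d_unemp[~success].mean()
print(f"estimated effect of a successful attack: {effect:+.2f} pp")
```

The whole design stands or falls on that randomness assumption, which is exactly where my skepticism begins.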

I find it very difficult to accept the notion that success and failure are random across suicide attacks—especially within this particular conflict. There is clearly no support for a theory that the selection of suicide attack sites is random; it follows that the success of an attack is also a function of both the selected target and the learning processes of both attackers and defenders. We should therefore expect high autocorrelation in the success of attacks occurring within a relatively small geographic area. Such difficulties highlight the general problem of external validity for terrorism studies that focus solely on the Israel/Palestine conflict.
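This concern is checkable in principle: if successes cluster in time, the 0/1 success series will show positive serial correlation rather than looking like coin flips. A toy sketch with invented sequences (not real attack data) shows the kind of diagnostic I have in mind:

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 0/1 success sequence."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# Hypothetical success/failure sequences (1 = successful attack)
alternating = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # no clustering
clustered   = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0]  # runs of success/failure

print(lag1_autocorr(alternating))  # negative: successes don't cluster
print(lag1_autocorr(clustered))    # positive: learning-style clustering
```

A strongly positive lag-1 autocorrelation in the real series would undercut the paper's exogeneity claim.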

It is not surprising that researchers often default to data on terrorist attacks from this conflict. Given the relative openness of Israel’s democratic government, the media attention on Palestine, and the—unfortunate—frequency of attacks, there exists a large amount of data from this conflict. As I have mentioned before, however, it is very difficult to infer causality from these data given the natural interconnectedness of the conflict dynamics. Any large-N study of terrorism in this context has enormous selection problems, as terrorists learn and innovate to evade the defensive tactics of the ISF, and the Israelis create new policies that may provoke or dissuade terrorist activities. No other ongoing low-intensity conflict has issues at this level, making it difficult to draw parallels between findings from research focusing on Israel and Palestine and any other conflict.

I am curious as to others’ thoughts on this issue of external validity, and welcome your comments.

Photo: Norman G. Finkelstein


Post hoc ergo propter hoc

I am usually a fan of Charles Blow’s work, but his latest op-ed seems to me a bit sloppy.

Blow claims that one reason Democrats, and President Obama in particular, may be having trouble convincing the country to sign on to large-scale health care reform is the public’s overall lack of trust in the government. This is a completely plausible hypothesis and one that I agree with, as the numbers regarding trust are incredibly low right now (~20%). What I take issue with is the way Blow points out a “peculiar quirk of recent American politics”; namely, that Americans’ trust in government has generally been lower following the election of a Democrat to the White House and higher after electing a Republican. Blow does not say that the Democratic administrations caused the decline in public trust numbers, but he might as well have, given how the short piece is written.

Is it possible? Sure. But given the data and graphic he provides there are all sorts of reasons to doubt it is the case. At the very least, if he is going to imply such a causal relationship he should have provided a bit more discussion. Simply because low trust numbers followed the election of Democratic Presidents doesn’t imply causation.

The first problem is one of time: the data he bases his discussion on only goes back to 1976. Truncating the sample in this way gives us no perspective on whether this is an artifact of the data or an actual pattern. To be fair, Blow had no choice–the data is what it is. But the short time frame obscures the possibility that the party affiliation of the President simply doesn’t matter.

Second, Blow gives us nothing to compare the data against in terms of control or alternative variables. Level of trust in government can be caused by numerous factors, including perceptions of Congress, bureaucracy, economic environment and trends, wars and foreign conflict, whether the country is moving in the right direction, etc.

Third, trust is built on repeated observation–people build up an image of whether someone or something is trustworthy based on past performance. That means feelings of trust take time to form and time to change. Additionally, the question asks about the government, not the President. In the United States, the term government has a broad meaning, unlike in parliamentary systems where it refers to the ruling party. Given that, it is possible that any feeling of trust/distrust is dependent on both previous periods and the wider apparatus of government. We should be paying more attention to the general mood of the country prior to elections than to a single data point after a new President takes office.

Just to play around, I collected data on the direction-of-the-country question from the same poll that Blow pulls the trust data from and graphed the two series side by side. The idea is that trust and feelings about the country’s direction are likely related and do not move in lock step with single elections. Not surprisingly, there is a good fit between whether respondents see the government as trustworthy and whether they think the country is headed in the right direction (the correlation of Right direction with Always/Mostly trust is .80, and of Wrong direction with Some/Never trust is .83).
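For anyone who wants to replicate this, the correlation is a one-liner once the two series sit side by side. The numbers below are invented stand-ins, not the actual poll values:

```python
import numpy as np

# Hypothetical quarterly poll series (percent), standing in for the
# trust and direction numbers; the actual reported correlations in the
# post are .80 and .83.
trust_always_mostly = np.array([40, 38, 35, 30, 26, 22, 20, 17, 20])
right_direction     = np.array([44, 40, 36, 28, 25, 20, 15, 11, 24])

r = np.corrcoef(trust_always_mostly, right_direction)[0, 1]
print(f"Pearson r = {r:.2f}")
```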

Moreover, if we map the elections of the last three Presidents on to the graph we notice something interesting.

Each President came to office after a long trend of either increasing or decreasing trust. For Clinton and Bush, this trend continued well into their first year in office. For Clinton, the trust and direction numbers began to turn upwards midway into his second year in office. For Bush, both sets of measures decreased after March of 2002. Obama took office after having watched the trust measure decrease from 55% to 17%. It took over 1 year to see the trust/direction numbers reverse during both the Clinton and Bush presidencies, so it is not surprising that we’ve only seen a slight up-tick in trust (+3%) during Obama’s first year in office. (Although it is interesting that the right direction measure has jumped since the recent election from 11% to 44% in only the first 8 months.)

Bottom line, Blow is right to point out that a massive change in a critical social good like health care is going to require trust on the part of the public. However, the peculiar quirk seems more a function of the timing of elections and less about the causal impact of a newly elected President.

[Cross-posted at bill | petti]


The Mathematics of War, Revisited

A few months back I wrote a post discussing Sean Gourley’s TED talk on the Mathematics of War; specifically, noting that his finding (a power-law distribution of attack frequency and severity in Iraq) was—well—old news. This set off an excellent discussion on Sean’s work, my comments, and more generally how the social and hard sciences can clash. More recently, Tom Ricks of The Best Defense blog revisited Sean’s talk with his own skepticism, which induced a response from Sean, and further skepticism by Ricks. In defense of his work, Sean responded to Tom’s post with the following:

With this new approach we can do several important things that were not possible before. We can understand the underlying structure of an insurgency i.e. how an insurgency ‘decides’ to distribute its forces (weapons, people, money etc). Further, we can explain why this kind of insurgent structure emerges in multiple different conflict zones around the world. We can estimate the number of autonomous insurgent groups operating within a theatre of war. We can monitor and track a conflict through time to see how either sides strategies are affecting the state of the war. Finally we can compare the mathematical patterns of current ongoing wars with past wars to estimate how close they are to ending.

I think Sean’s work is extremely important, as in many ways our research interests run parallel and this project has great potential. That said, his response leaves me with more questions than answers; therefore, with his response in hand, I would like to revisit the mathematics of war.

First, I have serious doubts as to the connection between the distribution of attack frequency and severity and the underlying structure of an insurgency. Power-law distributions can provide a categorical approximation of a network’s underlying structure because in these cases the distribution in question refers to the frequency of edge counts among nodes, a structural measurement. Even for networks, however, the actual underlying structures of networks following a power-law can vary wildly. Attack frequencies, on the other hand, have nothing to do with structure. In what way, then, is this metric valid for measuring the structure or distribution of insurgent forces?
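For context, the standard way such exponents are fit in this literature is the maximum-likelihood estimator popularized by Clauset, Shalizi, and Newman — and note that it summarizes only the size distribution of events, saying nothing about structure. A sketch on synthetic severities (I am assuming, not asserting, that Sean's pipeline uses something like this):

```python
import numpy as np

def powerlaw_alpha_mle(sizes, xmin):
    """Continuous power-law exponent MLE (Clauset, Shalizi & Newman 2009):
    alpha_hat = 1 + n / sum(ln(x / xmin)), over events with size >= xmin."""
    x = np.asarray(sizes, dtype=float)
    x = x[x >= xmin]
    return 1.0 + len(x) / float(np.log(x / xmin).sum())

# Synthetic attack severities drawn from a power law with alpha = 2.5
rng = np.random.default_rng(1)
u = rng.random(5000)
sizes = (1.0 - u) ** (-1.0 / (2.5 - 1.0))  # inverse-CDF sampling, xmin = 1

print(round(powerlaw_alpha_mle(sizes, xmin=1.0), 2))  # should be close to 2.5
```

The estimator recovers the exponent of the severity distribution, but two organizationally very different insurgencies could produce the same exponent.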

There is also a large element of context that is not captured in this analysis. To get to Sean’s question of why different types of insurgencies occur in different parts of the world, with varying lethality and effectiveness, one must account for the inherent variance in ability among insurgents and insurgent organizations. We know that people vary in their abilities to perform any task, which of course includes insurgency; therefore, we must control for any exogenous or endogenous factors that could contribute to this variance, so as to avoid building into our analysis the assumption that all insurgents are created equal. Once a reasonable number of theoretically justifiable control variables are identified, we may be able to get at this question at both a micro (insurgent) and macro (insurgency) level. At present, the data used in Sean’s analysis does not account for this variation.

Next, there has been quite a bit of research on the duration of wars, including state-on-state wars, civil wars, and insurgencies. For this research, a critical hurdle has always been how to overcome bias in data collection and reporting when attempting to approximate how various factors contribute to the duration of a conflict. Sean uses open-source media accounts of attacks to develop his data, and because most of these media outlets are primarily motivated by profit it is difficult to view this data as unbiased. This problem, however, can be dealt with by various sampling techniques and control variables. Of greater concern are the eventual conclusions drawn by attempting to match conflict patterns in this manner. With Sean’s data, we might ask what factors contribute to ending conflicts that follow a power-law. Unfortunately, as previously discussed, all manner of conflicts follow this pattern. If two conflicts have a near-identical power-law distribution when observed in the long term, but upon examination we find that one is an insurgency and the other a state-on-state conflict, what insight have we gained? This categorical approach, therefore, may be significantly limited in its explanatory value.
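That categorical worry can be illustrated directly: severity samples from two entirely different conflicts (synthetic ones here) that happen to share a power law are statistically indistinguishable by a two-sample Kolmogorov–Smirnov comparison. This is a toy demonstration, not an analysis of any real conflict data:

```python
import numpy as np

rng = np.random.default_rng(2)

def powerlaw_sample(alpha, n, xmin=1.0):
    """Inverse-CDF sampling from a continuous power law."""
    u = rng.random(n)
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum CDF gap)."""
    a, b = np.sort(a), np.sort(b)
    data = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, data, side="right") / len(a)
    cdf_b = np.searchsorted(b, data, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

# Two synthetic "conflicts" -- imagine one an insurgency, the other
# state-on-state -- whose severities follow the same power law (alpha = 2.5)
conflict_a = powerlaw_sample(2.5, 2000)
conflict_b = powerlaw_sample(2.5, 2000)

print(f"KS statistic = {ks_stat(conflict_a, conflict_b):.3f}")
```

The small KS statistic says the two severity distributions look the same, even though by construction they could have come from categorically different kinds of war.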

Finally, I must point out that I have a very superficial perspective on Sean’s work, as I have only been exposed to the TED talk and the discussions that have followed from it. There are likely many elements of this research that I am missing, and as such all of the above concerns may have already been addressed. I am interested in your take on Sean’s response and my position: where do you see the value in this research? To quote Tom, “Smart, statistically-comfortable readers: Do you see support for these claims?”

Photo: Chart of distribution of attacks with magnitude from “Variation of the Frequency of Fatal Quarrels with Magnitude,” by Lewis F. Richardson.


Nate Silver on Iranian Elections

Nate analyzes some statistical analyses claiming to show fraud in Friday’s elections. Short story: he is not buying the analyses (as the results are mostly an artifact of how the numbers were released), nor is he discounting the possibility that fraud occurred (which, I would say, is more than reasonable).

Rob Farley links to a nice summary regarding what’s in the air regarding the dimensions of the ‘political coup’. Part of this, if true, shows an attempt to cover up data that would more than suggest outright fraud.

Still waiting for a reliable statistical model of estimated electoral fraud. Even if we had one, it appears the authorities in Iran are intent on withholding the inputs necessary for such a model.


(Baseball) Statistics Never Lie

[cross-posted at ProfPTJ’s Course Diaries]

One of the things that I find the most fascinating about the game of baseball is the fact that statistical data about the performance of players and teams is actually meaningful. This is so largely because the kinds of things being measured — whether a team wins or loses, how many balls and strikes a pitcher throws, what percentage of the time a player gets on base as opposed to making an out — involve a repetition of the same basic actions a sufficient number of times that random fluctuations cancel out. Players do the same basic things enough times that over the course of a season their ability to do those things (like get the bat on the ball, throw a strike, and so forth) will be reflected in their numbers.

For example, every “plate appearance” that a batter has over the course of a season is roughly similar to every other plate appearance in its basic contours, and over the course of a 162-game season the average player can expect to come to the plate about 500 times — a sufficiently “large n” that saying that a batter has an on-base percentage of .446 and a slugging percentage of .536 is a meaningful statement.1 Contrast this to “football statistics,” which are based on a regular season of only 16 games; players rarely get sufficient chances to do things like catch passes and rush for yardage to make meaningful quantitative comparisons possible. This doesn’t stop people from making those comparisons, and playing “fantasy football” based on them, but I’ll keep my statistics operating in realms where they make some sense, thank you very much.
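The "large n" intuition here is just the binomial standard error shrinking with sample size: treating each plate appearance as an independent trial (a simplifying assumption, since real plate appearances vary by pitcher, park, and so on), a season's worth of trials pins an OBP estimate down far more tightly than a few weeks' worth:

```python
import math

def obp_standard_error(obp, plate_appearances):
    """Binomial standard error of an observed on-base percentage,
    treating each plate appearance as an independent trial."""
    return math.sqrt(obp * (1 - obp) / plate_appearances)

# A .446 OBP over a season-sized sample vs. a few weeks of games
print(round(obp_standard_error(0.446, 500), 3))  # ~0.022: fairly precise
print(round(obp_standard_error(0.446, 50), 3))   # ~0.070: very noisy
```

At 500 plate appearances the uncertainty is about ±.022, so a .446 OBP is genuinely distinguishable from league average; at football-sized samples it would not be.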

The fact that baseball statistics are meaningful allows observers of the game to conduct very precise analyses of how well their teams and players are doing. Just for kicks, this morning I plugged some numbers into a spreadsheet to make a rudimentary calculation not about which teams were doing the best in terms of wins and losses — that information is readily available in any major newspaper, and all over the web (for instance, here) — but about which teams were performing most efficiently. I took information about the 2005 payrolls of all 30 major league teams from this site, and had Excel calculate approximately how much money each team was paying for each of the wins it had thus far achieved this season.2

The results are interesting, although I won’t bore you with all of the details. The important results are these:

  • the Yankees have the most expensive wins, at $2,571,689.10 per win; the Devil Rays have the cheapest, at $498,447.13. This is not a major surprise, since the Yankees’ overall payroll is about seven times as large as the Devil Rays’ payroll.
  • what is surprising is that the Yankees are paying almost twice as much for a win as the next team in the list, the Boston Red Sox. And the Red Sox are doing better than the Yankees in the overall win-loss standings.
  • of the teams whose winning percentage is .500 or greater, the three teams paying the least for their wins thus far this season are the Toronto Blue Jays ($535,243.19), the Washington Nationals ($568,748.94), and the Minnesota Twins ($574,432.48). The payrolls for all three teams are in the bottom third of all major league teams.

What this tells me is not just that the Yankees are playing crappy baseball this year — that much I knew already — but precisely how crappy their play has been. It also underscores precisely how impressively the Washington Nationals have been playing; they’re making a little bit of salary go a very long way.

The analysis also demonstrates, pretty concretely, that simply spending money on a baseball team doesn’t guarantee you success. You also have to use that salary efficiently, and get sufficient bang for your buck. The Yankees are spending about $208 million and thus far have a 27-27 win-loss record; the Nationals’ total payroll is about $48.5 million, and their record is 29-26. Put another way, the Nationals are spending 1.171% of their salary for each win, while the Yankees are spending 1.235%; the difference may not look like much, but over a 162-game season, minor variations between teams and players translate into major differences.
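The spreadsheet arithmetic is easy to reproduce. Using the payrolls and records quoted above, and charging each team for the fraction of the season it has actually played (games divided by 162 — a slight refinement of the divide-by-three shortcut in footnote 2), recovers the quoted percentages:

```python
# Payroll and record figures are the approximate 2005 numbers quoted
# in the post; everything else is straightforward arithmetic.
teams = {
    "Yankees":   {"payroll": 208_000_000, "wins": 27, "losses": 27},
    "Nationals": {"payroll": 48_500_000,  "wins": 29, "losses": 26},
}
SEASON_GAMES = 162

for name, t in teams.items():
    games = t["wins"] + t["losses"]
    spent_so_far = t["payroll"] * games / SEASON_GAMES
    price_per_win = spent_so_far / t["wins"]
    pct_per_win = 100 * (games / SEASON_GAMES) / t["wins"]
    print(f"{name}: ${price_per_win:,.0f} per win "
          f"({pct_per_win:.3f}% of payroll per win)")
```

The percentages come out to 1.235% for the Yankees and 1.171% for the Nationals, matching the post; the dollar figures land close to the quoted values, with small gaps presumably due to the exact payroll numbers used.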

The fascinating thing is that in baseball we can determine precisely how much difference these things make. I’d be very resistant to running numbers like this in most other situations, but in baseball, bring on the quantitative analysis!

1 On-base percentage measures how often a batter reaches base successfully, whether through getting a hit or drawing a walk or being hit by a pitch; slugging percentage measures the total number of bases that a player reaches in all of his at bats; details on basic baseball stats can be found here. The specific numbers that I used for this example are Nick Johnson’s stats for the present season. The “500 at-bats” figure is approximately the minimum number of at-bats required to qualify for a batting title under the present 162-game regular-season schedule.

2 We’re approximately 1/3 of the way through the season at the moment, so if we divide each team’s salary by 3 and then divide that number by the number of wins, we get the effective “price per win” that each team is paying — note that this does not include the salaries for managers, coaches, etc., but is limited to the combined salaries of all the players on the team’s payroll.


© 2021 Duck of Minerva
