Tag: sabermetrics

New Year, New Writing Gig

Just wanted to let everyone know that starting January 4th I will be writing a weekly baseball column (sometimes twice weekly if I am feeling especially opinionated) at Beyond the Box Score.

Beyond the Box Score is a fantastic site, examining baseball from an analytical perspective.  The authors definitely embrace sabermetrics, but they don’t beat readers over the head with complex statistics.  As with most things that I do, the subject of my columns will vary quite a bit.

Generally speaking I’ll likely focus on team performance, player valuation, and lots of exploratory questions about the game.  Oh, and you can be sure there will be lots of pretty visuals and laments about the NY Mets.

Be sure to stop by if you are interested.  You can read and subscribe to my entries here, but I encourage you to subscribe to the site as a whole (RSS feed here).


PoliSci-unrelated post of the day: Visualizing Major League Baseball, 2001-2010

This post originally appeared at Beyond the Box Score.  If you are a baseball analysis fan and don’t already read BTBS I highly recommend it.

2010 marks the end of the “ought” decade for Major League Baseball.  I thought I would take the opportunity to analyze the last 10 years by visualizing team data.  I used Tableau Public to create the visualization and pulled team data from ESPN.com (on-field statistics) and USA Today (team payroll).

The data is visualized through three dashboards.  The first visualizes the relationship between run differential (RunDiff) and OPS differential (OPSDiff) as well as the cost per win for teams.  The second visualization looks at expected wins and actual wins through a scatter plot.  The size of each team’s bubble represents the absolute difference between their actual and expected wins.  Teams lying above the trend line were less lucky than their counterparts below the trend line.  The final tab in the visualization presents relevant data in table form and can be sorted and filtered along a number of dimensions.
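
For anyone who wants to reproduce the expected-wins side of the scatter plot, here is a minimal sketch.  It assumes the standard Pythagorean expectation with an exponent of 2, which may not be exactly what the dashboard uses; the function names and the sample run totals (927 scored, 627 allowed, a +300 RunDiff) are illustrative assumptions rather than values pulled from the data set.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation).

    The exponent of 2 is an assumption; other exponents are in common use.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def luck(actual_wins, runs_scored, runs_allowed, games=162):
    """Actual minus expected wins; positive means the team beat its run differential."""
    return actual_wins - pythagorean_wins(runs_scored, runs_allowed, games)


# Illustrative (assumed) season totals: 927 runs scored, 627 allowed, 116 actual wins.
expected = pythagorean_wins(927, 627)
print(f"expected wins: {expected:.1f}")
print(f"wins above expectation: {luck(116, 927, 627):.1f}")
```

The bubble size in the scatter plot corresponds to the absolute value of what `luck` returns here.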

The first visualization lists all 30 teams and provides their RunDiff, OPSDiff, wins, and cost per win for 2001-2010.  The default view lists the averages per team over the past 10 years, but you can select a single year or range of years to examine averages over that time frame.  The visualization also allows users to filter by whether teams made the playoffs, were division winners or wild card qualifiers, won a championship, or were in the AL or NL.  The height of the bars corresponds to a team’s wins (or average wins over a range of years).  The color of the bars corresponds to a team’s cost per win: the darker green the bar, the more costly a win was for that team.  Total wins (or the average for a range of years) is listed at the end of each bar.  In order to create the bar graph I normalized the run and OPS differentials data (added the absolute value of each score + 20) to make sure there were no negative values.  For the decade, run differential explained about 88% of the variation in wins and OPS differential explained about 89% of the variation in run differential.
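
A quick note on the "explained X% of the variation" figures: these are R-squared values.  Here is a minimal sketch of the calculation, assuming a simple one-variable least-squares fit (where R-squared is just the squared Pearson correlation).  The function name and the five toy data points are made up for illustration and are not the team data behind the dashboard.

```python
def r_squared(xs, ys):
    """Share of the variance in ys explained by a straight-line fit on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    return (sxy * sxy) / (sxx * syy)


# Toy (hypothetical) seasons: run differential vs. wins.
run_diff = [-100, -50, 0, 50, 100]
wins = [70, 77, 80, 86, 92]
print(f"R^2 = {r_squared(run_diff, wins):.3f}")
```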

The visualization illustrates the tight correlation between RunDiff and OPSDiff, as the respective bars for each team are generally equidistant from the center line, creating an inverted V shape when sorted by RunDiff.  In terms of average wins over the decade, there are few surprises as the Yankees, Red Sox, Cardinals, Angels, and Braves make up the top 5.  However, St. Louis did a much better job at winning efficiently, as they paid less per win than the other winningest teams (<$1M per win).

The viz also illustrates the success of small market teams such as Oakland and Minnesota, which both averaged roughly 88 wins while spending the 3rd and 4th least per win, respectively.  If you filter the visualization for teams that averaged over 85 wins during the decade, it really drives home how impressive those two teams’ front offices have been at assembling winning ball clubs with lower payrolls.  No other team that averaged >85 wins paid less than $975K per win.  Oakland looks even more impressive when you isolate the data for years that teams qualified for the playoffs.  Oakland averaged 98.5 wins during seasons they made it to the playoffs, and did so spending only $478K per win.
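
Cost per win is nothing fancier than payroll divided by wins.  As a minimal sketch (the function name is my own, for illustration), here it is applied to the 2003 Tigers figures quoted later in this post: a payroll of roughly $49M and 43 wins.

```python
def cost_per_win(payroll_dollars, wins):
    """Dollars of payroll spent per win in a single season."""
    if wins <= 0:
        raise ValueError("wins must be positive")
    return payroll_dollars / wins


# 2003 Tigers, using the rounded figures from this post ($49M payroll, 43 wins).
tigers_2003 = cost_per_win(49_000_000, 43)
print(f"${tigers_2003 / 1e6:.2f}M per win")  # prints "$1.14M per win"
```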

What about the big spenders?  The five biggest spenders were the Yankees, Red Sox, Mets, Dodgers, and Cubs.  The Yankees spent an astounding $1.8M per win during the decade, but they also averaged the most wins with 97.  Some will say this provides evidence that the Yankees–and other big market teams–simply buy wins and championships.  However, only 17% of the variation in wins was explained by payroll during the decade.  Moreover, while the Yankees occupied 6 of the top 10 spots in terms of cost per win, they were the only team among those 10 to earn a positive run differential.  The Cubs, Mets, Mariners and Tigers all finished under .500 and missed the playoffs, while those Yankee teams qualified for the playoffs 5 out of 6 years and won one World Series.  Yes, the Yankees spend significantly more per win, but they spend more wisely than many other deep-pocketed teams.
Teams that made the playoffs averaged a little over $1M per win in those years they qualified, with Wild Card teams ($1.030M) spending a tad bit more than Division winners ($1.006M)–about $14K per win on average.  World Series winners spent $1.08M per win in their winning years compared to $1.002M for other playoff teams.  Teams that failed to make the playoffs averaged $923K per win.
The best team of the decade in terms of run differential?  The 2001 Seattle Mariners, who amassed an incredible +300 RunDiff.  Even with that total they were only expected to win 111 games–they would go on to win 116.  The Mariners had only the 11th highest payroll that year and so paid a measly $644K per win.  The absolute worst team of the decade?  The 2003 Detroit Tigers, who earned a RunDiff of -337 and actually won fewer games than expected (43 vs. 47).  Given their ineptitude on the field, the Tigers paid $1.14M per win even though their total payroll for the year was only $49M.
Luckiest team?  The 2005 Diamondbacks, who won 77 games despite a RunDiff of -160 (only 64 expected wins).  Hardest luck team?  The 2006 Indians, who only won 78 games with a +88 RunDiff that should have translated into 90 wins.

There are tons of ways to manipulate the visualizations and cut the data.  Hopefully viewing the data in this way is helpful and illuminates some things we didn’t know and drives home other things we had a hunch about. This is my first attempt to visualize this data, so please feel free to send along any and all comments so I can improve it.

Author’s Note: Due to a very helpful comment by Joshua Maciel, I have updated the visualization.  Here is a link to the original version for those who are interested.


And the AL Cy Young Award Should Go To…

Time for a little baseball blogging.

There is quite a lot of buzz surrounding the AL Cy Young award this year. While there are a number of pitchers that possess a high number of wins (17, 18, 19, and even 20 games), there are many who believe the award should go to Seattle’s Felix Hernandez.  Despite only winning 13 games and losing 12, Hernandez’s performance this year has been nothing short of amazing.  His problem is that he played on one of the worst teams in the league.  He was 8th in the league amongst starters in terms of run support (86 runs over 34 starts) and was actually dead last in terms of run support per nine innings (3.1).  If you look beyond wins to the other two orthodox statistics that make up the pitching triple crown, Hernandez finished first in ERA (2.27) and second in strikeouts (233).  It is his performance in these other two categories that has many arguing for Hernandez to win the award, since he shouldn’t be penalized for his team’s inability to score runs to support his dominance.

If someone like Hernandez wins this year it would truly represent a paradigm shift in the way baseball writers evaluate player performance.  In the history of the AL Cy Young Award, no starting pitcher has ever won with fewer than 16 victories (Zack Greinke won with 16 last year).  In the NL, only Fernando Valenzuela managed to win the award with as few as 13 wins, and that was in 1981.  No winner from either league has had a record as close to .500 as Hernandez’s.

That being said, I would actually argue that Hernandez is not the only “non-orthodox” contender.

There is only so much control a pitcher has over the outcome of a game.  And while starting pitchers have more control than most, they still must rely on their defense to play well and on their offense to score runs.  So rather than focus on statistics such as wins (which are heavily dependent on a team’s offense), we should evaluate starting pitchers on their performance independent of their offense and–to the extent possible–their defense. Doing this means focusing on how often pitchers deny batters the chance to put the ball in play (strikeouts), how often they give a batter a free pass (walks), how many base runners they allow (WHIP), and how deep into a game they pitch, which gives their bullpen rest and allows their manager to use only the team’s best relievers (thereby giving the team the best chance to win).

So let’s look at a few statistics:

K/9 – Strikeouts per 9 innings: The more batters a pitcher strikes out, the better.
K/BB – Strikeouts to Walk Ratio: The more strikeouts relative to walks, the better.
WHIP – Walks + Hits per Inning Pitched: The fewer baserunners a pitcher allows, the better.
FIP – Fielding Independent Pitching: Measures a pitcher’s performance independent of the quality of their defense.  The lower, the better.
RS/9 – Run Support per 9 innings: How many runs a pitcher’s offense scores for them per nine innings.
IP/GS – Innings Pitched per Game Started: The more innings pitched per start, the better.
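
For the curious, here is a minimal sketch of how these rate stats fall out of a pitcher’s season totals.  The class name, field names, and the sample stat line are all hypothetical (this is not any pitcher in the table), and the FIP formula is the commonly published one, (13×HR + 3×(BB+HBP) − 2×K) / IP plus a league constant; the constant changes by season, so it is passed in rather than assumed.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    """Season totals for one starter (hypothetical structure for illustration)."""
    innings: float       # true innings: 212 1/3 innings is 212.333, not box-score "212.1"
    strikeouts: int
    walks: int
    hits: int
    home_runs: int
    hit_by_pitch: int
    games_started: int
    run_support: int     # runs the pitcher's offense scored in support of him

    def k_per_9(self):
        return 9 * self.strikeouts / self.innings

    def k_per_bb(self):
        # Undefined for a pitcher with zero walks; callers should guard for that.
        return self.strikeouts / self.walks

    def whip(self):
        return (self.walks + self.hits) / self.innings

    def rs_per_9(self):
        return 9 * self.run_support / self.innings

    def ip_per_gs(self):
        return self.innings / self.games_started

    def fip(self, league_constant):
        # league_constant scales FIP to league ERA and must be supplied per season.
        raw = (13 * self.home_runs
               + 3 * (self.walks + self.hit_by_pitch)
               - 2 * self.strikeouts)
        return raw / self.innings + league_constant


# Hypothetical stat line, chosen only to make the arithmetic easy to follow.
sample = PitcherLine(innings=200.0, strikeouts=200, walks=50, hits=150,
                     home_runs=20, hit_by_pitch=5, games_started=30,
                     run_support=100)
print(f"K/9 {sample.k_per_9():.2f}  K/BB {sample.k_per_bb():.2f}  "
      f"WHIP {sample.whip():.2f}  RS/9 {sample.rs_per_9():.2f}  "
      f"IP/GS {sample.ip_per_gs():.2f}")
```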

I’ve created a table with non-counting statistics for the top 10 pitchers in the AL this year, but I have not included their names or their traditional statistics (Wins, ERA, or K’s).  Take a look and think about who jumps out as the best pitcher:

Now, all of these guys are good, but there is one whose performance really jumps out.

First, it’s hard to miss the obvious gap between Pitcher A’s K/BB ratio of 10.28 and the rest of the field.  For every batter Pitcher A walks, he strikes out 10.  That is more than double the next closest pitcher (Pitcher B at 4.31).  That ratio of 10.28 is the second highest in the history of baseball and only the third time we’ve seen a double-digit ratio (the other two came in 1994 and 1884).  Pitcher A also had the lowest WHIP, the lowest FIP, and the highest IP/GS.  The only two areas where he didn’t finish first are K/9 (10th) and RS/9 (4th fewest).

So who is Pitcher A?  Felix Hernandez?  Nope.  It’s Cliff Lee.

Here’s the chart with the names included:

In terms of the traditional statistics, Lee only went 12-9 with a 3.18 ERA (6th in the AL) and 185 strikeouts (10th in the AL) in 28 starts.  At first blush, his body of work doesn’t look that impressive.  But if you go beyond mere “counting” stats, Lee’s dominance becomes more evident and Hernandez-esque.  His higher ERA (still 6th best) can be explained by an unusually high .302 batting average on balls in play (BABIP), meaning when batters actually managed to put the ball in play they got a hit roughly 30% of the time.  BABIP is strongly correlated with ERA.  My guess is that Lee’s high BABIP can be explained by the fact that the defense behind him wasn’t the greatest, reflected in the fact that he had the best fielding independent pitching in the AL amongst starters.
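
If you want to compute BABIP yourself, here is a minimal sketch assuming the standard definition, (H − HR) / (AB − K − HR + SF).  The function name and the sample totals are hypothetical and are not Lee’s actual numbers.

```python
def babip(hits, home_runs, at_bats, strikeouts, sacrifice_flies=0):
    """Batting average on balls in play, using the standard definition."""
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical totals allowed by a pitcher over a season.
print(f"{babip(hits=180, home_runs=20, at_bats=700, strikeouts=150, sacrifice_flies=10):.3f}")
```

Home runs and strikeouts are removed because the defense never gets a chance to make a play on them.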

Hernandez had less run support (3.10 to 4.45) and more strikeouts per nine innings (8.36 to 7.84), but otherwise Lee was better than Hernandez in every non-counting category (and he was better than every other contender).

Will Lee win the AL Cy Young?  I doubt it.  My guess is it will either go to Hernandez or CC Sabathia (since he had 21 wins and played for the Yankees in the AL East), but it is hard to argue with how dominant Lee was over the course of the regular season.

[Cross-posted at Signal/Noise]


© 2021 Duck of Minerva
