Thursday, April 24, 2008

Farnsworth Clutch?

David Appelman over at the Baseball Analysts seems to think so and has the math to prove it. Appelman took a look at various pitchers' performances in high leverage situations versus all other situations. Leverage index (LI) is a way of measuring the importance of a particular situation based on inning, outs, score and runners. David then looked at each pitcher's FIP, or Fielding Independent Pitching, a statistic that looks only at things a pitcher controls (HR, BB, HBP, K) and eliminates those which involve his team's defense.
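For anyone who hasn't seen FIP written out, here is a rough sketch in Python. The 13/3/2 weights are the commonly published ones; the 3.10 constant is an assumption (it changes from season to season so that league FIP lines up with league ERA), and the stat line is made up purely for illustration. This is not necessarily the exact version David used.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from the commonly published formula.

    constant=3.10 is an assumed league constant, not a value from the article.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical reliever season (invented numbers, not real data):
# 5 HR, 20 BB, 2 HBP, 70 K in 65 innings.
example = fip(hr=5, bb=20, hbp=2, k=70, ip=65.0)
print(round(example, 2))
```

Notice that hits, doubles and sacrifice flies never enter the calculation, which is exactly what FIP is designed to leave out.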

The gist of the article is that the author looked at FIP in high leverage situations compared to all other situations, and tallied up which pitchers did significantly better in high leverage or clutch situations. Appelman collected data over the last six seasons and found that Kyle Farnsworth was the 8th best clutch reliever.

It's a fun look at some wacky stats, and in the end is completely useless. Appelman himself noted that the data don't hold up from year to year, which tells me the basic principles of this measurement aren't valid. I don't care for FIP personally, and find it to be a wonky, selective statistic that overlooks what is really important here (things like base hits, doubles, sacrifice flies etc.).

David said he chose FIP because ERA didn't really work. This sounds very fishy. In what way did it not work? You didn't see the names you wanted popping up on the lists? This sounds like a case of the original methods not producing the desired results, so the author tried different methods until he got the answers he wanted. In other words, selectively choosing data that will produce the desired conclusion.

Addendum: David was kind enough to address my comments on the original website and also stopped by here to chime in. He explained that with his methodology, ERA wouldn't work because high leverage situations (particularly those with runners on base) generally produce more runs than low leverage situations (with no runners or fewer runners), and hence ERA would be elevated in those instances. This would skew results somewhat and produce bigger differences. The next question that comes to my mind is whether these differences are significant (skewing the results for certain players) or would be roughly the same for all the players involved.

Eric Gagne is also one of the top relievers on the list, and I don't think anyone accused him of being clutch last year.

By comparing high LI situations to all other situations, you don't factor in overall quality. The absolute worst pitcher in baseball could pitch significantly better in high LI situations and still be worse than most other pitchers in those situations, yet he would rank highly on David's chart.

Let me give a simple example of what I mean. Let's theoretically assign pitchers' abilities a score of 1-10, with 10 being the best.

-Your all-star closer could perform at a level of 8 most of the time and a level of 10 in clutch situations, giving a difference of 2. With the methods used in the article, this value of 2 would be used to rank the pitcher.

-Joe Average reliever pitches at a level of 5 most of the time and also delivers a 5 in clutch situations, giving a difference of 0. This would rank below your all-star (who has a 2).

-Now look at your horrendous reliever who normally pitches at a level of 1 but pitches at 4 in clutch situations, producing a difference of 3. Of all the pitchers, this guy would rank the highest with Appelman's methods, despite the fact that even at his best, he's still worse than anyone else.
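The same toy example can be sketched in a few lines of Python. The pitcher labels and the 1-10 scores are the made-up ones from above, not real data, and ranking by a plain difference is my simplification of the article's approach.

```python
# (normal level, clutch level) on the hypothetical 1-10 scale used above.
pitchers = {
    "all-star closer": (8, 10),
    "average reliever": (5, 5),
    "horrendous reliever": (1, 4),
}

# Rank by improvement in clutch situations (clutch minus normal).
by_difference = sorted(
    pitchers, key=lambda name: pitchers[name][1] - pitchers[name][0], reverse=True
)

# Rank by how good the pitcher actually is in clutch situations.
by_clutch_level = sorted(pitchers, key=lambda name: pitchers[name][1], reverse=True)

print(by_difference)    # horrendous reliever comes first
print(by_clutch_level)  # horrendous reliever comes last
```

The two orderings disagree completely about the horrendous reliever: first by improvement, last by actual clutch performance.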

Sabermetrics is an odd thing. There are a lot of fantastic insights into the game of baseball that have been uncovered by smart stat guys. Unfortunately, a lot of people just crank out stats, graphs and lists of numbers without thinking things through or putting much thought into the practicality of what they're saying.

This article is a fun read, as we can all look at the list, see guys like Farnsworth and Gagne on there, laugh, and say "ha ha, I told you that guy was clutch," knowing full well there is little validity to the statement.

While I love sabermetrics, silly articles like this really give it a bad name if taken too seriously.

Addendum: David clarifies that the article was written from a perspective of "disproving clutch" and it has some utility in that respect. He and I differ greatly on our definition of "clutch", which is a big source of my frustration with some of this article's basic premises.

2 comments:

David said...

I explained why ERA didn't work in my comment on baseball analysts. It was not a matter of me not liking which names are on the list.

Your definition of clutch is just different than my definition, but I think we do come to the same conclusion in that you can say "I told you these guys were clutch", but it doesn't mean they'll be clutch at all next year.

To me, this was more of a disprove clutch article than it was a "yes these guys are without a doubt clutch and always will be" type of thing.

The numbers are fun to look at but they don't mean anything and that was the point of the article.

Jeff said...

David,

Thanks for stopping by and leaving a comment, I certainly appreciate it.

I was just heading back here to throw an addendum up to my post to include your explanation of why ERA doesn't work in your particular analysis.

Your article makes more sense to me from a "disprove clutch" point of view, but I didn't quite get that impression on my first read through.

I think we share the same ultimate conclusion that clutch is hard to measure and is a dubious notion at best.