Beyond the Box Score

What Starting Pitcher Metrics Correlate Year-to-Year?

Tim Hudson has the highest GB/FB ratio in the majors since 2008 (2.63).

Greg Fiume - Getty Images

As a follow-up to my previous article on hitting metrics, I wanted to take a look at which pitching metrics correlate year-to-year. For this installment, I looked at starting pitchers from 2004-2011 with at least 162 innings pitched in both year one and year two.
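For readers who want to see the mechanics, here is a minimal sketch in Python of how a year-to-year correlation of this kind could be computed. It assumes that "year one and year two" means consecutive seasons and that the correlation is a plain Pearson coefficient, neither of which is spelled out here. The field names (`pitcher`, `year`, `ip`, `k_pct`) and the sample rows are made up for illustration and are not FanGraphs data.

```python
from math import sqrt

MIN_IP = 162  # innings threshold from the article, applied to both seasons


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (needs non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def paired_seasons(rows, metric, min_ip=MIN_IP):
    """Return (year-one values, year-two values) for consecutive qualifying seasons.

    rows: dicts with hypothetical keys "pitcher", "year", "ip", plus metric columns.
    """
    by_key = {(r["pitcher"], r["year"]): r for r in rows}
    y1, y2 = [], []
    for (pitcher, year), first in by_key.items():
        second = by_key.get((pitcher, year + 1))
        if second is None:
            continue  # no following season on record
        if first["ip"] >= min_ip and second["ip"] >= min_ip:
            y1.append(first[metric])
            y2.append(second[metric])
    return y1, y2


def y2y_correlation(rows, metric, min_ip=MIN_IP):
    """Correlate a metric in year one with the same metric in year two."""
    y1, y2 = paired_seasons(rows, metric, min_ip)
    return pearson(y1, y2)


# Made-up illustrative rows, not real pitcher seasons.
rows = [
    {"pitcher": "A", "year": 2009, "ip": 200.0, "k_pct": 0.25},
    {"pitcher": "A", "year": 2010, "ip": 210.0, "k_pct": 0.24},
    {"pitcher": "A", "year": 2011, "ip": 190.0, "k_pct": 0.26},
    {"pitcher": "B", "year": 2009, "ip": 180.0, "k_pct": 0.15},
    {"pitcher": "B", "year": 2010, "ip": 175.0, "k_pct": 0.16},
    {"pitcher": "C", "year": 2009, "ip": 165.0, "k_pct": 0.20},
    {"pitcher": "C", "year": 2010, "ip": 120.0, "k_pct": 0.30},  # below threshold
]

print(round(y2y_correlation(rows, "k_pct"), 2))
```

Note that a pitcher with three straight qualifying seasons contributes two pairs in this sketch (2009-2010 and 2010-2011), while pitcher C drops out because the second season falls short of 162 innings.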

As before, this is just a straightforward correlative analysis--nothing fancy. I took a look at a bevy of metrics (courtesy of the fine, upstanding citizens at FanGraphs), and here are the results:

Pitcher repertoire generally has the highest year-to-year (Y2Y) correlation. The distribution of a starter's pitches (e.g. four-seam fastball, cutter, changeup, etc.) shows great consistency from one year to the next. Now, there are potentially coding errors in that data, but the consistency of those statistics reflects what I think is generally known--that once a pitcher makes it to the big leagues as a starter, he rarely alters his portfolio of pitches. What he likely alters more regularly is speed, sequence, and location. But that's just a hypothesis, one that can't be confirmed or rejected with this data.

Moving on.

Outside of repertoire, the most highly correlated statistic for starters is the ratio of ground balls to fly balls they allow (GB/FB), followed closely by K%. Again, this is consistent with previous research into what factors a pitcher generally controls (e.g. Tom Tango and FIP, Matt Swartz and SIERA). We can see that strikeouts (and metrics associated with strikeouts such as Swinging Strike %, Contact %, and Outside of Zone Swing and Contact %), walks, and batted balls other than line drives are all correlated Y2Y at .67 or higher.

The highest correlated ERA estimator was SIERA (.72), followed by xFIP (.68).

As before, I also put together a correlation matrix of all the year one metrics against all the year two metrics. Correlations between .40 and .69 are shaded blue, and correlations of .70 and above are shaded green.

Scrolling left to right we can quickly see what metrics correlate strongly with, say, next year's Earned Run Average (ERA). ERA itself has a Y2Y correlation of .38. True ERA (tERA) came in at .47, the highest of all the ERA estimators. Fielding Independent Pitching (FIP) had a correlation of .46, followed by SIERA .45 and xFIP .43.
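A matrix like this one could be built along the following lines. This is only a sketch under stated assumptions: it reuses a plain Pearson correlation, the metric names (`k_pct`, `era`) and the sample pairs are made up, and the `shade` helper applies the blue/green cutoffs to the magnitude of the correlation, which is a guess about how negative values were treated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (needs non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def cross_year_matrix(pairs, metrics):
    """Correlate every year-one metric with every year-two metric.

    pairs: list of (year_one_row, year_two_row) dicts for the same pitcher.
    Returns a nested dict: matrix[year_one_metric][year_two_metric] -> r.
    """
    matrix = {}
    for m1 in metrics:
        xs = [first[m1] for first, _ in pairs]
        matrix[m1] = {}
        for m2 in metrics:
            ys = [second[m2] for _, second in pairs]
            matrix[m1][m2] = pearson(xs, ys)
    return matrix


def shade(r):
    """Shading cutoffs from the article, applied to |r| (an assumption)."""
    size = abs(r)
    if size >= 0.70:
        return "green"
    if size >= 0.40:
        return "blue"
    return ""


# Made-up (year one, year two) pairs, not real pitcher seasons.
pairs = [
    ({"k_pct": 0.25, "era": 3.10}, {"k_pct": 0.24, "era": 3.30}),
    ({"k_pct": 0.15, "era": 4.50}, {"k_pct": 0.16, "era": 4.20}),
    ({"k_pct": 0.20, "era": 3.90}, {"k_pct": 0.21, "era": 3.60}),
    ({"k_pct": 0.18, "era": 3.70}, {"k_pct": 0.17, "era": 4.40}),
]

matrix = cross_year_matrix(pairs, ["k_pct", "era"])
for m1, row in matrix.items():
    for m2, r in row.items():
        print(f"Y1 {m1} vs Y2 {m2}: {r:+.2f} {shade(r)}")
```

Reading across the `era` column of such a matrix is what lets you compare ERA itself against the estimators as predictors of next year's ERA. A metric like K% shows up with a negative sign there, since more strikeouts go with a lower ERA.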

Another interesting finding relates to Win Probability Added (WPA). The most predictive statistics in terms of whether starters will have higher WPA are those related to strikeouts. Again, this jibes with what people have long suggested--the ability to miss bats is key and something that pitchers inherently control to a large degree.

Finally, to further emphasize the point that a starting pitcher's record is not the best way to evaluate his performance, let's look at run support per nine innings (RS/9). The Y2Y correlation of a pitcher's run support is a mere .16, so it's no surprise that Wins have a Y2Y correlation of only .29.

So, as with hitters, it pays to focus on independent pitcher metrics like SIERA and FIP when trying to get a read on a hurler's true performance and likely performance in the next year. And, as with hitters, focusing on how much a pitcher misses bats, gets swings on less hittable balls, and commands the zone is a solid bet, as these attributes are some of the most strongly correlated year-to-year. When we see big changes in these types of metrics, it should be a red flag that something might be happening (positive or negative) with a pitcher.

(Special thanks to Matt Swartz for working through some data issues with me)

Comments

Great work. So, as someone who doesn't follow the developments in ERA estimators THAT closely…

this isn’t the first time I’ve seen SIERA rate best. Why hasn’t that gained the traction of a FIP or xFIP?

Well, there has been lots of debate about SIERA

I will say this: the analysis above shows that it has the highest Y2Y correlation with itself, but other ERA estimators have a higher Y2Y correlation with ERA.

Also, as far as estimating next year’s ERA, they are all within .01 points of each other, more or less. FIP and xFIP are absurdly easy to calculate compared to SIERA and tERA, so maybe that’s part of what’s driving adoption.

Cool, thanks.

There’s a lot to be said for simplicity.

this is wrong:

"I will say this: the analysis above shows that it has the highest Y2Y correlation with itself, but other ERA estimators have a higher Y2Y correlation with ERA."

Momentum

Frankly, I think it’s more of a momentum thing. SIERA does relatively better with relievers compared to other ERA estimators than it does with starters, so it does have a distinctly higher correlation when SP & RP are both in there. It’s got a better RMSE too.

I’ve never really understood the issue with it being complicated to calculate. I’ve never calculated an xFIP myself on the fly either. I just go to FanGraphs. The Markov matrices that generated the 3,2,13 coefficients in FIP are easily more complicated math than the multiple regression techniques used in SIERA too, so I think it’s really a matter of xFIP coming first.

I think you're right about first mover, Matt

As for ease of calculating, maybe it’s my lack of exposure, but I feel like most find 3 stats with 3 multipliers relatively easy. Moreover, people might feel like they have a better sense of what’s ‘inside’ as a result.

Did you republish the equation after updating? The only reference I have is this: http://www.baseballprospectus.com/glossary/index.php?search=SIERA

Clutch Y2Y correlation = 0

Jack Morris’ face when

I demand more Looney Tune reaction shots in our comments section!

Excellent work

Excellent job, Bill. I can see that this was a huge amount of work, but the results are valuable. It’s interesting that K/9 is as predictive of next year's ERA as FIP.

Very interesting. Looking at the correlation matrix, Y1 SIERA/Y2 ERA comes in at 0.45, and Y1 K%/Y2 ERA comes in at -0.45.

Curious

What SIERA formula was used in this test? Was it a standard formula or did the constants change for each year?