Prior to last night’s much-needed offensive outburst against the Dodgers, the Cubs were in a mini-slump after dropping five games in a row, including a four-game sweep by the last-place Cincinnati Reds. I don’t really see any reason to relive all of the baseball from the last week, but it was a good reminder that there is a difference between bad baseball and bad luck, so I wanted to use it as an opportunity to talk a bit about statistical regression and which stats can give us clues about when a player might be more likely to regress.
Let’s start with a definition, because if you were to just google “statistical regression” you would get a really fancy model that isn’t what I want to talk about today at all. What I want to talk about is a concept known as regression to the mean, which often gets shortened to just “regression.” As usual, Fangraphs has us covered:
Regression toward the mean (RTM for clarity in this article) is the concept that any given sample of data from a larger population (think April stats) may not be perfectly in line with the underlying average (think true talent/career stats), but that going forward you would expect the next sample to be closer to the underlying average than the first sample. Observations tend to cluster around the average value, even if the previous value is unusual.
In other words, statistics tend to normalize over the course of a lot of plate appearances, so it’s worth it to wait out really bad stretches and be cautious about really good stretches, because eventually players are likely to return to their career averages. Obviously some players do make meaningful improvements over the course of their career (see Daniel Murphy’s evolution), but as a general rule of thumb, if the underlying mechanics are similar to what they’ve always been, regression to the career mean is the more likely expectation. This is particularly true for more established players who have a decent amount of data behind them.
One other note: a regression can be positive or negative, even though we colloquially use that word with a negative connotation. A slumping player with multiple solid years behind them is likely due for a positive regression, while a player who is sporting a minuscule ERA that isn’t in line with their career average is probably due for a negative regression.
With that in mind, I wanted to take a look at the pitching statistics that can indicate a regression might be in a player’s future and see if there are any candidates for regression on the Cubs at the moment. I’ll do a similar look at hitters in the near future.
The most obvious regression candidate for the Cubs was a pretty big contributor to the recent five-game losing streak. I’m looking at you, dancing bullpen.
I knew it was inevitable, but I didn’t think it would all happen at once or be quite so obvious. I noted a few of these candidates as part of my look at the Cubs bullpen last month. Let’s take a closer look at the most obvious example: Brian Duensing.
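For anyone who would rather see this than take my word for it, here is a minimal Python sketch of regression to the mean. It is a toy simulation, not a model of real pitchers: I am assuming every pitcher has exactly the same true-talent BABIP of .300, and the function names, sample sizes, and seed are all made up for illustration.

```python
import random


def simulate_babip(true_rate, balls_in_play, rng):
    """Observed BABIP for one sample: each ball in play becomes a hit
    with probability true_rate (an assumed, fixed true talent)."""
    hits = sum(rng.random() < true_rate for _ in range(balls_in_play))
    return hits / balls_in_play


def regression_demo(n_pitchers=1000, true_rate=0.300,
                    early_bip=50, later_bip=50, seed=2018):
    """Compare the luckiest 10% of pitchers in an early sample
    with how the same pitchers do in a later sample."""
    rng = random.Random(seed)
    early = [simulate_babip(true_rate, early_bip, rng) for _ in range(n_pitchers)]
    later = [simulate_babip(true_rate, later_bip, rng) for _ in range(n_pitchers)]

    # Pick the pitchers with the lowest (luckiest) early BABIP.
    cutoff = n_pitchers // 10
    luckiest = sorted(range(n_pitchers), key=lambda i: early[i])[:cutoff]

    early_avg = sum(early[i] for i in luckiest) / cutoff
    later_avg = sum(later[i] for i in luckiest) / cutoff
    return early_avg, later_avg


early_avg, later_avg = regression_demo()
print(f"Luckiest group, early sample: {early_avg:.3f}")
print(f"Same group, later sample:     {later_avg:.3f}")
```

Nothing about these simulated pitchers changes between the two samples, yet the group picked for its tiny early BABIP drifts back toward .300 in the later sample. Flip the sort to pick the unluckiest group and you get the "positive" version of the same effect.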
At the time I wrote that piece, Duensing had yet to give up a run. Considering Duensing’s career ERA is 4.09, it was unreasonable to expect him to never give up a run again. That 0.00 ERA blew up, and Duensing has posted an ERA of 21.00 so far in June. The good news is that after that spell his season-long ERA is sitting at 6.85, so he’s probably in for positive regression in the near future.
Career stats weren’t the only clue that Duensing was due for a regression, however. Some stats, particularly BABIP and FIP, are really good indicators that a player is over- or under-performing. To illustrate this I’m going to repost the bullpen stats chart from the previous article (which went through 5/10) and then post an updated version with stats for June:
Cubs bullpen stats through May 10
|Carl Edwards Jr.||16||17||15.88||4.24||0.00||.250||93.3%||32.1%||0.0%||0.53||1.05||2.47||0.8||$594,000|
Cubs bullpen stats in June
The first thing to note is that if I limited this to qualified pitchers in June, only Justin Wilson, Steve Cishek and Pedro Strop would make the cut. As you can see from all the additional names, the Iowa shuttle has been very busy. Part of that is due to the schedule, and part of it is due to players being on the DL.
More interesting for purposes of looking at regression, though, is that the pitchers who have struggled recently, most notably Duensing and Pedro Strop, were both sporting ERAs in early May that didn’t align with their Fielding Independent Pitching, or FIP. FIP is supposed to give us an idea of what a pitcher’s ERA would be without intervening factors like the strength of the defense behind them. If a pitcher’s ERA and FIP are particularly out of alignment, it could mean there is a regression in their future. In Duensing’s case there was also an unsustainably low Batting Average on Balls In Play (BABIP) of .194 propping up his numbers. BABIP for a pitcher should hover around the league average or their own career average, and in Duensing’s case that career average is .303, which is a far cry from .194.
In terms of predictions going forward, Strop, Duensing, Wilson and Morrow are all probably due for a string of better luck. However, it looks like the Cubs have gotten more than a little bit lucky with the Iowa shuttle. Cory Mazzoni, Rob Zastryzny, Randy Rosario, and Justin Hancock are all overperforming their FIP in sizable ways. To a lesser extent, Anthony Bass and Luke Farrell are overperforming as well.
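If you are curious where those two numbers come from, here is a rough Python sketch of the commonly published FIP and BABIP formulas as I understand them. The function names are my own, the stat line at the bottom is invented, and the FIP constant is only a placeholder: the real constant is recalculated every season so that league FIP matches league ERA.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from home runs, walks, hit batters,
    strikeouts and innings pitched.

    `constant` is an assumed placeholder, not the actual 2018 value.
    `ip` must be true decimal innings (17 1/3 innings is 17.333, not 17.1).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: hits that stayed in the park
    divided by at-bats that ended with a ball in play (plus sac flies)."""
    return (h - hr) / (ab - k - hr + sf)


# Hypothetical stat line, invented purely for illustration.
example_fip = fip(hr=1, bb=3, hbp=0, k=9, ip=9.0)
example_babip = babip(h=10, hr=1, ab=50, k=10, sf=1)
print(f"FIP {example_fip:.2f}, BABIP {example_babip:.3f}")
```

The thing to notice is what FIP leaves out: hits on balls in play never appear in it, which is why a pitcher riding a freakishly low BABIP can post an ERA well below his FIP.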
The same stats can be illustrative for starters, so let’s take a look at these numbers for the Cubs’ rotation:
Cubs starting rotation stats
I don’t want to be the bearer of bad news, but Jon Lester looks like a prime candidate for regression. His career BABIP is .295, which is substantially higher than the .234 he’s sported in 2018 so far. His career-best BABIP of .256 came in 2016, a year that also saw the Cubs’ historically good defense. Additionally, he’s overperforming his FIP by almost two full runs. Lester is a pretty savvy veteran, but it wouldn’t be terribly surprising for some of these numbers to regress.
While I’ve been as big a fan of Mike Montgomery being in the rotation as anyone, his numbers also seem to indicate he could be in for some regression. His .208 BABIP is particularly concerning. Kyle Hendricks also looks like he could be in for a bit of a regression, as his ERA is almost a full run lower than his FIP.
The other starters appear to be performing as their stats would indicate, although in the case of Yu Darvish and Jose Quintana their career numbers would indicate that they are due for a better stretch any day now. Darvish has a career ERA of 3.49 and Quintana’s is 3.58. In both cases they amassed those numbers in the American League, where pitchers generally have higher ERAs due to the designated hitter.
As with most statistical analysis, there is some good news and some bad news here for Cubs fans. All of it tends to even out over the course of 162 games, however, and with any luck these regressions can stagger themselves so the team doesn’t repeat that five-game losing streak this season.
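The BABIP comparisons I keep making boil down to simple subtraction, so here is a small Python sketch using the numbers quoted in this piece for Duensing and Lester. The 30-point tolerance is an arbitrary cutoff I picked for illustration, not an established threshold, and the function name is hypothetical.

```python
def expected_direction(current, baseline, tolerance=0.030):
    """Guess which way a pitcher's BABIP is likely to move, given a
    baseline such as a career average. `tolerance` is an assumed cutoff."""
    gap = current - baseline
    if gap < -tolerance:
        return "likely to rise"
    if gap > tolerance:
        return "likely to fall"
    return "roughly in line"


# (current BABIP, career BABIP) as quoted in the article.
pitchers = {
    "Brian Duensing (early May)": (0.194, 0.303),
    "Jon Lester (2018 so far)": (0.234, 0.295),
}

for name, (current, career) in pitchers.items():
    gap = current - career
    print(f"{name}: {gap:+.3f} vs. career, {expected_direction(current, career)}")
```

Keep in mind that for a pitcher a rising BABIP is the bad kind of regression, since more balls in play are turning into hits.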