
# Gears of War III: Evaluating Pitching

We talked last time about how we can calculate the number of wins based on the number of runs created or produced. If you missed it, the equation is pretty simple:

Wins ~ Runs/10.

The "~" sign there means "approximately." It's only an approximation because (amongst other things) the number of runs it takes to generate a win depends on the run-scoring environment, and the pitcher affects that environment. Think of it this way: If all baseball games were won 1-0, then each run would be worth a win. If all baseball games were won 100-99, it would take a LOT of runs to increase your team's win total. So the number of runs/game affects the number of runs it takes to generate a win. Because a pitcher has the ball in his hands for half the time he's in the game, he has a large influence on the run-scoring environment. In other words, good pitchers change the game so that each run becomes more valuable. This is part of the reason why managers are more likely to call for sacrifices in games involving two great pitchers.

Now that we know this, we can figure out how many wins a pitcher contributes to a team: calculate the number of runs he saves compared to some baseline (or compared to some other player), then calculate how many wins those prevented runs are worth. Follow me below the fold for more on how this works.
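To put rough numbers on the 1-0 versus 100-99 idea, here is a minimal sketch. It leans on the Pythagorean expectation (a standard way to estimate winning percentage from runs scored and allowed), which this series has not introduced, so treat it as a swapped-in illustration. The function names, the exponent of 2, and the 162-game season are assumptions made for the example.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: estimated winning percentage from runs."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def runs_per_win(runs_per_game, games=162, exponent=2.0):
    """Estimate how many extra runs buy one extra win for an average team.

    runs_per_game is runs scored per team per game in this environment.
    """
    season_runs = runs_per_game * games
    # An average team scores and allows the same number of runs (.500).
    # Give it one extra run scored and see how far its win total moves.
    extra_wins = games * (
        pythag_win_pct(season_runs + 1, season_runs, exponent) - 0.5
    )
    return 1.0 / extra_wins


# Low-scoring, typical, and high-scoring environments (illustrative values).
for rpg in (3.0, 4.5, 6.0):
    print(f"{rpg:.1f} runs/game -> about {runs_per_win(rpg):.1f} runs per win")
```

Under these assumptions the cost of a win climbs from roughly 6 runs in the low-scoring environment to roughly 12 in the high-scoring one, with the typical environment landing in the same neighborhood as the Runs/10 rule of thumb.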

We use replacement-level production for the baseline in WAR (it's the "R" in WAR, Wins Above Replacement), with "replacement-level" being defined as the expected contribution a team could get from a readily available player making the league minimum. You can think of replacement-level players as the ones that "ride the Iowa-Chicago shuttle" as they get promoted when needed and demoted when their arm (or Lou's patience) is used up. We use this baseline because it represents the minimum investment the team could make to replace that player's innings. Now, there are particular cases where a team has replacements that are above or below this definition of "replacement-level." (Hopefully, Starlin Castro serves as an example of an "above replacement-level replacement.") When that is the case, you measure both players - the one "being replaced" and the one "doing the replacing" - from the same baseline, so that when you take the difference between the two the baseline cancels out.

When you are calculating the baseline you also make corrections for the environment the pitcher plays in, including his league, role, and ballpark. A pitcher that allows 4.0 runs/game pitching half his games at Great American Ball Park is doing a better job than a pitcher that allows 4.0 runs/game pitching half his games at Wrigley Field, because more runs are scored at Great American Ball Park. Likewise, a pitcher that allows 4.0 runs/game in the American League is doing a better job than a pitcher that allows 4.0 runs/game in the National League, both because the American League is better AND because the run-scoring environment is higher. Finally, a starter that allows 4.0 runs/game is considered more valuable than a reliever that allows 4.0 runs/game, because the starter has to pitch more innings, which means a larger repertoire of pitches and the need to "save himself" for later innings and tense situations. Thus, "replacement-level" means different things for different pitchers, depending on their role and their environment.
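Here is one way the environment corrections could be wired together, as a rough sketch only. Every number in it (the league average, the park factors, the role multipliers) is made up for illustration, and the function names are hypothetical; real WAR implementations derive these values from data and may combine them differently.

```python
# All numbers below are invented for illustration. They are NOT real league
# averages, park factors, or replacement levels.
ROLE_MULTIPLIER = {"starter": 1.28, "reliever": 1.07}  # hypothetical


def replacement_ra9(league_ra9, park_factor, role):
    """Runs per 9 innings a replacement-level pitcher might allow here."""
    return league_ra9 * ROLE_MULTIPLIER[role] * park_factor


def runs_above_replacement(ra9, innings, league_ra9, park_factor, role):
    """Runs saved versus a replacement-level pitcher over the same innings."""
    baseline = replacement_ra9(league_ra9, park_factor, role)
    return (baseline - ra9) * innings / 9.0


# The same 4.0 runs/9 over 180 innings, in two different home parks.
hitter_park = runs_above_replacement(
    4.0, 180, league_ra9=4.5, park_factor=1.05, role="starter"
)
pitcher_park = runs_above_replacement(
    4.0, 180, league_ra9=4.5, park_factor=0.97, role="starter"
)
print(f"hitter-friendly park:  {hitter_park:.1f} runs above replacement")
print(f"pitcher-friendly park: {pitcher_park:.1f} runs above replacement")
```

The point of the sketch is only the direction of the effect: identical raw numbers earn more credit when the baseline for that park, league, or role is higher.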

We also have to decide what metric to use when evaluating the pitcher's performance. Since a pitcher's job is to prevent runs, usually some metric is used that measures runs allowed compared to the replacement-level baseline discussed above. But we don't normally use ERA or even RA (total runs allowed/9 IP). Why not? Well, we're trying to get a handle on the pitcher's ability to prevent runs, not that of the team he is playing on. In other words, we want to isolate the pitcher's ability to prevent runs from the run-preventing capabilities of his defense. To do this, we use a metric called FIP (Fielding Independent Pitching) that translates a pitcher's home runs allowed, strikeouts, and walks into a runs-allowed statistic. By limiting ourselves to plays that the defense is NOT involved in, we can do a decent job of isolating the pitcher's run-preventing abilities.

You may also want to account for good or bad luck in the number of home runs the pitcher has allowed given his fly ball rate. For this, we use a metric called xFIP ("expected" FIP). An example of a pitcher whose xFIP was much lower than his FIP was Rich Harden during the first half of 2009. Based on that, you would have expected him to improve in the 2nd half of the season... just as he did. (Quick preview of next time: Carlos Zambrano's FIP is much higher than his xFIP. Carlos Silva's FIP is much lower than his xFIP. Guess which one we can expect to get worse as we go forward? Hint: THE ONE GETTING MORE INNINGS.)
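For the curious, here is a minimal sketch of how FIP and xFIP turn those counting stats into a runs-allowed number. The 13/3/2 weights are the commonly published ones for the basic version of FIP, not something spelled out in this post. The additive constant and the league home-run-per-fly-ball rate change from season to season, so the defaults below are placeholder assumptions, and the sample pitcher is invented.

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Basic FIP: home runs, walks, and strikeouts scaled to an ERA-like number.

    `constant` puts FIP on the league ERA scale; 3.10 is a placeholder.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def xfip(fly_balls, bb, k, ip, league_hr_per_fb=0.105, constant=3.10):
    """xFIP: like FIP, but with the home runs a pitcher would be expected to
    allow given his fly balls and a league-average HR/FB rate (assumed here).
    """
    expected_hr = fly_balls * league_hr_per_fb
    return (13 * expected_hr + 3 * bb - 2 * k) / ip + constant


# Invented pitcher: 20 HR on 150 fly balls, 50 BB, 180 K in 200 IP.
pitcher_fip = fip(hr=20, bb=50, k=180, ip=200)
pitcher_xfip = xfip(fly_balls=150, bb=50, k=180, ip=200)
print(f"FIP  {pitcher_fip:.2f}")
print(f"xFIP {pitcher_xfip:.2f}")
```

This made-up pitcher gave up more home runs than his fly balls would suggest, so his xFIP comes out lower than his FIP, which is the same pattern described for Harden above.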

Got all that? If not, I'll break it down into steps:

1.) Pick your metric. Usually, this is FIP or xFIP.

2.) Calculate your baseline, based on the environment the pitcher is in and his role (reliever or starter).

3.) Calculate the runs he allowed according to your metric, subtract that from the runs the baseline pitcher would have allowed over the same innings, and divide by the number of runs/win for the given run environment. That'll give you the player's WAR. If you want to, this is also the appropriate time to apply leverage for relievers (we'll discuss this next time).

4.) Go on the interwebs and complain that the Cubs are severely under-utilizing their 2nd-best pitcher.
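Steps 1 through 3 boil down to a couple of lines of arithmetic. The sketch below assumes the metric and the baseline are both expressed per nine innings on the same scale, uses the Runs/10 rule of thumb as the default runs/win, and leaves out leverage for relievers. The function name and the sample numbers are hypothetical.

```python
def pitching_war(metric_ra9, baseline_ra9, innings, runs_per_win=10.0):
    """Wins above replacement from a per-9-innings run-prevention metric.

    metric_ra9:   the pitcher's FIP or xFIP (step 1)
    baseline_ra9: replacement level for his league, park, and role (step 2)
    runs_per_win: runs needed for one win in this run environment (step 3)
    """
    # Runs saved versus the baseline over the innings he actually pitched.
    runs_saved = (baseline_ra9 - metric_ra9) * innings / 9.0
    return runs_saved / runs_per_win


# Invented example: a 4.00 FIP against a 5.50 baseline over 200 innings.
war = pitching_war(metric_ra9=4.0, baseline_ra9=5.5, innings=200)
print(f"{war:.1f} WAR")  # prints "3.3 WAR"
```

Note the order of the subtraction: baseline minus the pitcher, so that allowing fewer runs than a replacement-level arm comes out as a positive number of wins.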

If you're still not getting it, my next post may help. Next time I will discuss the Cubs' recent, panicked move of Zambrano to the bullpen as an example of how to apply pitching WAR to real-world scenarios. I hope to show you all why many of us flipped out at this, and why I called it "cosmically stupid." I also hope to show how a move of Zambrano to the bullpen could actually make the team better. (Hint: Basically, the opposite of the way they're doing it!) Stay tuned...