A thought on Run Differential
Preface: I try to check the FanPosts daily and haven't seen anything about this. If I missed it, feel free to pile on and Al can delete the post.
Remember last season when our guys were struggling and there was a lot of talk about actual record vs. expected record based on run differential?
I was just looking over the expanded standings, and the Cubs' current run differential is +64: 234 runs scored (most in MLB by 14) and 170 runs allowed (least in NL). Think about that for a minute and let it sink in. Anyone remember the last time the Cubs scored the most while allowing the least?
I don't know how the expected record is calculated based on run differential, but it got me to wondering what that would translate to. Anyone have a clue?
Even with all the "turmoil" of the Edmonds signing and all of the little things that some seem to find to complain about on a daily basis, I am very excited about this team, more than I can remember being in my 30 years as a fan. That being said, I realize that it is May and there is a lot of baseball left to play.
This is a FanPost and does not necessarily reflect the views of SB Nation or Al Yellon, managing editor (unless it's a FanPost posted by Al). FanPost opinions are valued expressions of opinion by passionate and knowledgeable baseball fans.
Comments
CORRECTION!!!
“170 runs allowed (least in NL).”
That is actually second least in the nl to Atlanta.
"I'm not giving him a high-five ever again." - Sammy Sosa, joking about Moises Alou's personal habits
Was just about to post this...
A distant second to Atlanta’s 149. Nevertheless, I think we do have the largest run differential in MLB. It’s been a very impressive first month and a half of the season for sure.
And the Pythagorean calculation is
1/(1+(runs allowed/runs scored)^2) = expected win %
So by this metric, we’d be expected to have about a 65.5% winning percentage, or a 26-14 record.
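For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula above using the run totals from the post. The function name and the 40-game total (taken from later in the thread) are just illustrative choices, not anything official.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage: 1 / (1 + (RA / RS) ** exponent)."""
    ratio = runs_allowed / runs_scored
    return 1.0 / (1.0 + ratio ** exponent)

# Cubs totals quoted in the post: 234 scored, 170 allowed.
pct = pythagorean_win_pct(234, 170)

# Assumption: 40 games played so far, as mentioned elsewhere in the thread.
games_played = 40
expected_wins = pct * games_played

print(f"expected win pct: {pct:.3f}")
print(f"expected record: {round(expected_wins)}-{games_played - round(expected_wins)}")
```

The exponent is left as a parameter because other commenters use values other than 2.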
It's also listed on the mlb.com standings page...
In the “X W-L” column. Here’s the website:
The fine-tuned Pythagorean...
...is slightly different in formulation, but in this case leads to the same result.
Runs scored ^ 1.83 / (Runs scored ^ 1.83 + Runs allowed ^ 1.83) = winning percentage
With 234 RS and 170 RA, that yields a winning percentage of 64.2154248%, good for 25.6861699 expected wins.
by John Q Freejazz on May 15, 2008 8:12 AM CDT reply actions
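A quick sketch, not from the comment itself, showing that the 1.83 version in ratio form is the same calculation as the 1/(1 + (RA/RS)^x) form with a different exponent. The variable names are made up for illustration, and the 40-game total is assumed from the thread.

```python
import math

runs_scored, runs_allowed = 234, 170
exponent = 1.83

# Ratio form: RS^x / (RS^x + RA^x)
rs_x = runs_scored ** exponent
ra_x = runs_allowed ** exponent
pct_ratio_form = rs_x / (rs_x + ra_x)

# Same thing written as 1 / (1 + (RA / RS)^x)
pct_inverse_form = 1.0 / (1.0 + (runs_allowed / runs_scored) ** exponent)

# Assumption: 40 games played.
expected_wins = pct_ratio_form * 40

print(f"{pct_ratio_form:.4f} {pct_inverse_form:.4f} {expected_wins:.2f}")
```

Dividing the top and bottom of the ratio form by RS^x gives the inverse form, so the two only differ by floating-point rounding.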
Interesting.
As good as the Cubs have been, this formula shows them as better.
That is, of course, skewed by two games, won by scores of 19-5 and 12-3. That’s a combined total of +23, 36% of the positive differential accounted for in two games out of 40.
Still, impressive.
"That's my opinion and if you don't like it, well, I have others." ~ Groucho Marx
Right...
the idea is that, over the course of a full season, those 19-5 and 12-3 type games will balance out into other games, minimizing the “outlierness” of those games. That’s certainly debatable, of course. But I wouldn’t read too much into the expected win totals this early, for the reason you mention (the outlier games skew the numbers substantially).
In either case though, we’re doing really really well.
True.
On losses, the biggest two losses have been 9-0 at Cincinnati and 9-2 to Cincinnati at home—so that’s a -16 in the two biggest blowout losses. Still +7 if you say those four (the two blowout wins above and these two) cancel each other out.
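The blowout arithmetic in this sub-thread can be checked in a few lines. This is only a sketch using the four scores quoted above (Cubs runs first); the names are illustrative.

```python
# Season totals from the post.
season_diff = 234 - 170  # +64

# Scores quoted in the comments, as (Cubs runs, opponent runs).
blowout_wins = [(19, 5), (12, 3)]
blowout_losses = [(0, 9), (2, 9)]

win_margin = sum(rs - ra for rs, ra in blowout_wins)     # margin from the two big wins
loss_margin = sum(rs - ra for rs, ra in blowout_losses)  # margin from the two big losses

share_of_diff = win_margin / season_diff  # fraction of +64 from the two big wins
net_of_four = win_margin + loss_margin    # net margin across all four games
diff_without_four = season_diff - net_of_four

print(win_margin, loss_margin, net_of_four, diff_without_four)
print(f"{share_of_diff:.0%}")
```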
"That's my opinion and if you don't like it, well, I have others." ~ Groucho Marx
The solution to that is to simply run the Pythag...
...on the game level instead of the season level, and then take the average. A guy over at a Mariners blog does that for us. 97 wins is a BIT more reasonable than Pythagenpat’s results, which put us at a 103-win pace.
There are other refinements you can use. Baseball Prospectus, for example, doesn’t use runs scored or runs allowed for some of their calculations, instead using Equivalent Runs on both sides, which is basically linear weights. (They also adjust for strength of schedule in their third-order wins, although I’m not sure what mechanism they use.)
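Here is a rough sketch of one literal reading of the game-level idea mentioned above: apply the Pythagorean formula to each game's score and average the results. The Mariners blog's actual method isn't described in this thread, so treat this as an assumption. The five scores are hypothetical, made up only to show how a single blowout gets capped.

```python
def pythag(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation in ratio form: RS^x / (RS^x + RA^x)."""
    rs_x = runs_scored ** exponent
    ra_x = runs_allowed ** exponent
    return rs_x / (rs_x + ra_x)

def game_level_pythag(scores, exponent=2.0):
    """Average the per-game Pythagorean value over (RS, RA) pairs."""
    return sum(pythag(rs, ra, exponent) for rs, ra in scores) / len(scores)

# Hypothetical scores (not real Cubs games): one blowout win,
# three close games, and one shutout loss.
scores = [(19, 5), (3, 2), (2, 3), (4, 1), (0, 9)]

total_rs = sum(rs for rs, _ in scores)
total_ra = sum(ra for _, ra in scores)

season_level = pythag(total_rs, total_ra)  # all runs pooled together
game_level = game_level_pythag(scores)     # each game counted once

print(f"season-level: {season_level:.3f}, game-level: {game_level:.3f}")
```

In this made-up sample the pooled figure is higher, because the 19-5 game contributes 19 runs to the season total but can count for at most one game's worth in the average.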
The BP approach is more interesting...
as runs can be somewhat fluky in nature. Especially on a smallish sample size.
by SouthernCub on May 15, 2008 10:06 AM CDT up reply actions
As Bill James is (overly) fond of saying, run scoring is not a linear process.
Using EqR like that is probably going to get things a bit wrong at the margins: you’ll tend to overrate bad offenses and underrate good offenses that way. To what extent, and whether or not it matters, is open to debate; I really don’t know. But I’d rather use a dynamic run estimator like BaseRuns for something like that.
(That said, I don’t want anyone to mistake me for disliking linear weights. I love linear weights. I would let linear weights marry my daughters. Linear weights are the most amazing things ever.)
math is awesome
i bet the girls were all over you in college
"I don't want to be a product of my environment. I want my environment to be a product of me." Frank Costello
by kalamazoo_cubs_fan on May 15, 2008 8:47 AM CDT up reply actions
Cubs organization
I’ve followed the Cubs since ‘89 and this is the strongest the organization has ever looked. I’m certainly more optimistic than I’ve been in the past regarding sustained winning baseball.
You're right
this is very good news and confirmation that our record isn’t fluky. The Cubs are one of the forces to be reckoned with in the NL.
Run Differential
Run Differential is a very useful tool when taking a look at how a team’s actual record compares to their “expected record”. However, there are some limitations with run differential that need to be accounted for. The most important are understanding scheduling differences and that “not all runs are created equal”.
This was the case last year when everyone was talking about the DBacks’ negative run differential vs. the Cubs’ positive run differential and how the Cubs therefore were the better team on paper. It was a silly argument, because in this day and age of unbalanced schedules, teams from different divisions are playing against different levels of competition. The Cubs’ run differential last year against non-NL Central opponents was basically EVEN, but because they tore up a very weak division it inflated their run differential a bit.
Again, this has no impact on comparing actual record vs. expected record, which is the main purpose of this statistic. However, when you start comparing across divisions, etc., it gets a tiny bit hairier. You need to dig a bit deeper to understand the impacts of scheduling. The same can be said this early in the season, when we have just 25% of the sample to look at and teams end up with weird scheduling quirks (like the Cardinals’ absurdly easy early-season schedule loaded with PIT, SF, etc.).
Just a few things to keep in mind when analyzing Run Differential
by DartmouthCubsFan on May 15, 2008 9:37 AM CDT reply actions
another oddity
read something somewhere (sorry can’t remember the source) that was talking about our pythag being a little skewed by those big games. but also a little screwy because we’ve yet to score exactly 4 or 5 runs in a game. which seemed extremely unlikely given how the season’s gone. strange that we haven’t scored 4 or 5 once…
I don't know how odd it is, really.
When the Cubs offense gets clicking, we really wear down opposing pitchers, get into the bullpen, and score runs in bunches. Once you’ve scored four or five runs and have gotten out to a comfortable lead, you face the trash pitchers and you just tack on.
Saw that at Baseball Musings as well.
"Baseball is like church- many attend, few understand." ~ Leo Durocher
2004 Pythagorean W-L Best Recently
The ‘04 team’s Pythagorean W-L record of 94-68 was actually better than that of the ‘07, ‘03, ‘98, ‘89, and ‘84 playoff teams. The ‘04 team’s real record of 89-73 was better than the real record of the ‘07 and ‘03 teams and worse than that of the ‘98, ‘89, and ‘84 teams. Run differential is a decent statistic, but I don’t put a tremendous amount of stock in it.
It's a decent statistic...
It tells you what you should be doing, all else equal. What it doesn’t account for is the quality of your bullpen. Teams that have really strong bullpens tend to consistently outperform their pythagorean stats, because they tend to do better in close games. The 2004 team had a poor closer.
One should also note that the 2004 team won more games than the ‘07 and ‘03 teams and won about as many games as the ‘98 team. So it wasn’t like that was a bad team. We just happened to play in a division with two other really good teams that year.
by SouthernCub on May 15, 2008 10:49 AM CDT up reply actions
It Isn't LaTroy Hawkins Proof
That ‘04 team ticked me off more than any other Cubs team because of LaTroy Hawkins and the collapse at the end. That team had enough talent to win the wild card.
If we want to get technical...
...the exponent should be (Runs Per Game Scored + Runs Per Game Allowed)^.287.
If, you know, we wanted to get technical.
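As a sketch of what that exponent looks like with the numbers from the post: this assumes 40 games played (the figure used elsewhere in the thread), and the function names are invented for illustration.

```python
def pythagenpat_exponent(runs_scored, runs_allowed, games):
    """Exponent = (runs per game scored + runs per game allowed) ** 0.287."""
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** 0.287

def expected_win_pct(runs_scored, runs_allowed, exponent):
    """Pythagorean expectation with a caller-supplied exponent."""
    return 1.0 / (1.0 + (runs_allowed / runs_scored) ** exponent)

# Totals from the post; 40 games is an assumption taken from the thread.
rs, ra, games = 234, 170, 40

exponent = pythagenpat_exponent(rs, ra, games)
pct = expected_win_pct(rs, ra, exponent)

print(f"exponent: {exponent:.3f}, expected win pct: {pct:.3f}")
```

The point of the floating exponent is that it moves with the run environment: higher-scoring contexts push it up, lower-scoring ones pull it down, instead of fixing it at 2 or 1.83.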
I don't use one standard expression
I use a composite distribution for the league that season, transform the distribution so predictive tools can be used to analyze the data, then curve fit.
