lineup optimization using Cubs 2008 ZiPS projections
Back in 2006, I got it into my head to try my hand at writing a simulator for the express purpose of optimizing lineups. It would go through all 9! permutations of the lineup and actually play simulated games, and then output the average runs scored for each lineup. You can see the original post here and the follow-up using stats taken from the 2005 Astros here. I never did get around to updating the algorithm (although it's still on my to-do list), but since I finally got myself a new computer I decided to try something that my friend suggested, which was to run a set of projections through the program in order to predict what the optimum lineup would be for the forthcoming year.
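The enumeration step is simple to sketch. The original simulator's code isn't shown, so this minimal Python version is only an illustration. `all_lineups` is a hypothetical helper, and the nine names are the eight regulars plus the composite pitcher described in the next paragraph.

```python
from itertools import permutations
from math import factorial

# The eight projected regulars plus the composite "pitcher" slot.
PLAYERS = ["Lee", "Ramirez", "Fukudome", "Soto", "Pie",
           "DeRosa", "Theriot", "Soriano", "pitcher"]


def all_lineups(players):
    """Yield every possible batting order as a tuple, slot 1 first."""
    yield from permutations(players)


n_lineups = sum(1 for _ in all_lineups(PLAYERS))
print(n_lineups, factorial(len(PLAYERS)))  # 362880 362880
```

At the roughly 50 hours a full run takes, that works out to about half a second of simulated games per lineup.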
The program finally finished the run (marginally faster - about 50 hours run time) using the 2008 ZiPS projections for the Cubs. I used the projections for the following eight players: Lee, Ramirez, Fukudome, Soto, Pie, DeRosa, Theriot, and Soriano; and then summed up the actual 2007 lines for the five pitchers with the most ABs (Z, Lilly, Hill, Marshall, Marquis) to make a sort of "average" pitcher. I'd like to reiterate that this algorithm doesn't take into account errors, steals, player speed, GIDP, handedness, sacrifice flies or sac bunts, and baserunning is strictly station-to-station. The algorithm also does not take into account any other strategy - i.e., it doesn't pitch around guys to get to the pitcher. In addition, ZiPS doesn't project hit-by-pitches.
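Here is a minimal Python sketch of the simulation model as this paragraph describes it. It is not the author's code, and `Batter`, `combine`, `advance`, `simulate_game` and `simulated_runs_per_game` are hypothetical names. Every plate appearance that isn't a walk or a hit counts as an out. Walks force runners, and hits move every runner exactly as many bases as the batter (station-to-station). The model leaves out errors, steals, double plays, sacrifices and HBP.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Bases = Tuple[bool, bool, bool]  # (first, second, third)
EVENTS = ("out", "bb", "1b", "2b", "3b", "hr")
HIT_BASES = {"1b": 1, "2b": 2, "3b": 3, "hr": 4}


@dataclass
class Batter:
    name: str
    pa: int
    bb: int  # walks only; no HBP, matching the ZiPS inputs
    singles: int
    doubles: int
    triples: int
    hr: int

    def weights(self) -> List[int]:
        """Relative frequency of each event in EVENTS; every other PA is an out."""
        on_base = self.bb + self.singles + self.doubles + self.triples + self.hr
        return [self.pa - on_base, self.bb, self.singles,
                self.doubles, self.triples, self.hr]


def combine(name: str, batters: Sequence[Batter]) -> Batter:
    """Sum several stat lines into one composite hitter (e.g. the 'average' pitcher)."""
    fields = ("pa", "bb", "singles", "doubles", "triples", "hr")
    return Batter(name, *(sum(getattr(b, f) for b in batters) for f in fields))


def advance(bases: Bases, event: str) -> Tuple[Bases, int]:
    """Station-to-station baserunning; returns (new bases, runs scored)."""
    first, second, third = bases
    if event == "bb":
        # Walk: the batter takes first and runners move only when forced.
        runs = int(first and second and third)
        if first and second:
            third = True
        if first:
            second = True
        return (True, second, third), runs
    n = HIT_BASES[event]
    runs, new = 0, [False, False, False]
    for base, occupied in enumerate(bases):  # index 0 = first base
        if occupied:
            if base + n >= 3:
                runs += 1  # runner reaches home
            else:
                new[base + n] = True
    if n == 4:
        runs += 1  # the batter scores on a home run
    else:
        new[n - 1] = True
    return (new[0], new[1], new[2]), runs


def simulate_game(lineup: Sequence[Batter], rng: random.Random,
                  innings: int = 9) -> int:
    """Play a full nine innings, carrying the batting order over between innings."""
    runs, idx = 0, 0
    for _ in range(innings):
        outs, bases = 0, (False, False, False)
        while outs < 3:
            batter = lineup[idx]
            idx = (idx + 1) % len(lineup)
            event = rng.choices(EVENTS, weights=batter.weights())[0]
            if event == "out":
                outs += 1
            else:
                bases, scored = advance(bases, event)
                runs += scored
    return runs


def simulated_runs_per_game(lineup: Sequence[Batter], games: int = 1000,
                            seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games
```

To score a lineup, you would pass nine `Batter` objects in batting order to `simulated_runs_per_game` and build the pitcher slot with `combine`. The actual ZiPS and 2007 stat lines aren't reproduced here.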
That said, here are the results for the top 10 and bottom 10 lineups, as well as the results for the starting lineup from the March 18th spring training game.
March 18th spring training lineup:

| Runs/season | Runs/9 | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th |
|---|---|---|---|---|---|---|---|---|---|---|
| 596.248 | 3.681 | Theriot | Soriano | Lee | Ramirez | Fukudome | DeRosa | Soto | Pie | pitcher |
Top 10 lineups:

| Runs/season | Runs/9 | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th |
|---|---|---|---|---|---|---|---|---|---|---|
| 606.033 | 3.741 | DeRosa | Fukudome | Lee | Soto | Ramirez | Soriano | Pie | pitcher | Theriot |
| 605.697 | 3.739 | Fukudome | Lee | Ramirez | Soto | Soriano | Pie | pitcher | Theriot | DeRosa |
| 605.402 | 3.737 | Fukudome | Lee | Soto | Ramirez | Soriano | Pie | pitcher | DeRosa | Theriot |
| 605.378 | 3.737 | DeRosa | Lee | Fukudome | Ramirez | Soto | Soriano | Pie | pitcher | Theriot |
| 605.314 | 3.737 | Fukudome | DeRosa | Lee | Soriano | Ramirez | Soto | Pie | pitcher | Theriot |
| 605.294 | 3.736 | Fukudome | Lee | Ramirez | Soriano | Pie | Soto | pitcher | Theriot | DeRosa |
| 605.287 | 3.736 | DeRosa | Soto | Fukudome | Lee | Ramirez | Soriano | Pie | pitcher | Theriot |
| 605.286 | 3.736 | Fukudome | Soto | Lee | Ramirez | Soriano | Pie | pitcher | Theriot | DeRosa |
| 605.268 | 3.736 | Fukudome | Lee | Soto | Pie | Ramirez | Soriano | pitcher | Theriot | DeRosa |
| 605.223 | 3.736 | Fukudome | Lee | Ramirez | Soriano | Soto | Pie | pitcher | Theriot | DeRosa |
Bottom 10 lineups (worst first):

| Runs/season | Runs/9 | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th |
|---|---|---|---|---|---|---|---|---|---|---|
| 565.289 | 3.489 | DeRosa | Fukudome | Theriot | pitcher | Pie | Soto | Soriano | Ramirez | Lee |
| 565.301 | 3.490 | Pie | DeRosa | Fukudome | Theriot | pitcher | Soto | Soriano | Ramirez | Lee |
| 565.562 | 3.491 | DeRosa | Theriot | Fukudome | pitcher | Pie | Soto | Soriano | Ramirez | Lee |
| 565.640 | 3.492 | Pie | DeRosa | Fukudome | Theriot | pitcher | Ramirez | Soto | Soriano | Lee |
| 565.697 | 3.492 | Fukudome | Theriot | DeRosa | pitcher | Pie | Soriano | Ramirez | Soto | Lee |
| 565.715 | 3.492 | DeRosa | Theriot | Pie | pitcher | Soto | Soriano | Ramirez | Lee | Fukudome |
| 565.758 | 3.492 | DeRosa | Theriot | Fukudome | pitcher | Pie | Soriano | Ramirez | Soto | Lee |
| 565.808 | 3.493 | Fukudome | DeRosa | Theriot | pitcher | Pie | Soriano | Soto | Ramirez | Lee |
| 565.833 | 3.493 | Theriot | Pie | Fukudome | DeRosa | pitcher | Soriano | Soto | Lee | Ramirez |
| 565.878 | 3.493 | DeRosa | Theriot | Pie | pitcher | Soriano | Ramirez | Soto | Lee | Fukudome |
There is some evidence for putting your best OBP player (Fukudome) in the leadoff spot, followed by your best OPS guys with an emphasis on OBP. Unfortunately, the 2008 Cubs ZiPS projections probably aren't the best data set for drawing generalities about each lineup slot: the top 4 hitters are all within 60 points of each other in projected OPS, and the top 6 projected starters all have projected OBPs within 50 points of each other as well. However, there is good support for Tony La Russa's "second leadoff hitter" in the 9-hole, and maybe even for a "third leadoff hitter," with the pitcher batting 7th or 8th depending on whether your real leadoff hitter also has good pop. The optimum lineup here is only about 10 runs (maybe 1 win) better than the lineup Lou fielded on March 18th.
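The arithmetic behind the Runs/9 column and the "10 runs, maybe 1 win" estimate can be checked directly. The Runs/9 values in the tables match Runs/season divided by 162 games. The 10-runs-per-win figure is a rule of thumb, and `runs_per_nine` is a hypothetical helper.

```python
GAMES = 162
RUNS_PER_WIN = 10  # rule of thumb, not a precise constant

# Runs/season values copied from the tables above.
best_lineup = 606.033
march_18_lineup = 596.248
worst_lineup = 565.289


def runs_per_nine(runs_per_season: float, games: int = GAMES) -> float:
    return runs_per_season / games


print(round(runs_per_nine(best_lineup), 3))          # 3.741
gap = best_lineup - march_18_lineup
print(round(gap, 3), round(gap / RUNS_PER_WIN, 2))   # 9.785 0.98
print(round(best_lineup - worst_lineup, 3))          # 40.744
```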
This is a FanPost and does not necessarily reflect the views of SB Nation or Al Yellon, managing editor (unless it's a FanPost posted by Al). FanPost opinions are valued expressions of opinion by passionate and knowledgeable baseball fans.
24 comments
Comments
Interesting...
that Soto is in the top 4 spots in 60% of the best lineups and in the last 4 in 90% of the worst lineups. Did you forward this to LPiniella@Cubs.com?
As I've told you before, I never repeat myself.
by santoswoodenlegs on Mar 20, 2008 12:24 PM CDT
Interesting...
that Soto is batting 2-4 in 60% of the top 10 lineups and 6-8 in 90% of the bottom 10. Did you forward this to l_piniella@cubs.com?
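The tally is easy to reproduce from the two tables. In this small sketch, Soto's batting slot in each lineup was read off the top-10 and bottom-10 tables by hand, and `share_in_slots` is a hypothetical helper.

```python
# Soto's batting slot in each lineup, read off the top-10 and bottom-10 tables.
soto_top10 = [4, 4, 3, 5, 6, 6, 2, 2, 3, 5]
soto_bottom10 = [6, 6, 6, 7, 8, 5, 8, 7, 7, 7]


def share_in_slots(slots, lo, hi):
    """Fraction of lineups where the player bats between slots lo and hi, inclusive."""
    return sum(lo <= s <= hi for s in slots) / len(slots)


print(share_in_slots(soto_top10, 2, 4))     # 0.6
print(share_in_slots(soto_bottom10, 6, 8))  # 0.9
```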
by santoswoodenlegs on Mar 20, 2008 12:28 PM CDT
OK, explain why you changed the e-mail address protocol...
...the second time.
Ladies and gentlemen, your 2008 Chicago Cubs starting outfield: Soriano-Pie-Fukudome. Let it be.
by dat cubfan daver on Mar 20, 2008 12:35 PM CDT
Interesting...
that I just repeated myself...weird hiccup I guess. I tried to post the first comment and accidentally clicked another link that took me away from the diary. When I came back I didn't see my post so I reposted my original thoughts....
by santoswoodenlegs on Mar 20, 2008 12:36 PM CDT
Somehow, I can't see Lou perusing email.
But maybe that's just me.
"That's my opinion and if you don't like it, well, I have others." ~ Groucho Marx
by Al Yellon on Mar 20, 2008 12:44 PM CDT
re: Somehow, I can't see Lou perusing email.
I could see Lou having that machine that prints out your e-mail as it comes in. I keep hearing radio ads for this thing.
by dat cubfan daver on Mar 20, 2008 12:49 PM CDT
Radio?
What's a radio? Oh yeah... I've seen some banner ads on the internet about those "wireless RF audio receivers"... but I think they're outdated and useless.

by santoswoodenlegs on Mar 20, 2008 1:03 PM CDT
Oh, and getting to the topic at hand...
...I agree that Soto's various spots in the lineup are interesting. In fact, I'm not going to go as far as saying that this team's success depends on Soto coming through as projected, but I think he could make the difference between this team being great and it just being good.
I, too, notice that none of the top lineups have the pitcher in the nine spot. I know it's not popular to support what's perhaps fast becoming known as "the LaRussa model," but this concept does intrigue me.
I heard a snippet from an interview with Bill James on The Score last week, and he made the pretty valid point that batting the pitcher and catcher last was really just an arbitrary decision made decades (upon decades) ago, and that the decision wasn't based on any research whatsoever. He predicted that, within a few years, moving the pitcher out of the nine-hole would become fairly commonplace. I'm not necessarily criticizing Lou for not making this move now but, like I said, it's an intriguing concept.
by dat cubfan daver on Mar 20, 2008 12:47 PM CDT
regarding the "LaRussa model"
Statistically speaking, it makes a ton of sense. In general, each spot in the order sees about 2-4% fewer plate appearances than the spot before it. If your worst regular is more than 4% better than your pitcher (quite probable), the strategic advantage of putting him in 9th may outweigh the reduction caused by the decrease in PAs.
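The per-slot falloff in plate appearances follows from the batting order wrapping around. In this minimal sketch, `pa_by_slot` and `avg_pa_by_slot` are hypothetical helpers. The sketch assumes a team's PA totals are spread evenly over a hypothetical range of 34 to 42 per game, and it ignores pinch hitters and extra innings.

```python
def pa_by_slot(team_pa: int, slots: int = 9) -> list:
    """PAs for each lineup slot in a game where the team sends `team_pa` batters up.

    The order wraps around, so the first (team_pa % slots) slots get one extra PA.
    """
    return [(team_pa - k) // slots + 1 for k in range(1, slots + 1)]


def avg_pa_by_slot(team_pa_totals, slots: int = 9) -> list:
    """Average PAs per slot over a collection of per-game team PA totals."""
    games = [pa_by_slot(n, slots) for n in team_pa_totals]
    return [sum(g[i] for g in games) / len(games) for i in range(slots)]


# Hypothetical spread of team PAs per game; real totals would come from game logs.
avg = avg_pa_by_slot(range(34, 43))
for slot in range(1, 9):
    drop = 1 - avg[slot] / avg[slot - 1]
    print(f"slot {slot} -> {slot + 1}: {avg[slot - 1]:.2f} -> {avg[slot]:.2f} PA ({drop:.1%} fewer)")
```

Under this assumption, each slot gets exactly 1/9 of a PA per game fewer than the slot before it. That works out to about 2.4% fewer near the top of the order and about 2.9% fewer near the bottom.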
by false cognate on Mar 20, 2008 1:25 PM CDT
Especially with a guy like Soriano batting 1st, who would benefit greatly from having runners on. That said, since he seems to slump badly the first month, it may not matter.
by dr stabbingworth on Apr 9, 2008 6:20 PM CDT
also worth noting
is that the differential between the worst and best lineups is about 40 runs, or 4 wins. It does lend some credence to the idea that lineup really doesn't matter that much, especially considering that the differential between the "standard" lineup and the "optimal" lineup is only 10 runs.
by false cognate on Mar 20, 2008 1:32 PM CDT
Really?
It takes 10 runs on average to get a win? I must be missing something.
"Confidence is what you have before you understand the problem." Woody Allen
by BlueSox on Mar 20, 2008 1:48 PM CDT
10 additional runs
over the course of the season is generally equated to 1 extra win. Not that it takes 10 runs to win one game, just that over the course of 162 games, if you score an extra 10 runs, you're probably one game better than you would have been had you not scored those extra runs.
If that still doesn't make sense, let me know and I'll see if I can explain it better.
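One common way to see where the 10-runs-per-win rule of thumb comes from is Bill James' Pythagorean expectation, W% = RS^2 / (RS^2 + RA^2). This formula is swapped in here for illustration because the comment doesn't say which model it relies on. The 750-run team is hypothetical.

```python
def pythag_wins(runs_scored: float, runs_allowed: float,
                games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins from Bill James' Pythagorean formula."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


baseline = pythag_wins(750, 750)    # a .500 team: 81 wins
plus_ten = pythag_wins(760, 750)    # same team, 10 more runs scored
print(round(plus_ten - baseline, 2))  # 1.07
```

In a lower-scoring environment, each run is worth a bit more, so fewer than 10 extra runs buy a win.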
by false cognate on Mar 20, 2008 1:57 PM CDT
Ah,
I think I understand. Here's another question:
Is there any way to figure out how much of the error in projections like this can be attributed to the stats that are left out (i.e., errors, steals, etc.)?
by BlueSox on Mar 20, 2008 2:13 PM CDT
well
It's not too hard to take a crack at some of them. The 2007 Cubs reached on error 82 times. A walk or a single is valued between 0.4 and 0.6 runs, according to Tom Tango, so we might expect an ROE to be about the same. So you could maybe stack on another 35 - 40 runs as a result of errors against, except that I don't think it's in the batter's control whether an error is committed or not. This could be useful in validating the simulator against historical data, but wouldn't be so useful in predicting future performance.
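Treating each reach-on-error (ROE) like a walk or single, the 0.4 to 0.6 run values bracket the estimate. `run_value_range` is a hypothetical helper that does the multiplication.

```python
def run_value_range(events: int, low: float = 0.4, high: float = 0.6):
    """Bound the total run value of `events` that are each worth `low` to `high` runs."""
    return events * low, events * high


low, high = run_value_range(82)  # 2007 Cubs reached on error 82 times
print(f"{low:.1f} to {high:.1f} runs")  # 32.8 to 49.2 runs
```

The 35 - 40 run figure falls inside this band.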
Speed is a lot harder to model; my simulator doesn't take it into account. If a single is hit, the runners on base advance one base. Obviously, in some instances in real life, fast runners (Pie, perhaps) would advance to third. This is a non-trivial problem to solve, so I ignored it.
by false cognate on Mar 20, 2008 2:43 PM CDT
All of those run totals look really low.
I haven't had time to look at anything other than the tables here, but using the Baseball Musings lineup simulator, the Cubs score between 4.5 and 5.5 RPG with just about any lineup configuration I can come up with.
by cwyers on Mar 20, 2008 3:05 PM CDT
They are low.
As I said above, my simulator doesn't take into account a lot of possibilities - errors, steals, sacrifices, speed, etc., and the baserunning is strictly station-to-station. But it should give a first order approximation. Errors and sac flies alone will probably be about 80 runs, or 0.5 runs per game.
Using the Baseball Musings lineup tool and the ZiPS projections, I max out at 5.1 runs per game, which leaves me with about another 0.8 runs per game to account for. I don't know if that can be done with just speed. One problem I have with the Baseball Musings lineup tool is that it assumes that the value of OBP and SLG for each position is independent of who is in the other positions. As you are doing the permutations, the value of a point of OBP and SLG for each slot clearly must depend on who is in the immediately following slots.
by false cognate on Mar 20, 2008 4:03 PM CDT
I don't think that speed or the other factors...
...are what's going on here. You seem to be presuming that the advancement value of an event is equal to the number of bases taken by the batter, which isn't true - a double will score a runner from first quite often, for example. So your model needs to reflect that fractional advancement.
by cwyers on Mar 20, 2008 4:42 PM CDT
I understand that
That's what I mean when I say my algorithm is strictly station-to-station, because it's the easiest thing to model. Making runner advancement variable is a tricky thing to simulate because it depends on so many variables - runner speed, where the ball was hit to and fielded, defender arm strength and accuracy. Sal Baxamusa has written a similar program, and he allows runners to take one extra base if there are two outs. In preliminary testing, that helps, but it definitely doesn't make up the whole difference.
by false cognate on Mar 20, 2008 4:51 PM CDT
Your current assumption makes everyone...
...a below-average baserunner. Simply making everyone an average baserunner will drastically improve your model.
Here's how I would suggest modeling it: presume that an average runner on first advances 2.67 bases on a double. (I think that's pretty close to accurate; I can get more detailed advancement rates when I get home tonight if you're interested.) Then use a RAND function to get a random number between 0 and 1: if it's above .67, the runner stays at third; if it's below .67, the runner goes home.
It's a bit more complex, but I don't know that it's unworkable. (I'm the furthest thing from a computer programmer out there, so take it with a grain of salt.)
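Here is a minimal Python sketch of the suggestion above, assuming the 0.67 scoring rate for a runner on first. `advance_on_double` is a hypothetical function. Runners on second and third still score, as in the station-to-station model, and only the runner from first is randomized.

```python
import random
from typing import Tuple

Bases = Tuple[bool, bool, bool]  # (first, second, third)


def advance_on_double(bases: Bases, rng: random.Random,
                      score_from_first: float = 0.67) -> Tuple[Bases, int]:
    """Return (new bases, runs) after a double.

    Runners on second and third score. A runner on first scores with probability
    `score_from_first` (averaging 2 + score_from_first bases) and otherwise holds
    at third. The batter always ends up on second.
    """
    first, second, third = bases
    runs = int(second) + int(third)
    runner_held_at_third = False
    if first:
        if rng.random() < score_from_first:
            runs += 1
        else:
            runner_held_at_third = True
    return (False, True, runner_held_at_third), runs


rng = random.Random(2008)
trials = 100_000
scored = sum(advance_on_double((True, False, False), rng)[1] for _ in range(trials))
print(scored / trials)  # close to 0.67, i.e. about 2.67 bases per runner from first
```

The same pattern, a uniform draw compared against an advancement rate, would extend to singles and to the two-out extra base if real advancement rates were plugged in.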
by cwyers on Mar 20, 2008 5:38 PM CDT
Sure
Post the data if you've got it. What you suggest isn't unworkable, but it'll definitely have to wait until after I parallelize the code, because it would add a significant chunk to the run time. Right now, one run of all 9! lineups takes about 50 hours, running at 100% on one core of my Core 2 Duo based computer without any compiler optimization. The problem is, as they say, "embarrassingly parallel," so I should be able to cut that in half if I can get it working on both cores.
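Because every lineup is scored independently, the search splits cleanly. This minimal Python sketch assumes a hypothetical `score(lineup)` function standing in for the simulator. `best_with_leadoff` and `best_lineup` are also hypothetical. The search is split into one job per leadoff hitter, each covering 8! lineups, and the jobs run on two worker processes by default.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple

Lineup = Tuple[str, ...]
Scorer = Callable[[Lineup], float]


def best_with_leadoff(players: Sequence[str], leadoff: str,
                      score: Scorer) -> Tuple[float, Lineup]:
    """Score every lineup that starts with `leadoff`; return the best (score, lineup)."""
    rest = [p for p in players if p != leadoff]
    best: Tuple[float, Lineup] = (float("-inf"), ())
    for tail in permutations(rest):
        lineup = (leadoff,) + tail
        value = score(lineup)
        if value > best[0]:
            best = (value, lineup)
    return best


def best_lineup(players: Sequence[str], score: Scorer,
                executor: Optional[Executor] = None) -> Tuple[float, Lineup]:
    """Split the full search into one independent job per leadoff hitter."""
    owned = executor is None
    pool = executor if executor is not None else ProcessPoolExecutor(max_workers=2)
    try:
        jobs = [pool.submit(best_with_leadoff, players, p, score) for p in players]
        return max((job.result() for job in jobs), key=lambda result: result[0])
    finally:
        if owned:
            pool.shutdown()
```

With worker processes, `score` must be a module-level function so it can be pickled, and on platforms that spawn workers the call belongs under `if __name__ == "__main__":`. Nine equal jobs on two cores finish in five rounds instead of 4.5, so this split gets close to, but not quite, a 2x speedup. Smaller chunks would balance better.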
by false cognate on Mar 20, 2008 6:38 PM CDT
I find this
very interesting. You put a lot of work into this, and I applaud you for it. The problem I have with any projections is this: how does anyone know what Soto, Pie and Dome will actually do?
We are looking at three players who are unproven at the major league level. I understand what data is used to project their stats for the year, but I think the basis is flawed.
Certainly as a Cubs fan I hope for the very best from these three. With Lee, Soriano and the other proven major leaguers, the data is solid because of their track record at this level. With these three (Pie, Dome and Soto), who make up over a third of our starting position players, I find it hard to believe that all three will reach their projections, no matter who supplies them.
Either way, I am impressed with your effort. Nicely done.
"You can't take life too seriously, you don't get out of it alive"
by wild bill on Mar 20, 2008 6:27 PM CDT
Doesn't like Soriano..
..very much.
I'm drunk......and it shows.....
by Keystone80435 on Mar 20, 2008 10:50 PM CDT
But I guess since it doesn't factor in speed and SB, it wouldn't like him.
Pretty cool, though.
by Keystone80435 on Mar 20, 2008 10:54 PM CDT
