Hey, everyone. It's your always-absent "statistical editor." I'm going to try to be around more often, and thought this would be a good reason to jump back into the BCB-seas. I'm re-starting things with this Fan Post on the Cubs' recent signing of Gerardo Concepcion, who was the rookie of the year in the Cuban National Series. Nice, right?
For those that don't know/remember me (I don't blame you), I come at things from a statistical angle. So the first place I tend to look is the numbers. And here are Concepcion's (h/t to yankeeanalysts for the translations):
Yuck. Read on to find out why these numbers seem terrible to me, and to find out why I still like this signing, despite the numerical warning signs.
First, a caveat. The issue with evaluating players from Cuba - in particular for basement-dwelling bloggers such as myself - is that projecting a player's MLB performance based on his Cuban stats is difficult. Why? Well, the sample size is small. In general, when you're evaluating something in science you want as many data points as possible, and the more data you have the better your ability to accurately use that data to build predictive models. For example, we have much more certainty on how Matt Garza will perform this year than we do on how the recently-departed Andrew Cashner will perform this year. Why is that? It's because we've seen a lot more of Matt Garza at the MLB level. In other words, we have more data.
Now, apply this principle to prospects coming from various backgrounds (high school, college, minors, other countries). We are more certain of projections of future performance for minor league players than anything else, because just about every single player in MLB serves as a data point on which we can base our expectations for the current minor league players. In other words, we have lots of data to back up projections of minor league players. But we have less data on college players, and even less on players coming from foreign professional leagues. We have some of the worst data on players from Cuba. What this means is that projections based on statistics have a particularly high degree of uncertainty.
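To put a rough number on the "more data, more certainty" idea, here's a minimal sketch in Python. It assumes each batter faced is an independent trial (a simplification), and the 20% strikeout rate and batters-faced totals are made up for illustration, not anyone's real stats. The function name `rate_standard_error` is my own.

```python
import math

def rate_standard_error(rate, n):
    """Standard error of an observed rate over n independent trials (binomial approximation)."""
    return math.sqrt(rate * (1.0 - rate) / n)

# Hypothetical: the same observed 20% strikeout rate, seen over different sample sizes.
for batters_faced in (100, 400, 1600):
    se = rate_standard_error(0.20, batters_faced)
    print(batters_faced, round(se, 3))
```

Under those assumptions, quadrupling the number of batters faced cuts the uncertainty on the rate in half. That's the basic reason a long MLB track record tells you more than a short one.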
Faced with this uncertainty, what should one do? I'd recommend three things. First is to think about the context of the projection, and dive deeper into the stats to see if you can unearth an explanation of the "why" behind a player's good/bad numbers. Then, use your brain and see if the numbers have an explanation. Finally, ask the scouts what they think.
Let's start with the numbers. Concepcion's ERA is nice, and the W-L stellar. But let's dive deeper, as I suggest above. Moving from ERA to FIP** goes in the correct direction here, as it can help figure out if a pitcher did well because of "luck." And Concepcion's FIP was very high, suggesting his low ERA may have been luck-induced and that he shouldn't expect to post such a low ERA again. But we can do better than that. Why was Concepcion's FIP high?
Let's break down that high FIP in the context of the Cuban league. From what I can gather, the "story" on the talent level in Cuba is that it's roughly A-ish in talent/competition level, but it has a few studs playing in the primes of their careers. So if Concepcion had a high strikeout rate but his FIP were high because of a high HR rate, I'd give him a pass as I could chalk that high HR rate up to getting dinged by a few legit stars in their primes. But... that's not the issue here. The issue here is his strikeout rate is low. Aaaand that's a problem. If you're not striking guys out a lot, you're not dominating your opponents. And any player that's playing at a level of competition that is A-ish (overall) should dominate their competition if they expect to get to the big leagues in the next couple of years. In other words, the issue with Concepcion isn't that he's getting hit by the "Cuban league stars." The issue is that he's not striking out the bottom of lineups in the Cuban league. At least not often enough.
So we've taken a look at his stats, and used our brains to give them context. What about the scouts?
At least an agent (thanks for the catch, DartmouthCubsFan) seems to love this guy:
"I would not throw him into the major leagues yet," agent Jaime Torres said. "While I think he can make it really fast, he is only 19. But this kid has good stuff, a great makeup and he's extremely intelligent."
OK, how about an actual scout? Kevin Goldstein was asked over Twitter where he'd rank Concepcion on the list of Cubs prospects, and he said 6th. Based on his November ranking of Cubs prospects, that would make Concepcion the Cubs' best pitching prospect. The Cubs don't exactly have the most stacked stable of minor league arms, but that's still high praise.
The short story is this: there isn't any good reason to expect anything out of Concepcion based on his numbers. But this is an extreme case where there are very good reasons to trust the analysis of scouts over numbers. (I bet there are people here that thought I'd never say those words.) So on this one, I'm going to trust the scouts. And even if they're wrong (and my statistical intuition is right), the Cubs aren't spending a lot of future money on this contract. (Yes, I said "future money." I'll explain that later.) So I'm happy, even though the numbers are telling me to be apathetic (at best).
** - For those that aren't familiar with FIP - Fielding Independent Pitching - it's a number based solely on the things that don't usually involve fielders and that the pitcher has the most control over: strikeouts, walks, and home runs. It therefore limits the luck from having lots of "Texas League singles" or hard liners right to the second baseman. Because it's scaled to ERA, the difference between those two numbers is often attributed to luck: a high FIP with a low ERA is often taken as a sign of good luck, meaning the performance wasn't as good as the player's ERA would indicate, and vice versa.
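For the curious, here's a rough sketch of the commonly published FIP formula in Python. The additive constant is what scales FIP to ERA; it's league- and season-specific, and the 3.10 default below is only an assumed, roughly MLB-typical value (I don't know the right one for the Cuban league). The two pitching lines are hypothetical, not Concepcion's actual stats.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from home runs, walks, hit batters, strikeouts, and innings.

    `constant` scales the result to league ERA; 3.10 is an assumed placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Hypothetical lines: identical except for strikeouts.
low_k = fip(hr=8, bb=40, hbp=0, k=50, ip=100)
high_k = fip(hr=8, bb=40, hbp=0, k=100, ip=100)
print(round(low_k, 2), round(high_k, 2))
```

With everything else held equal, the pitcher with half the strikeouts ends up with a FIP a full run higher in this made-up example, which is why a low strikeout rate alone can drag FIP up even when the home run and walk totals look fine.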