First, I want to say that I plan to be around here more often than I have been since my original post. (For those who weren't around for it, I'm the new "stats guy" here at BCB. The first post was an introduction to me; this one is more of an introduction to the way I think about things.) I had an insane couple of weeks of work and some epic struggles with code I'm developing for my research projects, including an episode where my computer was essentially telling me that 11 > 202. My goal is to post articles once a week, on Mondays. I'll try to do some catching up going forward, but I can't make any promises. (I'm working on some really cool stuff at work right now, and haven't felt stressed despite putting in lots of hours. That's a situation that leaves me with little time to spend on baseball.) Anyway, on to "this week's" topic… My intention here is to give you some insight into how scientists analyze things as varied as baseball, supernovae, and fossilized bug bites. My hope is that a brief introduction to how scientists ask and answer questions will provide a better understanding of the source of the confidence sabermetricians (and scientists) have in their projections.
The scientific method is often summarized this way: research a topic, form a hypothesis, predict something based on that hypothesis, and then test that prediction. If the test results do not conform to the hypothesis, discard or refine the hypothesis accordingly. Either way, devise new tests for the (perhaps new) hypothesis and continue to improve upon it with repeated tests and refinements. Things that cannot be stated in the form of a testable hypothesis are not considered science, and do not factor into the scientific method. Examples of things that do not fall under the purview of science are statements such as "there is a God" or "the Cubs are cursed." However, one can test the veracity of statements (hypotheses) such as "carbon dioxide can absorb infrared radiation emitted by the Earth," or "there is a better correlation between a pitcher's FIP (fielding independent pitching) one year and his ERA (earned run average) the following year than there is between his ERA in one year and his ERA in the following year." Such questions and statements fall under the purview of the scientific method, and they're very useful things to consider because we can make – or in the case of fans, recommend – decisions based on extrapolations from these hypotheses.
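To make that FIP-vs-ERA hypothesis concrete, here's a minimal sketch of how one might test it: gather paired seasons for a set of pitchers, then compare the correlation of year-one FIP with year-two ERA against the correlation of year-one ERA with year-two ERA. The pitcher numbers below are invented purely for illustration; a real test would use many seasons of actual data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# (year-1 FIP, year-1 ERA, year-2 ERA) for hypothetical pitchers
seasons = [
    (3.10, 2.75, 3.20),
    (4.05, 4.60, 4.10),
    (3.60, 3.20, 3.75),
    (4.80, 5.30, 4.70),
    (2.90, 3.40, 3.05),
    (4.40, 3.90, 4.35),
]
fip1 = [s[0] for s in seasons]
era1 = [s[1] for s in seasons]
era2 = [s[2] for s in seasons]

# The hypothesis predicts the first correlation beats the second.
print("FIP(y1) vs ERA(y2):", round(pearson(fip1, era2), 3))
print("ERA(y1) vs ERA(y2):", round(pearson(era1, era2), 3))
```

If the first number is consistently larger across real datasets, the hypothesis survives the test; if not, it gets discarded or refined, exactly as described above.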
However, the power of the method is not that it can help make decisions, but that it should always improve the foundations upon which those decisions are made. Adam Dunn, a player at the center of much debate here over the years, is a great example of this. A few years ago many sabermetric Cubs fans – including yours truly – wanted the Cubs to pursue Adam Dunn. We saw the eye-popping OBA (on-base average) and SLG (slugging average) numbers he posted year after year, and thought he would be a great fit for the Cubs' lineup. Today, most statistically inclined Cubs fans no longer want Dunn in Chicago. Why? Our understanding of baseball has improved. It was once thought that defense contributed relatively little to the value of a player. Furthermore, our ability to determine how much value defense had for individual players was poor, which left defense relatively inaccessible to the scientific method. A tool was eventually developed (UZR, or ultimate zone rating) that allowed quantitative defensive analysis of players and opened the analysis of defense up to the scientific method. The tool invalidated the hypothesis that defense did not contribute much to a player's value, particularly for extreme cases such as Dunn's. Now that we have a reliable method of measuring defense, and can ascribe value to players based on that method, we can include it in our evaluation of the player. In short: we're better at this evaluation thing than we were a few years ago, thanks to those who developed UZR and other defensive metrics.
The degree to which Dunn was overvalued also serves as an example of something else: scientists and sabermetricians are going to be wrong, yet seem persistent in their claims that "they know best." This isn't necessarily ego (though it admittedly can be). We have a high degree of confidence in recommendations and projections because they are the result of a mountain of past work, very little of which is our own. That's not to say we expect these recommendations and projections to always be right. On the contrary, we expect them to be inaccurate, and we treasure our mistakes as opportunities to refine hypotheses and devise new tests for them. Additionally, the best science will present uncertainties alongside measurements and projections, thereby admitting the likelihood that they will be "wrong." Those uncertainties give us an idea of just how wrong we are likely to be, and that gives us some semblance of confidence. (In fact, we call the degree of uncertainty a "confidence level.") What's more, because those uncertainties are known (more or less), there's a limit to how wrong they are likely to be. That's not true of predictions, opinions, and judgments based on one person's subjective analysis of a situation.
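Here's a small sketch of what "presenting uncertainties alongside a projection" can look like in the simplest case: project a stat as the mean of recent observations, and attach a rough 95% interval based on the standard error. The observations below are made up for illustration, and real projection systems are far more sophisticated, but the idea of reporting a range rather than a single number is the same.

```python
import math

# Hypothetical on-base-average-like observations for one hitter
# over five recent seasons (invented numbers).
obs = [0.365, 0.342, 0.371, 0.350, 0.358]

n = len(obs)
mean = sum(obs) / n

# Sample standard deviation and standard error of the mean.
sd = math.sqrt(sum((x - mean) ** 2 for x in obs) / (n - 1))
se = sd / math.sqrt(n)

# A rough 95% interval: about 1.96 standard errors on each side.
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print(f"projection: {mean:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

A projection reported this way admits up front how wrong it is likely to be, which is exactly the point made above: the uncertainty is part of the result, not an afterthought.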
Ultimately, our confidence and arrogance aren't about being right about everything… they come instead from the knowledge that, averaged over time, projections and recommendations that follow from application of the scientific method will be better than those based primarily on one person's conjecture. That's where I (and others, I think) are coming from, and why I hope you'll understand when we come off as "know-it-alls" from time to time. It's not that we know it all, or think we do… it's that the products and tools that arise from a community applying the scientific method are something to trust over any one person's opinion, including our own.