A lot of the talk the past two days has been on "small sample sizes" and Hoff-POWER's limited ABs in the major leagues. In May Hoff had a BA of .467, in June he had a BA of .350, and in Sept. he has a BA of .400.
I don't want to belabor any of the arguments on whether or not he should be on the playoff roster, but I just wanted to make sure that some people don't use the words "small sample size" without some understanding of what they mean.
First of all, without getting into the mathematics of probability density functions, let's just say that to know someone's "true" batting average, one would theoretically need an infinite number of at-bats. Practically speaking, statisticians use something called a "95% confidence interval": a range within which one can be 95% certain that something like the batting average lies.
A good formula for all of this: if N is the number of ABs and the player has an observed batting average of BA, then, assuming that the estimates of batting average are normally distributed, we can be 95% certain that the "real" batting average (RBA) of that player lies in the interval:
BA - 1.96*sqrt(BA*(1-BA)/N) < RBA < BA + 1.96*sqrt(BA*(1-BA)/N)
What does this mean? Well, Hoff has a current BA of .387 after 62 ABs, which means that we can be 95% certain that his "true" or "real" BA at this level of play lies between .266 and .508. If he had more ABs, this range would shrink. For example, Theriot has a BA of .302 after 502 ABs, giving a true range between .262 and .342. Ward has a BA of .216 after 97 ABs, giving a range between .134 and .298.
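For anyone who wants to check the arithmetic or try other players, here is a minimal Python sketch of the formula above. The function name `batting_average_interval` and the `players` dictionary are just my own illustrative choices, and it makes the same normal-distribution assumption as the formula, so treat it as a rough estimate rather than anything definitive.

```python
import math

def batting_average_interval(ba, n, z=1.96):
    """Approximate 95% interval for the "real" batting average.

    ba: observed batting average (e.g. 0.387)
    n:  number of at-bats
    z:  1.96 corresponds to 95% under the normal assumption
    """
    margin = z * math.sqrt(ba * (1 - ba) / n)
    return ba - margin, ba + margin

# Observed BA and ABs for the three players discussed above
players = {"Hoff": (0.387, 62), "Theriot": (0.302, 502), "Ward": (0.216, 97)}

for name, (ba, n) in players.items():
    low, high = batting_average_interval(ba, n)
    print(f"{name}: {low:.3f} to {high:.3f}")
```

Note how the width of the range depends on 1/sqrt(N): you need roughly four times the ABs to cut the range in half.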
Now, I'm not claiming that BA is an important metric for measuring a good hitter, and I'm not arguing who is the better pinch-hitter, or hitter in special situations. I'm just arguing for discounting other people's arguments on "small sample sizes" since in some sense, a lot of baseball stats can be considered to be based on small sample sizes, especially when one factors in things like lefty/righty splits, etc.