Ramblings on Batting Average
I’ve got a lot on my plate right now, and I’m hoping that this will change soon. I’ve got some ideas for future articles, research and what have you, and I wish that I had the time to sit down and take care of everything at once. So here I am with a few minutes to spare, and I’m going to ramble about batting average. I know that’s something of an odd thing to want to write about (Lord knows there are a ton of other topics that are far more interesting to discuss), but the number of times that I’ve seen it used as a measure of player value is somewhat alarming (e.g., “Dan Uggla is a bad hitter because he hits .250 with a lot of strikeouts”). I can’t blame someone who barely follows baseball for referring to batting average, since they don’t spend enough time thinking about or paying attention to the game to really give a damn. But for the individuals who post on forums and write articles (MLB.com writers, I’m wagging my finger at you), it’s time to begin using your noggins a bit and think hard about the way baseball works. I should say that it really isn’t my place to tell someone that they’re right or wrong, but it’s obvious that there are more reliable measurements. And if you’re having a debate about which player is better, you’re going to want the most accurate measurements available.
What does batting average tell you about a hitter? It tells you how many times the player got a hit in a given number of at-bats. It’s a very simple rate statistic that implies two things about a hitter that make it so darn attractive: 1) it implies that the hitter is either good or bad at getting on base, and 2) it implies that the hitter is good or bad at moving runners over or driving them in. Okay. So perhaps it can give us something of a rough estimate of a player’s provided value with the bat. But those two implied benefits of batting average are woefully incomplete. For one, getting a hit is not the only way of reaching base. Players also reach base via the walk or being hit by a pitch, and so on. These events cannot be discounted. A walk is worth slightly less than a single, but it still provides ample value. As for the other benefit, implied run driving production, batting average is empty in that regard. You can be 100 for 300 with 90 singles and 10 doubles and be considered as good a run producer as a player who is 100 for 300 with 65 singles, 20 doubles, 5 triples and 10 big flies. It’s obvious that the second player provided more run driving value, but both players are .333 hitters. By the way, there are two statistics that measure the things batting average implies:
On-Base Percentage: (H + BB + HBP) / (AB + BB + HBP + SF)
Slugging Percentage: (H + 2B + 2*3B + 3*HR) / AB
Now, I want you to pay close attention to the H and the AB in each equation. See the “H” in the numerator and the “AB” in the denominator? That’s right; that’s our old friend batting average. But he’s not lonely any more. In fact, he’s brought some friends along with him to help paint a more complete picture of a hitter’s performance. On-Base Percentage (OBP) measures (you guessed it) the percentage of times a player reaches base in his plate appearances, and slugging percentage (SLG) measures the player’s run driving ability*. If you think about it, the more data you’re including, the more accurate the statistic should be (if constructed correctly, of course!). And if you combine OBP with SLG, you just might be able to get a pretty good grasp of a hitter’s value at the plate. This is what we call OPS (On-Base Plus Slugging), and we would at least hope that it’s more accurate than batting average. So just how good (or bad) is batting average when it comes to estimating runs scored? Let’s take a look.
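To put numbers on the two 100-for-300 hitters from earlier, here is a minimal Python sketch of the formulas above. The function names and the `stat_line` helper are my own, and I'm assuming neither hitter has any walks, hit-by-pitches or sacrifice flies, since the example doesn't mention them.

```python
def batting_average(h, ab):
    """Hits per at-bat."""
    return h / ab


def on_base_percentage(h, bb, hbp, ab, sf):
    """(H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(h, doubles, triples, hr, ab):
    """(H + 2B + 2*3B + 3*HR) / AB, i.e. total bases per at-bat."""
    return (h + doubles + 2 * triples + 3 * hr) / ab


def stat_line(h, doubles, triples, hr, ab, bb=0, hbp=0, sf=0):
    """Hypothetical helper: bundle AVG, OBP, SLG and OPS for one hitter."""
    obp = on_base_percentage(h, bb, hbp, ab, sf)
    slg = slugging_percentage(h, doubles, triples, hr, ab)
    return {"AVG": batting_average(h, ab), "OBP": obp, "SLG": slg, "OPS": obp + slg}


# Both hitters are 100 for 300; only the extra-base hits differ.
# Assumption: no walks, HBP or sacrifice flies for either one.
singles_hitter = stat_line(h=100, doubles=10, triples=0, hr=0, ab=300)
slugger = stat_line(h=100, doubles=20, triples=5, hr=10, ab=300)

for name, line in (("singles hitter", singles_hitter), ("slugger", slugger)):
    print(name, {stat: round(value, 3) for stat, value in line.items()})
```

The two batting averages come out identical, while SLG (and therefore OPS) separates the hitters, which is the whole point of the example.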
First, the equation we’ll be using to convert rates to runs:
Runs = (2 * Rate / LgRate - 1) * Innings Batted * (LgRuns / Lg Innings Batted)
This is from The Hidden Game of Baseball (which I picked up a few weeks ago; it’s definitely a fun read and I highly recommend it), and I honestly wouldn’t have noticed it if it weren’t for Patriot. Innings Batted are outs made divided by three, since there are three outs to an inning. I’m not sure how Palmer defined “outs,” but I define them as (1-OBP)*PA. I measure accuracy through Root Mean Square Error (RMSE), because it gives us the typical size of the difference between the estimated value and the actual value, with big misses weighted more heavily. The lower the RMSE, the better.
Batting Average: 46.15
On-Base Percentage: 37.94
Slugging Percentage: 38.14
On-Base Plus Slugging: 25.60
(This is based on 2000-2009 data only. That gives us a good sample size to work with. Preferably, I’d go back even further to get maximal accuracy, but I just wanted to illustrate my point.)
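For anyone who wants to try this at home, here is a rough Python sketch of the rate-to-runs conversion and the RMSE calculation described above. The function names are my own, and every number in the example is made up purely for illustration; none of it is the 2000-2009 data behind the figures above.

```python
import math


def innings_batted(obp, pa):
    """Outs divided by three, with outs taken as (1 - OBP) * PA."""
    return (1 - obp) * pa / 3


def runs_from_rate(rate, lg_rate, ib, lg_runs, lg_ib):
    """Runs = (2 * Rate / LgRate - 1) * Innings Batted * (LgRuns / LgInnings Batted)."""
    return (2 * rate / lg_rate - 1) * ib * (lg_runs / lg_ib)


def rmse(estimated, actual):
    """Root mean square error between paired estimated and actual values."""
    squared_errors = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical inputs: a team with a .340 OBP in 6,200 PA, measured against
# a made-up league average (.330 OBP, 750 runs per 1,400 innings batted).
team_ib = innings_batted(obp=0.340, pa=6200)
estimate = runs_from_rate(rate=0.340, lg_rate=0.330, ib=team_ib,
                          lg_runs=750, lg_ib=1400)
print(round(team_ib, 1), round(estimate, 1))

# RMSE over a handful of made-up (estimated, actual) run totals.
print(round(rmse([700, 760, 810], [720, 750, 790]), 2))
```

Note that a team with exactly the league-average rate gets the league-average runs per inning batted, which is a handy sanity check on the conversion.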
Batting average typically misses the actual runs scored by about 46 runs, while OBP and SLG miss by around 38 runs. OPS takes the lead at 25.60. So let this be a lesson to you batting average lovers: batting average does not relate very well to run scoring at all. In fact, it’s downright terrible compared to other simple methods such as OBP, SLG and OPS. It tells us very little about the value of a hitter, aside from whether or not they’re good at making contact. And really, this is all that batting average should ever truly be used to measure: a player’s contact skills. I should note, however, that OPS, while much better at estimating the run scoring process than any of the individual “slash” rates, is still extremely flawed. This is because we’re adding two rates with different denominators: OBP is a per-plate appearance rate, while SLG is a per-at-bat rate. It is for this reason that OPS should be used as an offhand method of player evaluation only. If you’re looking for a rate statistic that is strong at predicting runs, there are two that stick out:
If you want to use a rate statistic to identify player talent, either EqA or wOBA should be the way to go. EqA, however, uses some awkward weights and isn’t as good at evaluating individual hitters as wOBA is, despite its stronger RMSE in this limited “study.” If you’re new to EqA, it’s adjusted so that the league average is exactly .260 in every year. wOBA is on the same scale as On-Base Percentage (hence the name Weighted On-Base Average).
*SLG doesn’t necessarily measure actual power production, but it attempts to model it. Since AVG is part of the equation, a lower AVG will result in a lower SLG. That being said, SLG – AVG (which we call ISO) is a better way of measuring a player’s “power.” Additionally, the intrinsic weighting of SLG- a double equals two singles, a triple equals three singles, etc.- is not proportional to the actual run scoring process. So while SLG is a useful tool, and it’s a better measurement of a player’s value than average is, it’s still not particularly accurate.
**This version of wOBA is different from the traditional wOBA in that we’re not using specific linear weights tailored to fit the run environment. Rather, this rendition of wOBA is derived from nothing more than OBP and SLG. The equation is -.53*(.56*OBP+.31*SLG)^2 + 1.35*(.56*OBP+.31*SLG) - .045. Looks like Kincaid’s “quick and dirty” wOBA predicts run scoring pretty well.
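The two footnote formulas are easy to sketch in Python. The function names are my own. For the quick wOBA I’m assuming the quadratic term carries the negative sign, since that is the reading that lands the result on the OBP scale; treat it as an assumption rather than gospel. The sample hitter is hypothetical.

```python
def iso(slg, avg):
    """Isolated power: SLG minus AVG."""
    return slg - avg


def quick_woba(obp, slg):
    """'Quick and dirty' wOBA built from nothing more than OBP and SLG.

    Assumption: the squared term has a negative coefficient, so that the
    result sits on the same scale as OBP.
    """
    x = 0.56 * obp + 0.31 * slg
    return -0.53 * x ** 2 + 1.35 * x - 0.045


# Hypothetical hitter: .270 AVG, .330 OBP, .420 SLG.
print(round(iso(slg=0.420, avg=0.270), 3))
print(round(quick_woba(obp=0.330, slg=0.420), 3))
```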