Musings on Defensive Metrics
I love defensive metrics.
I don’t exactly know why they appeal to me so much—I think it’s partly because they’re so darn controversial and, more often than not, exceptionally complex. Fielding is infinitely more difficult to quantify than hitting, partly because we’re not dealing with neat little buckets—we know how many doubles a player has in a season; we know how many walks, home runs, steals, and so on—and it’s relatively easy to determine how much value a hitter has provided because we know precisely how many opportunities he has had.
We have plate appearances for hitters, but we don’t have a precise number of chances for fielders. We have, of course, “total chances”—the sum of assists, putouts and errors for any given player. This is distinctly different from plate appearances, because there are balls hit toward a fielder that are never recorded as part of his chances. So we’re only given a partial view of a defender’s actual opportunities. We know, based on assist rates and conventional wisdom, that players up the middle (i.e., shortstops, second basemen and center fielders) see more balls in play than, say, first basemen and left fielders. But the precise number of balls in play, and the difficulty of each play, is exceptionally hard to estimate accurately.
There are a multitude of fielding measures out there, the earliest being fielding percentage and range factor. Fielding percentage is inherently flawed: the distinction between a hit and an error is left to the discretion of the official scorer, making it largely arbitrary. Furthermore, it tells us nothing more than a player’s “sure-handedness,” not the amount of ground he covers in the field. Range factor, a Bill James invention, is a marginal upgrade in that it looks at the player’s rate of putouts and assists per inning played, but it is limited in that it doesn’t account for the player’s actual ball-in-play distribution. Some renovations have been made to the metric (which James refers to as “Relative Range Factor”), but it remains an extremely crude estimate of a player’s defensive efficiency.
Other systems go to greater lengths to estimate a player’s chances—Charles Saeger’s Context-Adjusted Defense estimates where the ball likely went based on the distribution of right-handed and left-handed balls in play. Sean Smith’s Total Zone uses the opposing hitter’s distribution of fielded balls in play to estimate where a given ball likely went. Michael Humphreys has his own system, Defensive Regression Analysis, which I know little about. He is publishing a book titled Wizardry: Baseball’s All-Time Greatest Fielders, which will hopefully outline the calculation of the metric—it should be interesting to see how he estimates chances.
No matter how much work goes into them, these systems are still essentially using crude estimations of where we would expect a ball to go, rather than knowing exactly where it went. While this gives us a decent ballpark figure, we need something more precise for determining responsibility.
In the 1980s, John Dewan, then-president of STATS, Inc., used “stringers” to record where each ball in play was hit. This immediately gave analysts a far more accurate means of estimating chances: we no longer have to guess where balls were hit, because we now have the general location of each ball in play. Dewan named this measurement Zone Rating (ZR), which he describes as:
Figure out an area around each fielder that we call his zone. This is an area where he should be expected to make plays. Specifically, it’s any area where fielders as a whole successfully handle at least 50% of the batted balls in that area. Count all the batted balls that are hit into a player’s zone. Count all the balls that he successfully fields. Divide the number that he successfully fields by the number in his zone and that’s his zone rating.
And that’s really all there is to it. Think of it as an on-base percentage of sorts, but from the fielder’s perspective. It is essentially defined as:
ZR = Plays / Chances
And once these zones are established, the field is broken down further into smaller sub-zones. The original STATS Zone Rating consists of about 26 zones, while Dewan’s current company, Baseball Info Solutions (BIS), uses 262 vectors. Only ground balls are counted as chances for infielders, while both line drives and fly balls are counted for outfielders. The original STATS ZR counted both balls hit into the player’s zone and plays made out of zone as total chances, and gave double credit for turning a double play. There are some issues with that approach: the additional credit for the double play is unnecessary, and counting out-of-zone plays as regular chances underrates players with substantial range.
The newer ZR (Revised Zone Rating, or RZR) separates in-zone plays and out of zone plays. There are essentially three categories in BIS ZR: balls in zone (BIZ), plays made in the zone (PM), and plays made outside of the player’s zone (OOZ). RZR, then, is simply PM / BIZ. The newer ZR tells us two things—how efficient the player was at handling balls hit into his zone, and how often the player made plays outside of his zone. RZR can be converted into plays saved/cost relative to the average fielder through this formula:
Plus/Minus Plays = (PM – (BIZ * LgRZR)) + (OOZ – ((LgOOZ / LgBIZ) * BIZ))
This looks more complicated than it really is; we’re merely comparing the player’s actual plays made to the expected plays made by an average player in the same number of chances. This figure can then be converted into runs by multiplying the plays above/below average figure by about .75 for infielders and .85 for outfielders. This gives us a pretty simple and solid framework for estimating defensive value, but certain refinements can be made (i.e. more precise locations rather than one large zone, and the speed of the batted ball among other things).
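Expressed as code, the plus/minus-plays and run-conversion steps above can be sketched as follows. This is a minimal sketch: the league rates and the example numbers are made up for illustration, and the function names are mine.

```python
# Sketch of the RZR-based plus/minus calculation described above.
# lg_rzr and lg_ooz_per_biz are hypothetical league-average rates.

def plus_minus_plays(pm, biz, ooz, lg_rzr, lg_ooz_per_biz):
    """Plays saved/cost relative to an average fielder in the same chances."""
    in_zone = pm - biz * lg_rzr             # in-zone plays above expectation
    out_zone = ooz - lg_ooz_per_biz * biz   # out-of-zone plays above expectation
    return in_zone + out_zone

def plus_minus_runs(plays, infielder=True):
    """Convert plays above/below average to runs (~.75 IF, ~.85 OF)."""
    return plays * (0.75 if infielder else 0.85)

# Made-up example: 250 PM on 300 BIZ plus 40 OOZ plays,
# against a league RZR of .800 and a league OOZ/BIZ rate of .110.
plays = plus_minus_plays(pm=250, biz=300, ooz=40,
                         lg_rzr=0.800, lg_ooz_per_biz=0.110)
print(plays)                   # (250 - 240) + (40 - 33) = +17 plays
print(plus_minus_runs(plays))  # +12.75 runs for an infielder
```

The two terms mirror the formula exactly: the first compares in-zone plays to the league rate applied to the player’s BIZ, the second does the same for out-of-zone plays.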
To help refine our estimation of a player’s value provided on the field, Mitchel Lichtman developed what is known as Ultimate Zone Rating (UZR). UZR takes the basic ZR framework and makes a number of adjustments. It uses the smaller sub-zones, which have their own out conversion rates based on different information: the handedness of the batter, the estimated speed of the batter (which consists of two categories: fast and slow), how hard the ball was hit (soft, medium, hard), the groundball-flyball ratio of the pitching staff (with the idea that a GB-heavy staff will induce easier grounders than a FB-oriented staff), along with the base-out state (SS/2B shading up the middle in a double play situation, a first baseman holding the runner on, the third baseman anticipating a bunt, and so on). It also calculates the efficiency of infielders at turning the double play and handling bunts, and an outfielder’s throwing arm. It is also park-adjusted to give a context-neutral rating of a player’s theoretical runs saved/cost.
The Defensive Runs Saved (DRS) system is John Dewan’s metric and is pretty similar to UZR. It does not make adjustments for the handedness of the batter, the pitching staff or the speed of the runner, and it uses smaller buckets, with the idea in mind that it will increase accuracy. It is also more aggressive with awarding credit/penalty, as explained by Ben Jedlovec:
* Plus/Minus is a little more aggressive in awarding credit/penalty. An example: 100 balls in a ‘bucket’ (specified type, velocity, location), 30 fielded by the 2B, 20 by the 1B, 50 go through for singles. On a groundout to the second baseman, we give +50/(50+30) = 5/8 = +.625. UZR gives +50/100 = +.50. On a single through both fielders, Plus/Minus gives -30/80 = -.375 to the 2B, and -20/70 = -.29 to the 1B. UZR gives -30/100 = -.3 to the 2B, and -20/100 = -.2 to the 1B. You could make an argument for either method of accounting, but neither one is better than the other. The differences are the greatest at the middle infield positions, where overlap between fielders is the highest.
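Jedlovec’s example arithmetic can be reproduced with a short sketch. The function names are mine, not from either system’s actual implementation; the bucket numbers are the ones from his example.

```python
# Credit-sharing in Jedlovec's example bucket:
# 100 balls, 30 fielded by the 2B, 20 by the 1B, 50 go through for singles.

def uzr_out_credit(hits, total):
    """UZR: credit for an out is hits divided by all balls in the bucket."""
    return hits / total                             # 50/100 = +.50

def uzr_hit_penalty(fielder_plays, total):
    """UZR: penalty for a hit is the fielder's plays over all balls."""
    return -fielder_plays / total                   # -30/100, -20/100

def pm_out_credit(hits, fielder_plays):
    """Plus/Minus: hits over (hits + this fielder's own plays)."""
    return hits / (hits + fielder_plays)            # 50/80 = +.625

def pm_hit_penalty(fielder_plays, other_plays, total):
    """Plus/Minus: fielder's plays over all balls except the other fielder's."""
    return -fielder_plays / (total - other_plays)   # -30/80, -20/70

print(pm_out_credit(50, 30), uzr_out_credit(50, 100))
print(pm_hit_penalty(30, 20, 100), pm_hit_penalty(20, 30, 100))
```

Running the numbers reproduces the quote: +.625 vs. +.50 on the groundout, and -.375/-.29 vs. -.30/-.20 on the single through both fielders.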
DRS also accounts for double play efficiency and outfield arms, and it makes an adjustment for home run saving catches by outfielders. The system is a bit odd—despite being designed as “runs saved/cost relative to the average fielder,” the figures don’t sum to zero as they should. Not only that, but the home run saving catches aren’t measured against a baseline (i.e., how many home runs the player saved above the average), leaving the system with an unworkable baseline. This renders it damaged goods for total value metrics such as Wins Above Replacement (WAR), in my mind. If we don’t know what the actual baseline is, how can we use it in a metric that demands a baseline of exactly average from its components?
The UZR engine, in my mind, is perfect. It does everything I would want a defensive metric to do—but the problem I have with it doesn’t reside in its calculation; rather, it stems from the source of the data.
Colin Wyers of Baseball Prospectus began creating a defensive metric a while back that incorporated batted ball data (he has since dropped the project). He pointed out the issue with zone-based systems that I believe can be summarized in one simple image:
Wyers explains the issue, which I’ll break down into two parts:
Suppose we’re interested in the ball indicated by the blue dot in that diagram. In a zone-based system, that particular fielding play would end up being compared to the play indicated by the red dot on the far left, but not the play represented by the red dot right next to it on the field.
This is what I’m talking about when I say that there’s an issue with the data source. We can have two balls in play that are practically right next to one another yet have different out conversion rates, which could artificially inflate or deflate the player’s rating.
Of course we can always divide into smaller and smaller zones to address this issue. But you end up slicing your sample smaller and smaller, making yourself more susceptible to random variation. And you’re always going to end up with an arbitrary distinction between what batted balls are peers and what aren’t.
This is the crux of the issue. The more we slice the field into smaller and smaller sub-zones, the more arbitrary the conversion rates become. Remember, we’re not using GPS-guided tracking devices that give us the exact velocity, trajectory and precise location of the batted ball—we’re using data filtered through the human eye. This becomes even more of an issue when we consider that stringers have different sightlines in each stadium (due to the design of the park). This leaves us with parallax issues, in which the perceived vector of the batted ball depends largely on where the stringer sits (for further reading on the biases in batted ball data, I highly recommend Wyers’ article on the subject).
So where exactly does this leave us?
It’s hard to say. The creator of UZR, Mitchel Lichtman, has been quite vocal about the issues of sample sizes and biases in the data. Yet UZR is still being used to evaluate single season performances by mainstream sites like FanGraphs, and even on this site (which will undoubtedly stop, at least on my end). Lichtman suggests regressing the single-season data about 50%, and that seems to be a step in the right direction. That doesn’t solve the problems with UZR, of course, and he’ll be the first to tell you that. But it’s better than using the straight UZR figures.
I believe that, due to the aforementioned issues with the sub-zones, it would actually be better to switch back to the larger zones used in Revised Zone Rating. UZR is, in my mind, a defensive metric that’s actually ahead of its time. The framework is flawless but the execution of the data inputs is flawed. And until this issue is resolved, I’ll be calculating my own fielding runs from the simple RZR/OOZ figures displayed on FanGraphs. But it won’t be calculated the same way I outlined above.
My calculation of Zone Rating is as follows:
ZR = (PM + OOZ) / (BIZ + OOZ)
The numerator is total plays made, while the denominator counts both balls hit into the player’s zone and his out-of-zone plays as his total chances. This takes us back to the original STATS ZR equation: Total Plays / Total Chances. Why am I including out-of-zone plays in both the numerator and the denominator? As mentioned earlier, won’t this underrate players that are extremely rangy?
The answer to the second question is an emphatic yes: this will underrate players who make a lot of plays outside their zone of responsibility. But I’m not concerned with that. Because of the uncertainty in distinguishing in-zone plays from out-of-zone plays, my preference is to lump the two together. It’s a conservative approach, yes, but I’d rather be conservative than falsely credit players for out-of-zone plays that truly weren’t all that difficult. Ever since Sean Smith noted his observations of Yunel Escobar’s “superior” OOZ talents, I’ve been extremely skeptical of giving extra credit on OOZ plays:
Yunel Escobar, the top rated shortstop last season, made 48 plays out of his zone (only Miguel Tejada had more). Through MLB.com’s game archives, I watched almost all of these plays. On 13 plays, he was on the second base side because the defense used the shift against a lefty pull hitter. These were ordinary plays, not evidence of great range preventing a hit that the 2B should have had. He probably had more chances on the shift than most shortstops, playing in the same division as Ryan Howard and Adam Dunn. These two players hit 11 of the 13 shift balls. There are a few cases where the hit location code is clearly wrong, such as when the coding indicates the ball was in the 3B zone, but the shortstop actually moved slightly to his left to field it, or when the coding says it’s on the 2B side (having 2B as a marker makes it much easier to judge where the zone boundaries are), but the shortstop clearly fields it on his side. There were 6 miscoded plays, 21 more where it appears the ball was in the shortstop’s zone (though not certain), including some that were routine grounders. There were 2 plays where I couldn’t load the game or find the inning in question. Only 6 plays, in my judgement, were outstanding plays where Yunel ranged into another fielder’s zone.
I’m also uncomfortable estimating a player’s OOZ chances, as using OOZ / BIZ is a crude guess at his opportunities on balls hit out of his zone. We have only the general location of the ball in play; it’s best not to become overzealous until we have more refined data.
The conversion of ZR into plays above or below the league average is as follows:
Plus/Minus Plays = (PM + OOZ) – (LgZR * (BIZ + OOZ))
Each position has a specific run value per play saved/cost. For the time being, I’ll be sticking with the run values presented by Chris Dial a few years ago.
For the sake of completeness, I will still be incorporating UZR’s double play runs for infielders and outfield arms for outfielders. And that’s all there is to it. The metric is extremely simple and will yield pretty similar results to UZR. It’s an imperfect model, absolutely, but I prefer to limit the damage of the data source.
Defensive metrics have come a long way since the days of fielding percentage, range factor, and even Pete Palmer’s Fielding Runs. The Saegers, Smiths and Humphreys of the world have done remarkable things to estimate defensive ability, and Dewan really took a step forward with his Zone Ratings. But we shouldn’t get ahead of ourselves. With the development of Field F/X, we’ll hopefully finally have a system that can give us a near-definitive measurement of a player’s defensive value—until then, we’ll have to be careful and acknowledge the shortcomings of these metrics and understand that when all is said and done, they are at best educated guesses.