Triples Alley may be coming to an end. Grant Brisbee of McCovey Chronicles has asked me to come on board as his “stats guy,” and I have happily accepted his offer. I feel guilty for being unable to write here on a consistent basis without anything in between to keep you guys engaged- and moving to a site where there will be constant content, regardless of whether or not I’ll be able to write multiple times a week, works better for my peace of mind and for my readers.
I say the site “may” be coming to an end because 1) I don’t think I could fully say goodbye (yes, I am a sentimental kind of guy), and 2) because I may still post here on occasion. Since McCovey Chronicles is a Giants-centered site, and THT Live is analysis-centered, this site may remain as an outlet for somewhere in-between. I don’t know- I haven’t made that decision just yet. There’s also 3) the possibility that Dylan may come back at some point in time, and I’d like to keep the site open for him should he find the time to write again.
I want to thank everyone who has read and supported our site. What first began as a simple outlet for Dylan and me has turned into something larger than I ever expected.
If you’re wondering when my first post will be over at The Chronicles, it’ll likely be sometime in late March. This is my last week of undergraduate studies- in other words, it’s finals week for me. I’ll be heading out to Arizona to watch some Spring Baseball, and I should find some time to put together my first post there.
It’s so nice to have a night where I actually have some free time to write. I’ll be able to start writing with regularity in about a week and a half or so, right around the time I make my yearly trip to Arizona for a few days of good ol’ Spring Training. I don’t really have anything in particular to write about, and I’m not really in the mood to do player analysis, so I thought I’d write a little bit about a very important sabermetric principle that’s found its way into essentially every aspect of sabermetrics- linear weights. And the more I think about it, if you understand linear weights (hereafter referred to as “LW” or “LWTS”), you’ll understand a lot about sabermetrics.
A brief history on LW, and how they are applied
The history of LW begins well before a night custodian by the name of Bill James, when a man named Ferdinand Cole Lane built a weighted system for measuring the impact of hitting events. This was later picked up by George Lindsey, who recorded detailed play-by-play data on over 1,000 Major League games and produced what is referred to as a run expectancy matrix, which tells us the average number of runs a team scores from a particular base-out state through the end of the inning. In 2010, for example, a team was expected to score .49 runs with the bases empty and no outs (which is simply the league's average runs per inning), and 2.4 runs with the bases loaded and no outs. This merely quantifies what we know- teams are likely to score more runs in situations like 123_0 (bases loaded, no out) than they are in situations like 001_2 (man on first, two outs). Lindsey then took the average change in run expectancy produced by each event to find the average value of that event. And really, that’s all there is to it. Linear weights are merely the empirical average impact an event has on the run-scoring process. Pete Palmer expanded upon Lindsey’s work in the 1984 classic Hidden Game of Baseball, which introduced the Linear Weights System. What separated Palmer from the rest of the pack is that he included negative events in the equation, so that a player was held accountable for the outs he made at the plate, not just credited for his positive outcomes. Palmer’s original equation (sans outs on bases):
LWTS = .46*1B + .80*2B + 1.02*3B + 1.40*HR + .33*(BB + HBP) + .30*SB – .60*CS – .25*(AB – H)
Singles are worth about .46 runs, doubles about .8, a home run adds about 1.40 runs on average each time, and walks and hit batsmen create about .33 runs each time; slightly less than that of a single. Later analysis revealed that Palmer’s original SB/CS values were too high; if I remember right, he increased the figures arbitrarily in an attempt to account for basestealing in high leverage (or pressure) situations. The reason why linear weights works, compared to the traditional statistics, is explained beautifully by Palmer:
“What Linear Weights does is to take every offensive event and treat it in terms of its impact upon the team- an average team, so that a man does not benefit in his individual record for having the good fortune to bat cleanup with the Brewers or suffer for batting cleanup with the Mets. The relationship of individual performance to team play is stated poorly or not at all in conventional baseball statistics. In Linear Weights it is crystal clear: the linear progression, the sum, of the various offensive events, when weighted by their accurately predicted run values, will total the runs contributed by that batter or that team beyond the league average.” (67)
Players have absolutely no control over where they hit in the lineup. Think of it this way- Bengie Molina was the cleanup hitter for the Giants for a number of years. Had he been on another team, would he have hit in the same spot in the order? And would he have collected as many RBI? Remember, RBI opportunities are highly dependent on one’s slot in the lineup. The same goes for runs scored- yes, good baserunners will score more runs than bad ones. But the player’s teammates are the ones that have to put the ball in play first. So it is foolish to rate players based on team-dependent numbers. Batting average is of little use for player value as well- yes, it tells us the rate of hits by the player, but what about the impact of the hits and the walks? OPS sure is nice, but it doesn’t tell us the number of runs the player helps generate. Linear weights provides us with a player’s runs above or below the league average, weighing his positive run output against the outs he made. If a player is +0 LWTS, this simply means he hit at exactly the league average rate. If a player is -10 LWTS, he provided 10 fewer runs than a league average player in the same number of opportunities, and if he’s +10, he’s provided 10 more runs than a league average hitter. That’s all there is to it.
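To make Palmer's equation concrete, here is a minimal sketch that applies his coefficients to a single stat line. The function name and its argument names are my own, not anything from Palmer, and the stat line in the usage note is made up for illustration.

```python
def palmer_lwts(singles, doubles, triples, hr, bb, hbp, sb, cs, ab):
    """Palmer's original Linear Weights (runs above average), sans outs on bases."""
    hits = singles + doubles + triples + hr
    outs = ab - hits  # the (AB - H) term
    return (0.46 * singles + 0.80 * doubles + 1.02 * triples + 1.40 * hr
            + 0.33 * (bb + hbp)
            + 0.30 * sb - 0.60 * cs
            - 0.25 * outs)


# Hypothetical stat line, for illustration only.
example = palmer_lwts(singles=100, doubles=30, triples=5, hr=20,
                      bb=60, hbp=5, sb=10, cs=5, ab=550)
print(round(example, 1))
```

A positive result means the hitter was above the league average by that many runs; a negative result means he was below it.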
How I generate LW
This is where things get technical, so you may just want to skip ahead. There are various ways to generate LW values- there is the empirical method, as outlined above (and described in more detail here). This is the most “correct” method. But since not everyone is a programming genius (guilty), there are other methods. One is to use a Markov model to simulate the impact of each event. This takes a heck of a lot of calculations, so it might not be for you- but there is one very basic Markov calculator on the internet that will spit out marginal values for you. The simpler method, and the one that I use, is the “plus-one” method outlined by Brandon Heipp, which squeezes the marginal value of each event out of a dynamic run estimator- in this case, Base Runs (BsR). Why BsR?
Because it’s a very simple run estimator, is extremely flexible for the run environment, and works with a true model of run scoring. The original dynamic run estimator, Bill James’ Runs Created, works as follows:
Runs = (A*B)/C
Where “A” are the times on base, “B” is the advancement factor, and “C” are the opportunities; plate appearances. The problem with RC is pretty simple- it doesn’t treat home runs correctly. The simple equation shown above works, yes, but it doesn’t model baseball as well as it could. RC seems to forget that a home run creates a run every single time- excluding it is taking out a major aspect of the game. And this is one of the reasons why BsR works so well. It is constructed as:
Runs = A * (B/(B+C)) + D
Where “A” and “B” are the same as RC, “C” are outs made, and “D” are home runs. In short, it is essentially:
Runs = Times on Base * Score Rate + Home Runs
And it just so happens that it spits out marginal run values that match up perfectly with the empirical run values. Anyways, let’s say we use the simplest BsR formula out there to extract run values for the 2010 MLB season:
A = H – HR + BB + HBP
B = .88*1B + 2.42*2B + 3.96*3B + 2.2*HR + .11*(BB + HBP) + .99*SB – .99*CS
C = AB – H + CS
D = HR
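As a rough sketch, the four factors above drop straight into a small function. The dictionary keys and the function name are my own choices, and any totals you feed it are up to you.

```python
def base_runs(s):
    """Simple Base Runs estimate from a dict of counting stats."""
    hits = s["1B"] + s["2B"] + s["3B"] + s["HR"]
    a = hits - s["HR"] + s["BB"] + s["HBP"]           # baserunners
    b = (0.88 * s["1B"] + 2.42 * s["2B"] + 3.96 * s["3B"] + 2.2 * s["HR"]
         + 0.11 * (s["BB"] + s["HBP"])
         + 0.99 * s["SB"] - 0.99 * s["CS"])           # advancement
    c = s["AB"] - hits + s["CS"]                      # outs
    d = s["HR"]                                       # automatic runs
    return a * b / (b + c) + d


# Hypothetical team totals, for illustration only.
team = {"1B": 1000, "2B": 300, "3B": 30, "HR": 150, "BB": 500,
        "HBP": 50, "SB": 100, "CS": 40, "AB": 5500}
print(round(base_runs(team)))
```

Note how a lineup that does nothing but hit home runs and make outs scores exactly its home run total, which is the property RC lacks.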
First, we can reconcile the coefficients in the “B” term so that it matches actual league runs scored. To find our required “B,” we simply use (R – D)*C/(A – R + D) to solve for it. Divide this by the estimated “B” and we get 0.88, which we multiply all of our coefficients by. This is for accuracy purposes only. Once we have our new coefficients, we extract each run value through this (pretty darn intense) formula:
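The reconciliation step can be sketched like this. The function name is mine and the league totals in the example are invented; with the real 2010 totals the multiplier comes out to the 0.88 mentioned above.

```python
def b_multiplier(a, b_est, c, d, runs):
    """Factor that scales the B coefficients so BsR matches actual runs.

    Solves runs = a * B / (B + c) + d for B, then divides by the estimated B.
    """
    b_required = (runs - d) * c / (a - runs + d)
    return b_required / b_est


# Hypothetical totals, for illustration only.
A, B_EST, C, D, RUNS = 1880, 2174.7, 4060, 150, 720
mult = b_multiplier(A, B_EST, C, D, RUNS)

# Sanity check: with the scaled B, the estimator returns actual runs.
b_scaled = mult * B_EST
print(round(A * b_scaled / (b_scaled + C) + D))
```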
LW = ((B+C)*(A*b + B*a) – (A*B)*(b+c))/((B+C)^2) + d
Where the capitalized terms are the sum of the factor (“B,” for example, would be the frequency of the event times the coefficient in the B term), and the lower case terms are the coefficient of the factor (i.e. .88 for singles, 2.42 for doubles, etc.). Doing so yields us the following equation:
LWRC = .47*1B + .75*2B + 1.04*3B + 1.40*HR + .33*(BB + HBP) + .18*SB – .28*CS – .09* (AB – H)
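Here is a sketch of that extraction, assuming the simple BsR formula above. It computes each run value with the closed-form expression and, as a check, with the literal "plus-one" approach of adding one event and re-running the estimator. The league totals are round made-up numbers rather than the actual 2010 figures, so the output will only be in the neighborhood of the coefficients shown; the names are my own.

```python
# (a, b, c, d) coefficients of each event in the simple BsR formula.
COEF = {
    "1B":  (1, 0.88, 0, 0),
    "2B":  (1, 2.42, 0, 0),
    "3B":  (1, 3.96, 0, 0),
    "HR":  (0, 2.20, 0, 1),   # home runs sit in D, not in A
    "BB":  (1, 0.11, 0, 0),   # walks and hit batsmen combined
    "SB":  (0, 0.99, 0, 0),
    "CS":  (0, -0.99, 1, 0),
    "OUT": (0, 0.00, 1, 0),   # AB - H
}

# Hypothetical league totals, for illustration only.
LEAGUE = {"1B": 28000, "2B": 8500, "3B": 850, "HR": 4600,
          "BB": 17000, "SB": 2900, "CS": 1100, "OUT": 122000}


def totals(freq, b_mult=1.0):
    """Sum each factor: event frequency times its coefficient."""
    A = sum(COEF[e][0] * n for e, n in freq.items())
    B = b_mult * sum(COEF[e][1] * n for e, n in freq.items())
    C = sum(COEF[e][2] * n for e, n in freq.items())
    D = sum(COEF[e][3] * n for e, n in freq.items())
    return A, B, C, D


def base_runs(freq, b_mult=1.0):
    A, B, C, D = totals(freq, b_mult)
    return A * B / (B + C) + D


def marginal_value(event, freq, b_mult=1.0):
    """Closed-form run value of one more `event`."""
    A, B, C, D = totals(freq, b_mult)
    a, b, c, d = COEF[event]
    b *= b_mult
    return ((B + C) * (A * b + B * a) - (A * B) * (b + c)) / (B + C) ** 2 + d


def plus_one_value(event, freq, b_mult=1.0):
    """Numerical check: add one event and see how much BsR moves."""
    bumped = dict(freq)
    bumped[event] += 1
    return base_runs(bumped, b_mult) - base_runs(freq, b_mult)


for event in COEF:
    print(event, round(marginal_value(event, LEAGUE, 0.88), 2),
          round(plus_one_value(event, LEAGUE, 0.88), 2))
```

At league scale the two columns agree to several decimal places, which is why the closed form and the plus-one method are interchangeable here.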
You’ll notice that the title and the out terms look a bit different. The title stands for “Linear Weights Runs Created,” and the out term is -.09 because it is expressed in absolute terms. In order to make it relative to average, we find the overall runs per out- or runs scored divided by C- and subtract this figure from the coefficients of the events in C (outs and caught stealing). For 2010, runs per out (excluding pitcher hitting) is .178. That gives us this:
LW = .47*1B + .75*2B + 1.04*3B + 1.40*HR + .33*(BB + HBP) + .18*SB – .45*CS – .27* (AB – H)
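The shift from absolute runs to runs above average is just one subtraction per out event. A quick sketch, using the rounded figures quoted above (the function name is mine):

```python
RUNS_PER_OUT = 0.178  # 2010 figure quoted above, excluding pitcher hitting


def to_above_average(absolute_coef, runs_per_out=RUNS_PER_OUT):
    """Shift an out-event coefficient from absolute runs to runs above average."""
    return absolute_coef - runs_per_out


print(round(to_above_average(-0.09), 2))  # batting outs, AB - H
print(round(to_above_average(-0.28), 2))  # caught stealing
```

Because the inputs here are already rounded to two decimals, the last digit can land one tick away from the published coefficients, which were computed from unrounded values.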
And that’s all there is to it. I know it may seem like a lot, but it really isn’t- especially if you have a spreadsheet set up for it. Heipp has one in the aforementioned link, and the wOBA calculator that I published a while back does all of this for you. More terms can be added to spice things up a little bit- for example, the LW formula derived from Tango’s coefficients gives us this slightly more complicated equation:
Tango LW = .48*1B + .77*2B + 1.06*3B + 1.41*HR + .49*ROE + .31*NIBB + .34*HBP – .28*(AB – H – ROE – K + SF) – .29*K
And another equation developed from Retrosheet data that spans from 1911 until 2009 gives us the following formula:
Retro LW = .47*1B + .77*2B + 1.05*3B + 1.40*HR + .50*ROE + .31*NIBB + .34*HBP – .27*(AB – H – ROE – K + SF) – .29*K
All slightly different coefficients that give us slightly different results. It’s not a big deal, but I wanted to show how different datasets and BsR formulae can influence the run values provided. When all is said and done, though, you’re not going to see a big difference between them.
Applications beyond hitting
LW values have expanded beyond the realm of just offense- they are applied to defense and to pitching metrics, too. With defense, the run value of a play made above average is the difference between a batted ball and an out, or about .75 runs. For the outfield, it’s about .85 (more doubles and triples, obviously). With pitching, FIP takes the basic run values, expresses them relative to the value of a ball in play and multiplies by 9 to attain its coefficients. Uber-stat tERA takes the linear weight value of each batted ball to estimate the pitcher’s defense neutral runs allowed. So LW doesn’t apply just to offense- it has spread to other aspects of the game, baserunning runs included.
All in all, LW are the best way to measure a player’s offense due to their simplicity and theoretical practicality. The process to get them is a bit complicated, sure, but it will always provide you with an outstanding overall view of the value a player provides with the bat. And it’s a construct that allows you to look at all other aspects of the game, as well.
Expect to see more regular posting from me around late March and early April- I’ve only got a few weeks left before being completely finished with my undergraduate work. I’ve got a few minutes of down time, so I thought I’d post some data. I track both productive outs created and double plays avoided per opportunity when I calculate player value (i.e. WAR; Wins Above Replacement), and figure it’d be nice to share it rather than keep it all to myself. I used to use a static run value for both productive outs and double plays, but have since found a better way to approximate the values- it just so happens that the run value of a productive out is roughly equal to the difference between a sacrifice hit (around -.06 runs in 2010) and a strikeout (around -.27 runs), about +.21 runs. That’s not a whole lot, mind you, but it is something. The run value of a double play avoided (above or below average) is approximately the difference between a sacrifice hit and a double play (about -.44 runs), +.38 runs.
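For anyone who wants to play along at home, the two run values fall out of simple subtraction. The `runs_above_average` helper below is only my guess at a reasonable way to turn a per-opportunity rate into runs; its name, its arguments, and the example numbers fed to it are hypothetical and not taken from the spreadsheet.

```python
# 2010 run values quoted above.
SAC_HIT = -0.06
STRIKEOUT = -0.27
DOUBLE_PLAY = -0.44

PRODUCTIVE_OUT_VALUE = SAC_HIT - STRIKEOUT      # about +.21 runs
DP_AVOIDED_VALUE = SAC_HIT - DOUBLE_PLAY        # about +.38 runs


def runs_above_average(successes, opportunities, league_rate, run_value):
    """Runs gained relative to a league-average rate over the same chances.

    Assumed approach: (actual successes - expected successes) * run value.
    """
    expected = league_rate * opportunities
    return (successes - expected) * run_value


# Hypothetical hitter: 40 productive outs in 100 chances vs. a 30% league rate.
print(round(runs_above_average(40, 100, 0.30, PRODUCTIVE_OUT_VALUE), 1))
```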
The top five leaders in productive outs:
Elvis Andrus, Julio Borbon, Juan Pierre, Mark Teixeira, and Erick Aybar at +2 runs apiece. All of these guys, with the exception of Teixeira, are small-ball type of players. The trailers? Adrian Beltre, Mike Aviles, Troy Glaus, B.J. Upton, and Aaron Hill at -2. That said, the overall difference between the best and worst hitters at making productive outs is about four runs; about half a win. That’s a noticeable difference and something that should be accounted for in player valuation, since we’re always striving to increase theoretical accuracy. The best team were the Rangers at +8 runs; the worst were the Brewers at -9. That’s a gigantic difference (about two wins).
When it comes to avoiding double plays, Carl Crawford (+5), Curtis Granderson (+5), Carlos Peña (+4), Jonny Gomes (+4), and Brennan Boesch (+3) lead the pack. Billy Butler (-7), Ivan Rodriguez (-6), Adrian Beltre (-4), Wilson Valdez (-4) and Michael Cuddyer (-4) were the worst. The best team at avoiding them were the Rays (+11); the worst were the Giants (-13).
As a whole, the best player was Carl Crawford (+6); the worst Adrian Beltre (-6). That’s close to a win and a half in difference. Again, this is something that we need to pay attention to in player valuation. The difference between the best (Rays at +14) and worst team (Twins at -11) was 25 runs; almost three wins. That’s a lot.
Sorry for the rushed post, guys. I’ve got a lot going on right now. You can find the whole spreadsheet here; I hope you find it as interesting as I do.
My THT Live post. It’s a good thing the Giants aren’t on his list- although he’d make a pretty decent utility infielder if he didn’t cost so darn much.
It feels really nice to finally have some free time.
Forecasting Andres Torres
I’ve noticed recently that projection systems just aren’t very high on Andres Torres. If you’re familiar with the way forecasting systems work, it makes perfect sense- they weight multiple years of data (with the most recent year weighted the heaviest), regress to the mean (Marcel regresses to the mean of all non-pitchers as hitters; Oliver and CAIRO to the positional mean, and ZiPS and PECOTA to the players they compare best to historically) and add an aging factor.
Andres Torres doesn’t really have any of these things going for him. He has relatively little Major League experience (1,025 PA- 740 of which came within the last two years; all previous PA occurred between 2002-2005), which means we have less to work with in terms of making an educated guess about his skill level. This means we have to regress him more towards the mean, which means there is less certainty about his forecast; and, given that he is 33 years old, he doesn’t have much upside left to him (at least, based on standard aging factors). These forecasts are completely unaware that Torres has revamped his swing entirely and is taking medication for ADHD. And really, you can’t blame them for not knowing this. Forecasting systems use the information that is available to them, and they have no clue as to when a player makes a mechanical adjustment or has a mental breakthrough of some sort. This is why scouting is so imperative.
Another factor that has been bringing Torres’ projection down would have to be the appendicitis he dealt with in September. Through August, Torres was hitting .284/.365/.502; a wOBA of .369, about 28% greater than the league average. Torres hit .164/.188/.328 in 69 plate appearances in September and October, which dropped his overall line to .268/.343/.479, only ~15% above the average. Correlation doesn’t equal causation, but it’s pretty clear from the reports and numbers that his major decline in the last two months of the season was most likely due to his health. Believe it or not, this does affect his forecast. I can’t speak for the advanced forecasters, but Marcel’s revised projection- if we exclude his last two months- changes from a .349 wOBA (.264/.337/.465) to .355 (.273/.349/.477). That’s three runs per 600 plate appearances. I know that might not seem like a lot, but it certainly says something about the sensitivity of a forecast. Bottom line: take forecasts seriously, but know that they’re certainly fallible, especially in cases like this. I would expect Torres to regress some in 2011, but not as severely as the forecasts expect him to.
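If you want to check the "three runs" figure yourself, here is a back-of-the-envelope sketch. It assumes the usual conversion of a wOBA gap to runs (divide by a wOBA scale, multiply by plate appearances) and a scale of about 1.15; that scale is my assumption and shifts a little from year to year.

```python
WOBA_SCALE = 1.15  # assumed value; the actual scale varies by season


def woba_gap_to_runs(woba_low, woba_high, pa=600, scale=WOBA_SCALE):
    """Approximate run difference between two wOBA figures over `pa` plate appearances."""
    return (woba_high - woba_low) / scale * pa


# Marcel with and without Torres' September and October.
print(round(woba_gap_to_runs(0.349, 0.355), 1))
```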
Pablo’s (Hopeful) Reformation
Speaking of uncertainty in forecasts, Pablo Sandoval is perhaps the most difficult player in the Major Leagues to project. Sandoval apparently weighed around 280 pounds at the end of last season. Andrew Baggarly wrote a fantastic article on the matter today, and here are some of the highlights:
He couldn’t take a half-dozen ground balls without panting, hands on knees. His chronically sore hips locked up his swing, especially from the right side.
I actually had no idea that it was that bad. That’s just terrible.
“He ate in a way that crushed his metabolism,” Banning said. “He’d not eat breakfast, sleep till he got to the ballpark, go out at night and eat a mammoth meal, probably some adult cocktails. That’s the way it went down.”
Sandoval couldn’t do three pull-ups in early November. Now he does sets of 10. His legs shook when he tried to squat 135 pounds. Now he is squatting 400. The first day, Sandoval struggled to complete two reps of an exercise called the inverted row. He maxed out at 26 last week.
His flexibility and range of motion vastly increased, too. Sandoval, a switch-hitter, complained of constant hip pain last season, and now acknowledges that the problems wrecked his right-handed swing. (He hit .379 from the right side in ’09 but just .227 last season.)
“It was bad, my hips,” Sandoval said. “I (couldn’t) even get through to the ball. Now I can swing hard. Now I get loose and nothing is sore.”
Sandoval received chiropractic alignments and deep-tissue rubs — what Banning called “hurt-you” massages — to correct the dysfunction in his hips. Three months ago, he couldn’t touch his fingertips to his toes. Now he palms the floor.
Sandoval now stands at a much-improved 240 pounds. Given that he’s 5’11”, this isn’t an ideal weight- but geez, talk about an improvement. It sounds like he’ll have a personal chef with him while he’s in San Francisco; let’s hope that he continues to eat well while on the road. It also sounds like he’s spent some time talking to Barry Bonds about his free-swinging tendencies. I guess you could say that one word describes his future: discipline. If he is able not only to maintain that discipline at the dinner table but also to show a bit more of it at the plate, we could be looking at a reformed player. And with his contact abilities the way they are, he could really become an elite hitter.
Productive Outs and Double Plays
The Giants added 4 runs above the league average when it came to making productive outs last season; about half a win. The best in the Majors were the Texas Rangers at +8 runs and the worst the Milwaukee Brewers at -9 runs. The difference between the best and worst teams at making productive outs is approximately two wins. The Giants were 13 runs below the average when it came to double plays, tied with the Baltimore Orioles for the worst in the Major Leagues. The best team at avoiding them were the Tampa Bay Rays at +11 runs. It would behoove the Giants to avoid double plays in 2011, but that might be a difficult feat- recently signed Miguel Tejada is a double play machine, and early reports of him hitting near the middle of the order have me pretty darn worried.
If you got excited, I apologize for the misleading title. I’m working full-time now in addition to finishing up my undergraduate work- so I haven’t had much time for, well, really anything. There’s a possibility that I’ll be able to put up a post sometime next week, but I’m not sure just how good those chances are. I’ve got a bit of a project going that I intend on posting at THT Live that should spark some discussion, and something on the Giants I’d like to do- so you’ll hopefully see that sooner rather than later.
Since I hate putting up posts that have little to no content to them, I thought I’d post a little early Christmas wishlist for FanGraphs (I originally posted this in a thread in their forums):
Some (hopefully) realistic wishes, some probably unrealistic:
1. Use empirical run values for LW; not approximated ones in which the values are held constant from one another. Have the league wOBA set to .330 for every year- it’s far easier to interpret, and it’s a very easy fix.
2. Situational hitting data- knowing how often the player makes productive outs or avoids double plays is really useful. It really makes things more “complete.”
3. Regress UZR. MGL himself has been saying this for quite some time.
4. IF the funds are available, purchase data from STATS and generate sUZR figures to work in tandem with bUZR for fielding values. It’s expensive as heck, though.
5. No more FIP in WAR. If we’re looking for theoretical accuracy, we want to take the pitcher’s batted ball distribution into account. I’d recommend a BsR-derived version of tRA…no linear equations.
6. A baserunning metric incorporating hit location would be wonderful.
7. A first baseman’s “scoop” opportunities. Otherwise, the “scoops” data doesn’t have much meaning to it.
8. A pony would be nice.
I love FanGraphs, but with all the changes BP is bringing to its site to enhance its metrics I’d like to see FG do the same. Some of the stuff on my wishlist is asking a bit much- situational hitting, sUZR (especially that), scoops, and baserunning runs- but the other stuff, I think, should be relatively easy changes.
It sounds like they’ll be implementing one of them- guess which one that is? Any ideas for other changes?
If I get a bit of down time, I’ll explain why I think FG would benefit from some of these “fixes” down in the comments.
I’m coming out of my study cave for a bit so I can add a quick two cents to a topic that got some play yesterday.
Paapfly has a post regarding Matt Cain and his consistent ability to outperform his FIP and his xFIP. A number of people- by and large non-Giants fans- have referred to Cain as being “lucky” for owning Earned Run Averages significantly lower than his FIP and his xFIP, and say that he’s due to regress. Cain will undoubtedly regress at some point in his career, but it wouldn’t surprise me one bit if he continued this trend for a few more seasons. Personally, I don’t really care for FIP or xFIP due to the way they are handled- and I figure this is as good a time as any to address my issues with the way people are using them.
First of all, what is FIP?
It stands for Fielding Independent Pitching, and it is a DIPS (Defense Independent Pitching Statistic) created by esteemed saberist Tom Tango. The formula is excruciatingly simple:
((13*HR + 3*(BB+HBP) – 2*K)/IP) + C
Where C is a constant designed so that the league FIP equals the league ERA. The constant typically sits around 3.2, but it varies depending on the league, year, and the run environment. There seems to be a common misconception that the numbers are “drawn out of thin air” or that it’s based on an extraordinarily complex formula- really, it’s not. If you know Tango, you know he likes to keep things simple. The coefficients are derived from linear weights, which are the average run value of an event. The run values for different events in 2010, for example, are as follows: .48 for singles, .77 for doubles, 1.06 for triples, 1.42 for home runs, .33 for walks and hit by pitches (I’m including IBB in these figures), and -.27 for outs. Multiply each run value by the frequency of the event per ball in play, and you’ll find the run value of balls in play is around -.03 runs. Subtract this from each run value, so that every event is expressed relative to a ball in play. This gives us .36 BB+HBP, 1.44 HR, -.25 non batted ball outs, and 0.00 for balls in play. Multiply by nine. Now you have 3.2 for BB+HBP, 13.0 for HR, and -2.2 for non batted ball outs. And…there you go. Tango rounds the coefficients for simplicity’s sake, but you’re really never going to see a major discrepancy in the coefficients, at least at the Major League level.
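The arithmetic is short enough to sketch in a few lines. This uses the rounded 2010 run values quoted above, so the intermediate figures come out a hair different from the ones in the text (which presumably came from unrounded values), but they round to the same 13, 3 and -2. The variable names are mine.

```python
# Rounded 2010 run values quoted above.
RUN_VALUES = {"HR": 1.42, "BB+HBP": 0.33, "K": -0.27}
BALL_IN_PLAY = -0.03  # average run value of a ball in play

# Express each event relative to a ball in play, then scale to nine innings.
relative = {event: value - BALL_IN_PLAY for event, value in RUN_VALUES.items()}
per_nine = {event: 9 * value for event, value in relative.items()}

for event, coef in per_nine.items():
    print(event, round(coef, 2), "->", round(coef))
```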
That’s all there is to it. It’s a linear run estimator that regresses all balls in play to the league average. 100% regression to the mean, which is what Robert “Voros” McCracken did with his original dERA. Of course, pitching is not that simple- and this is why I’m so bothered by the frequent usage of it and treatment of it as gospel: it is far from perfect. You see, FIP is merely one component of pitching- sort of like on-base percentage or slugging percentage for hitters. It works better than ERA as a predictor for next-season ERA, yes, but it’s far from ideal and far from definitive. Pitchers have different ball in play distributions, and pitchers have different sequencing patterns. FIP just looks at the basic events and makes a reasonable guess as to how they “should” have performed. It’s essentially a shorthand version of McCracken’s dERA.
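For reference, the FIP formula above as a tiny function. The default constant of 3.2 is just the typical value mentioned earlier, not a figure for any particular league or year, and the pitching line in the example is made up.

```python
def fip(hr, bb, hbp, k, ip, constant=3.2):
    """Fielding Independent Pitching.

    `ip` is innings pitched as a true decimal (200 1/3 innings is 200.333...,
    not the box-score shorthand 200.1).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher, for illustration only.
print(round(fip(hr=20, bb=60, hbp=5, k=180, ip=200.0), 2))
```

Notice that nothing about balls in play appears anywhere in the function, which is exactly the limitation being discussed.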
And what of xFIP?
xFIP attempts to regress the one batted ball portion of the equation, home runs, to the league average. This helps predict next-season ERA pretty well, but that doesn’t mean it works the same for all pitchers. Again, it’s largely ignoring batted ball types aside from outfield flies. It wouldn’t surprise me one bit if pitchers like Cain- who induce a lot of popups- have a tendency to suppress their HR/FB rate. Just looking at some pitchers from 2006-2010, the top 10 pitchers in infield fly rate average a 9.4% HR/FB rate, which is about 2% below the league average (keep in mind this is a crude look; it certainly warrants a deeper investigation). xFIP takes a simple construct and makes it slightly more complex.
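A rough sketch of the idea, assuming the common formulation in which actual home runs are swapped for fly balls times the league HR/FB rate. The function name, the default constant and the example inputs are my own, and the league rate is something you would have to supply.

```python
def xfip(fb, bb, hbp, k, ip, league_hr_per_fb, constant=3.2):
    """FIP with home runs replaced by an expected total based on fly balls."""
    expected_hr = fb * league_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher: 200 fly balls allowed, with a hypothetical 10% league HR/FB rate.
print(round(xfip(fb=200, bb=60, hbp=5, k=180, ip=200.0, league_hr_per_fb=0.10), 2))
```

A pitcher who truly suppresses his HR/FB rate gets no credit for it here, since every pitcher is assigned the same league rate.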
Really, we shouldn’t pay that much attention to FIP and xFIP. Do they have substance? Yes, in that they have some predictive value and they’re one thing to look at among many when evaluating a pitcher. But the simplicity of the formulae prevents them from accurately predicting a more diverse range of pitchers. On average, they work pretty well- but that doesn’t mean that pitchers like Cain fall under that profile. Cain and pitchers like him are not terribly overrated because they outperform their FIP and xFIP- they’re underrated, because the metric doesn’t account for a number of things that are a part of pitching.
And another thing: there is really no reason whatsoever why FIP should be used as a value metric. FanGraphs uses it for WAR, and this makes absolutely no sense to me. If we’re looking for a pitcher’s context-neutral value, we must make an adjustment for the types of balls in play he allows. FanGraphs uses a very complex metric in UZR; why not use tERA for pitchers? It’s a simple adjustment, and theoretically speaking, it makes a heck of a lot more sense to use that for pitcher value than something that assumes all balls in play are equal.