A Word on Park Factors
I’ve been thinking about park factors a lot lately. To be honest, that sounds rather odd and I never thought I’d ever hear myself say something like that…but, it’s true. Park factors are designed to give us a bit of context for a player’s production. A hitter that posts a +30 LWTS season in PetCo Park has provided more production than a +30 LWTS hitter in Coors Field. What gives us reason to believe this?
Well, we know that certain parks affect hitters differently than others. Some places have a short right field porch that makes it easy to poke a fly ball over the fence; others have deep, cavernous outfields with high walls that make home runs exceptionally hard to hit. It only makes sense to adjust for that, and that’s what park factors attempt to accomplish.
We need to be careful with the factors we use, though. A standard park factor that should be taken with a grain of salt, for example, is one that you see on ESPN:
PF = ((homeRS + homeRA)/(homeG)) / ((roadRS + roadRA)/(roadG))
The concept is very simple: the park factor is the ratio of runs scored per game at the home park to runs scored per game on the road. The problem with this approach, however, is that it overstates the effect the park has on a player’s season. A player only plays half his games at home; therefore, we must cut the park’s effect in half, which means averaging the park factor with a neutral 1.00 rather than using it as-is. In the link provided, you will see that PetCo Park has a park factor of 0.741. This implies that run scoring is suppressed by 25.9%, and a player’s production level (i.e. LWTS) will see a drastic increase as a result. But since the player is only playing half his games at said park, this changes the equation to a far more reasonable…
PF = (1+ (((homeRS + homeRA)/(homeG)) / ((roadRS + roadRA)/(roadG)))) / 2
This changes the park factor from .741 to .871. That’s a difference of .130 (130 points), which is rather large. To get a “truer” estimate of the park’s effect, we regress the park factor toward 1.00, depending on the number of years we’re working with. Since we’re using 1 year, we regress around 40% (that is, we keep 60% of the distance from 1.00), although the figure is relatively arbitrary. This changes our park factor to 0.923. No longer is PetCo Park a horrendous monster that kills every ball hit into play; it is now a park that simply suppresses hitter production by about 7.7%. The softer number reflects our one-year sample. Conversely, Coors Field is given a park factor of 1.247. The first adjustment lowers it to 1.124; the second lowers it to 1.074 (a 7.4% increase in production). Simple adjustments in these cases make for a big difference.
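The chain of adjustments above (raw ratio, halving, regression) can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the 0.6 multiplier is the one-year figure used in this post, not a fixed standard.

```python
def espn_park_factor(home_rs, home_ra, home_g, road_rs, road_ra, road_g):
    """Raw ratio of runs per game at home to runs per game on the road."""
    return ((home_rs + home_ra) / home_g) / ((road_rs + road_ra) / road_g)


def halve_effect(raw_pf):
    """Average the raw factor with a neutral 1.00, since half the games are on the road."""
    return (1 + raw_pf) / 2


def regress(pf, keep=0.6):
    """Pull the factor toward 1.00, keeping `keep` of its deviation (0.6 = 40% regression)."""
    return 1 - (1 - pf) * keep


# Raw factors quoted in the post for PetCo Park and Coors Field.
for name, raw in [("PetCo", 0.741), ("Coors", 1.247)]:
    halved = halve_effect(raw)
    print(f"{name}: raw {raw:.3f}, halved {halved:.3f}, regressed {regress(halved):.3f}")
```

Carrying full precision through both steps gives roughly 0.922 for PetCo rather than 0.923; the figure in the text appears to start from the rounded .871.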
Personally, I prefer Patriot’s methods outlined in this article. The formula is slightly more rigorous than the simple ESPN one:
iPF = (1+ (H*T / ((T-1)*R+H))) / 2
Where “H” stands for the number of runs scored per game at home, “T” stands for the number of teams in the league, and “R” stands for the number of runs scored per game away. The “(T-1)*R + H” term rebuilds the league-wide run total from the T-1 road parks plus the home park, so the park in question is counted exactly once in the league average. We then apply regression with the formula 1 – (1-iPF) * x, where x = .6 for one year, .7 for two, .8 for three, and .9 for four plus years. In other words, no matter how many years we’re working with, we still regress the park factor by at least 10%. We always assume a park is closer to neutral, if for no other reason than not trusting the park factor to capture all of the nuances of the park’s actual effects. By the way, I use this formula because I’m comfortable with it and because it’s intuitive to me. There are plenty of other formulations out there that work just fine, and I encourage you to look into them and find one you like. But, I begin to digress.
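For anyone who wants to play with the numbers, here is a minimal sketch of the iPF formula and the regression step exactly as written above. The function names and the sample inputs (8.0 runs per game at home, 10.0 on the road, a 16-team league) are made up for illustration and are not taken from Patriot’s article.

```python
# Share of the deviation from 1.00 that is kept, by years of data (from the text).
KEEP_BY_YEARS = {1: 0.6, 2: 0.7, 3: 0.8}


def initial_pf(home_rpg, road_rpg, teams):
    """iPF = (1 + H*T / ((T-1)*R + H)) / 2, as written in the post."""
    ratio = home_rpg * teams / ((teams - 1) * road_rpg + home_rpg)
    return (1 + ratio) / 2


def regressed_pf(home_rpg, road_rpg, teams, years):
    """Apply 1 - (1 - iPF) * x, with x chosen by the number of years."""
    if years < 1:
        raise ValueError("need at least one year of data")
    keep = KEEP_BY_YEARS.get(years, 0.9)  # four or more years keep 0.9
    ipf = initial_pf(home_rpg, road_rpg, teams)
    return 1 - (1 - ipf) * keep


# Hypothetical pitcher's park: fewer runs per game at home than on the road.
print(round(initial_pf(8.0, 10.0, 16), 3))
print(round(regressed_pf(8.0, 10.0, 16, years=1), 3))
```

A park with identical home and road scoring comes out at exactly 1.00 before and after regression, which is a handy sanity check.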
My major qualm with park factors, which is the purpose of this whole post, is that the application of it is lacking an adjustment that seems imperative to assessing player value. Let me give you an example.
Adrian Gonzalez had one heck of a 2009 season in San Diego, whose home park is known as a severe pitcher’s park. He still managed to post a .277/.407/.551 line despite having 49.2% of his plate appearances at PetCo. With the following equation, we can estimate his production level in 2009 (run values based on NL 2009):
LWTS = .48*1B + .79*2B + 1.08*3B + 1.42*HR + .51*ROE + .31*NIBB + .33*HBP + .15*IBB – .29*Outs – .31*K + (.115 * (1 – PF) * PA)
Where “outs” are AB – H – ROE – K + SH + SF.
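The formula translates directly into code. The sketch below is just one way to write it: the weights are the NL 2009 values from the equation, while the function names and the dictionary layout are my own choices. The final term, .115 * (1 – PF) * PA, is split into its own function so it can be inspected separately.

```python
# Run values from the LWTS equation in the post (NL 2009).
WEIGHTS = {
    "1B": 0.48, "2B": 0.79, "3B": 1.08, "HR": 1.42, "ROE": 0.51,
    "NIBB": 0.31, "HBP": 0.33, "IBB": 0.15, "Outs": -0.29, "K": -0.31,
}
RUNS_PER_PA = 0.115  # constant in the formula's final (park) term


def park_adjustment(pf, pa):
    """Runs credited (or debited) for the park: .115 * (1 - PF) * PA."""
    return RUNS_PER_PA * (1 - pf) * pa


def lwts(counts, pf, pa):
    """Linear weights runs above average, including the park term.

    `counts` maps event names (the keys of WEIGHTS) to season totals;
    missing events are treated as zero.
    """
    raw = sum(weight * counts.get(event, 0) for event, weight in WEIGHTS.items())
    return raw + park_adjustment(pf, pa)


# Illustrative inputs only: 700 PA in a 0.95 and then a 1.05 park environment.
print(round(park_adjustment(0.95, 700), 1))
print(round(park_adjustment(1.05, 700), 1))
```

Setting `pf` to 1.00 zeroes out the park term and returns the raw linear weights total.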
The last part of the formula is often not displayed in LWTS formulae; it is simply the park adjustment method outlined in Palmer and Thorn’s The Hidden Game of Baseball. The .115 figure is the number of runs scored per plate appearance. If a player has 700 plate appearances in a park environment of 0.95, for example, then he gets a +4 run bonus. On the other hand, if he is in a 1.05 environment, then he loses 4 runs. Nice and simple. If we plug in Gonzalez’s batting line with PF set at 1.00, a neutral park (i.e. his true, raw numbers), we get +40 runs above the average hitter. Now, here’s where it gets a little fishy. Let’s put PetCo’s park factor in there and see what happens. PetCo’s five-year Park Factor is 0.91. If we add this into the equation, Gonzalez becomes a +47 hitter. We can just call it a day, right?
Wrong. This is the assumption that most (if not all) applied park factors make, and it is incorrect. Adrian played half his games at PetCo, which suppresses the run environment, but he still played his other 81 games at other ballparks, all of varying dimensions and degrees of difficulty. Heck, he played 6% of his games at Coors Field and Chase Field, which are quite the hitter’s parks. If we weight the parks by his plate appearances, we can get a better idea of what his “true” park factor should be.
Gonzalez’s weighted park factor is 0.96, quite a bit higher than the original 0.91 we were working with before. If we plug this into the equation, Gonzalez’s LWTS go from +40 to +43. The difference between this method and the original one (+47) is four runs, which is quite large.
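The weighting itself is nothing more than an average of park factors, weighted by plate appearances. Here is a minimal sketch, assuming you already have a park factor and a PA count (or PA share) for every park a hitter visited. The function name is my own, and the 1.01 figure for “everything else” is a placeholder I picked for illustration, not Gonzalez’s actual road split.

```python
def weighted_park_factor(parks):
    """Average park factor weighted by plate appearances.

    `parks` is a sequence of (park_factor, weight) pairs, where the weight
    is a PA count or a share of PA; the weights need not sum to 1.
    """
    parks = list(parks)
    total = sum(weight for _, weight in parks)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(pf * weight for pf, weight in parks) / total


# 49.2% of PA at PetCo (0.91) comes from the post; the remaining 50.8% is
# lumped into one hypothetical bucket with an assumed average factor of 1.01.
example = [(0.91, 49.2), (1.01, 50.8)]
print(round(weighted_park_factor(example), 2))
```

In practice you would list every road park separately with its own factor and PA total rather than lumping them into one bucket.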
There are a number of issues with park factors that I haven’t touched on in here, and this is only one of many things to take into consideration when attempting to adjust for a park’s effects on a hitter. Personally, I much prefer this method to the original. This isn’t breaking new ground, by the way; I’ve heard other people mention it before. I just don’t think I’ve ever seen it actually applied.