How (In)consistent is Jonathan Sanchez?

August 2, 2010

If you were to ask me to describe Jonathan Sanchez in one word, it would be “erratic.” It feels as though he is the embodiment of Robert Louis Stevenson’s “Dr. Jekyll and Mr. Hyde”- remarkably brilliant in one start; absolutely horrendous in the next. Every time he steps on the mound, I ask myself, “Which Sanchez am I going to see today?” It gets tiresome at times, as it is more than apparent that he has the talent and the stuff to be a top-of-the-rotation starter- but then he’ll lose his focus, and his control goes out the window. His pitch count rises, and he’s pulled out of the game by the fifth inning. But then, there are the starts where everything clicks. Nothing can get in his way; he pounds the strike zone and paints the corners with his low-90s fastball and a slider that can be untouchable. But is Sanchez really as inconsistent as I (and many fans) think he is?

One method to check- or at the very least, estimate- Sanchez’s consistency is by looking at his Game Scores.  Game Score, which is a metric developed by Bill James to evaluate a pitcher’s start, is a rather simple rating system:

1. Start with 50 points.

2. Add one point for every out recorded (3 points for every inning pitched).

3. Add two points for every inning completed after the fourth.

4. Add one point for each strikeout.

5. Subtract two points for every hit allowed.

6. Subtract four points for each earned run allowed.

7. Subtract two points for each unearned run allowed.

8. Subtract one point for each walk.
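
To make the arithmetic concrete, here is a minimal Python sketch of the eight steps above. The function name `game_score` and its parameter names are my own, and the pitching line in the example is made up for illustration; it is not one of Sanchez's actual starts.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Game Score for one start, following the eight steps listed above."""
    full_innings = outs // 3                 # completed innings only
    score = 50                               # 1. start with 50 points
    score += outs                            # 2. one point per out recorded
    score += 2 * max(0, full_innings - 4)    # 3. two per inning completed after the fourth
    score += strikeouts                      # 4. one per strikeout
    score -= 2 * hits                        # 5. two per hit allowed
    score -= 4 * earned_runs                 # 6. four per earned run
    score -= 2 * unearned_runs               # 7. two per unearned run
    score -= walks                           # 8. one per walk
    return score


# Hypothetical line: 7 IP (21 outs), 5 H, 2 ER, 0 unearned, 2 BB, 8 K
example = game_score(outs=21, strikeouts=8, hits=5,
                     earned_runs=2, unearned_runs=0, walks=2)
print(example)  # 50 + 21 + 6 + 8 - 10 - 8 - 0 - 2 = 65
```

Note that step 3 only rewards completed innings, so a pitcher pulled with two outs in the sixth gets the bonus for the fifth inning but not the sixth.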

To get an idea of how Sanchez typically performs compared to the other Giants’ starting pitchers, here are the average Game Scores for each pitcher over the past two and a half seasons:

Lincecum, of course, has been full of outstanding starts over the past couple of years, although he’s scuffled a bit so far this season.  Zito’s seen a nice increase in the quality of his starts, and so has Sanchez.  Matt Cain has been quite good the past couple of seasons, but you already know that.  What becomes very apparent is that it looks as though the Giants’ starting staff has, by and large, had well above average starts the past two seasons.  No wonder they’re widely regarded as being one of the best rotations in baseball.

Now, how about that consistency?  One method I like is one outlined by Rob Castellano over at Amazin’ Avenue, who uses the standard deviation of a pitcher’s game scores to get an idea of how consistent a starter is.  The lower, the better.  According to Castellano, the league average standard deviation is right around 16.  Anything above that indicates that the pitcher is less consistent than your average starting pitcher; anything below indicates that they’re more consistent than the league average.
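
As a rough sketch of that idea, the snippet below computes the mean and standard deviation for two sets of game scores and compares each against the league-average figure of 16. Both lists are invented for illustration and are not real data for any Giants pitcher. I am also assuming the population standard deviation (`statistics.pstdev`); the source does not say whether the population or sample version is used, and the helper name `consistency` is mine.

```python
from statistics import mean, pstdev

LEAGUE_AVG_SD = 16  # league-average standard deviation cited from Castellano


def consistency(game_scores):
    """Return (average game score, standard deviation) for a list of starts."""
    return mean(game_scores), pstdev(game_scores)


# Two made-up pitchers with the same average game score (52)
steady = [52, 55, 50, 53, 50]    # hypothetical: every start looks alike
streaky = [85, 20, 78, 25, 52]   # hypothetical: brilliant one day, awful the next

for name, scores in (("steady", steady), ("streaky", streaky)):
    avg, sd = consistency(scores)
    label = "more" if sd < LEAGUE_AVG_SD else "less"
    print(f"{name}: average {avg:.1f}, SD {sd:.1f} ({label} consistent than league average)")
```

The two pitchers are identical by average game score, so only the standard deviation separates them. That is the sense in which the number measures consistency and not quality.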

In 2008, both Zito and Sanchez were wildly inconsistent, in addition to having mostly below average starts.  Cain was right around the average, and Lincecum was remarkably stable.  2009 saw a pretty equal distribution of consistency, as all pitchers- aside from Cain, who was very consistent- were average.  And so far in 2010, we see Cain being remarkably inconsistent, with Lincecum and Zito at average, and…Jonathan Sanchez the most consistent starter?  What?

That’s right- Jonathan Sanchez has actually been the most consistent starter in the Giants’ rotation in 2010.  Now, let’s not confuse this for quality- remember, Lincecum and Cain have the highest game scores and are, by all accounts, the best pitchers on the staff.  All this means is that Sanchez has been the most consistent at pitching around his average game score.

  1. Bradley Emden
    August 4, 2010 6:22 AM

    You're missing a key point. It's easier to be more consistent when you pitch fewer innings. He consistently averages fewer innings, so his game scores will not vary as much. If Cain pitches 6 innings in one start and 9 innings in another, even though he pitches nicely in both starts his game score will vary, despite throwing 75% or so quality starts, whereas Sanchez is under 50% quality starts. So Cain is the most consistent this year in having a quality start, followed by Lincecum and Zito, with a big drop-off to Sanchez. Which I guess simply means that the bullpen will be more taxed and get more pitching time over the course of the season during Sanchez starts. If one looks at percentage of quality starts as a measure of consistency, your original hypothesis based on your observations would be correct. You essentially used statistics to lie. The standard deviations that you measured were not really measuring any important consistency, because a good game vs. a bad game may have a sharp cutoff point, which a measure of spread like standard deviation does not capture well.

    • triplesalley
      August 4, 2010 6:53 AM

      Good point, Bradley, and something I certainly overlooked. There is a bias in the data, which skews the results.

      The problem with using something like “quality starts,” though, is that you’re using a rather rudimentary estimate of effectiveness with a binary design. It’s not as simple as “yes/no;” it’s more complex than that. A player can pitch six innings with three earned runs, two “unearned” (whatever that means), eight walks and a strikeout and still get a “quality” start. I don’t know about you, but there’s nothing “quality” about that to me!

      Game score is a step up but is still very much imperfect. I would prefer something that lets us measure the variation from the mean as an estimate of consistency, rather than QS, which doesn’t measure consistency at all- for all we know, a pitcher can be wildly inconsistent in his starts yet still rate well in QS% due to its impractical assumptions.

  2. Bradley Emden
    August 5, 2010 3:29 AM

    I definitely agree that the word quality in quality start is a misnomer, and it is a term given to a wide range of starts. However, in no case should a quality start mean that one has pitched less than 6 innings or given up more than 3 runs, so walks and hits notwithstanding, it at least gives a standard. Let's say you have a parametric measure from 1 to 100. Let's say if a person has any value less than 75 he is healthy. Any number over 75 and there is disease. A person who has a value of 1 and a person who has a value of 74 are both healthy. A person who has a value of 76 is not healthy. Yet on a parametric scale, using standard deviation without considering a key cutoff point could yield very misleading results. Someone running levels over time in the sixties, and then in the high seventies, would have a lower standard deviation than someone who was running levels in the teens and then spiked into the fifties or sixties. But the key to the clinical standard, which is the cutoff point as a measure of health, is 75. Although this analogy is by no means perfect, it could explain how one’s perception of inconsistency or consistency may not be held up by the numbers if the numbers are not measuring what you think they are measuring.

    Another example of a measurement in a study was the nonparametric measure of seeing what frequency the radio was tuned to when a valet parker picked up a car. It was a method used to determine program ratings. I remember always listening to a 50-minute show on my way to work (a 1-hour drive), but at the end I would change the channel for a few minutes during their long commercial to listen to the news. When I got to work and turned the car over to the valet, he would not report the station that I had listened to for most of my ride, but the station I had just switched to.

    I would suggest that your general observation that Sanchez is inconsistent is correct; I just think the metric you used to justify consistency was not the best metric to tell you what you already know instinctively. Sometimes one’s instincts are wrong, or lie, but in this case I think your instincts are correct and the statistics are not the correct ones. It is hard to argue that greater standard deviation is not supportive of greater inconsistency, but in this case it just may not be the important measure. And I would argue that indeed it is not a great measure of consistency/inconsistency in this scenario.

