I recently had the opportunity to talk with PITCHf/x guru Josh Kalk. I'm sure many of you are familiar with the player cards on his website, from small ball to the longball. For those of you unfamiliar with Josh, he's a physicist by trade, so he brings an extensive background in data analysis to the field. He started working with the PITCHf/x data almost a year ago, with assistance from several sources including Dr. Alan Nathan and John Walsh. Anyone interested in really getting their hands dirty with PITCHf/x can reach him at joshkalk@gmail.com.
1) For those still relatively new to PITCHf/x, could you give us a quick synopsis, in layman's terms, of how it works?
The PITCHf/x data comes from two cameras, generally located in the upper deck down the first base line and behind the plate. The cameras take about 25 pictures of the ball as it travels toward home plate, and Sportvision then combines the views from the two cameras to find the location of the ball in each frame. Once they have the position of the ball at those 25 time intervals, they run a fitter that calculates the ball's trajectory. That trajectory is what you see when you watch a game on MLB's Gameday. MLBAM then feeds the data into a neural network to add real-time pitch identification.
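As a rough illustration of the fitting step, here is a minimal sketch (not Sportvision's code). It assumes a constant-acceleration model, so each coordinate is a quadratic in time, which is how the PITCHf/x fit is commonly described. The frame timing and positions below are made up.

```python
import numpy as np

def fit_constant_acceleration(t, pos):
    """Least-squares fit of pos(t) = p0 + v0*t + 0.5*a*t**2 for one coordinate.

    Returns (p0, v0, a): initial position, initial velocity, constant acceleration.
    """
    # np.polyfit returns coefficients from the highest power down: [c2, c1, c0].
    c2, c1, c0 = np.polyfit(t, pos, deg=2)
    return c0, c1, 2.0 * c2

# Made-up example: 25 frames over ~0.4 s of flight, distance toward the plate (ft).
t = np.linspace(0.0, 0.4, 25)
true_p0, true_v0, true_a = 50.0, -135.0, 25.0
rng = np.random.default_rng(0)
# Small random noise stands in for the camera's measurement error.
y_measured = true_p0 + true_v0 * t + 0.5 * true_a * t**2 + rng.normal(0.0, 0.02, t.size)

p0, v0, a = fit_constant_acceleration(t, y_measured)
print(f"y0 = {p0:.2f} ft, vy0 = {v0:.1f} ft/s, ay = {a:.1f} ft/s^2")
```

The same fit would be repeated for the horizontal and vertical coordinates to get the full trajectory.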
2) The system is obviously still a work in progress. How accurate do you feel the basic tracking data is at this point? Is the 2008 data more reliable than the 2007 data?
We physicists like to talk a lot about accuracy and precision when discussing data. The PITCHf/x data is very precise, but not very accurate. What I mean by that is that the PITCHf/x data is very consistent if you are looking at data from just one park, but not very accurate when comparing data from park to park. It turns out that this is a very common problem, and there are known ways of correcting data like that to make it both accurate and precise.
The 2008 data is better than the 2007 data in several ways. First, the operators of the system will sometimes mess up and actually capture the catcher throwing the ball back to the pitcher, and that has happened much less in 2008. Second, in 2007 there were many games where PITCHf/x tracked an inning or two and then nothing for the rest of the game. In 2008, outages like that are much less frequent, and when they do occur they last for a much shorter period of time. The data doesn't appear to be more precise, however (though precision really isn't a problem with the 2007 data), and the park-to-park accuracy is only minimally improved.
3) I know there have been some issues with the data coming out of Great American Ballpark in Cincinnati. Are there any other ballparks we should be concerned with? Do your corrections take those parks into account?
Yes, several different parks are clear outliers for different variables. Great American Ballpark is shifting the horizontal movement of the ball by more than half a foot, so it is easy to spot. Other parks, however, have other wrinkles. Coors Field is shifting the vertical movement down to the point that many pitchers appear to generate slightly more vertical movement at Coors than in other parks. Obviously that doesn't make a lot of sense, since the thinner air at altitude should produce less movement, not more, but if you just look at a home/road split it is harder to detect. Fenway Park's camera system is slow by about two miles per hour, and Comerica was all messed up early in the season before Sportvision fixed the camera in late April. Yes, my correction system needs to take these things into account, and I will describe that process more in the next question.
4) On your website, you talk about the data corrections you make to the player cards. What exactly are these corrections and why are they necessary?
Actually, I use these data corrections for all of the analysis you see on my blog and in my Hardball Times articles, not just for the player cards and the web-based tools. As we have seen above, every park sees pitches slightly differently, and some parks have a very distorted view of some variables. Ideally, you would like to have some measure of reality and then correct each park to that, but sadly we can't do that, because we don't have something like a perfect meter stick that we could take around to measure each camera system. So we have to do the next best thing: calculate a league average and then correct each park to that average. That actually opens up some nasty issues. For example, if every PITCHf/x camera system shared a flaw that always measured speed two MPH slower than the true speed, this correction system wouldn't be able to fix that. Also, year-to-year comparisons could be an issue, since we are correcting to each year's own average. Fortunately, for 2007 and 2008 at least, that doesn't appear to be a problem. Here is a slightly simplified version of exactly how I do these corrections.
While we don't have a nice standard meter stick to carry around comparing parks, what we do have is pitchers who generally throw the same way from park to park. So if I want to look at the difference in the initial vertical release point between Yankee Stadium and Fenway Park, I have a set of pitchers who have pitched in both parks. I take a weighted mean over all the pitchers in that sample to calculate the difference. Once I do this for every pair of parks, I can compare Yankee Stadium to any other park, and from that I can use a statistical trick to get how much Yankee Stadium varies from league average without actually calculating the league average directly. Do that for all the variables involved and you have your corrections. That is the slightly simplified version, at least.
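The "statistical trick" isn't spelled out here, so the following is only one plausible way to do it, an assumption rather than the method actually used. Treat each pairwise difference as d_ij ≈ o_i − o_j, where o_i is park i's offset from league average. Then solve for all offsets at once by least squares, adding the constraint that the offsets sum to zero. That constraint pins the league average at zero without computing it directly. The park names and numbers are made up.

```python
import numpy as np

def park_offsets(parks, pair_diffs):
    """Estimate each park's offset from league average.

    pair_diffs maps (park_i, park_j) -> measured mean(park_i - park_j).
    Solves d_ij ~= o_i - o_j in the least-squares sense, with sum(o) = 0.
    """
    idx = {p: k for k, p in enumerate(parks)}
    rows, rhs = [], []
    for (pi, pj), d in pair_diffs.items():
        row = np.zeros(len(parks))
        row[idx[pi]], row[idx[pj]] = 1.0, -1.0
        rows.append(row)
        rhs.append(d)
    # Extra equation pins the (unweighted) average of the offsets at zero.
    rows.append(np.ones(len(parks)))
    rhs.append(0.0)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return dict(zip(parks, solution))

parks = ["A", "B", "C"]
# Hypothetical pairwise mean differences in one variable (e.g. release height, ft).
diffs = {("A", "B"): 0.30, ("A", "C"): 0.10, ("B", "C"): -0.20}
offsets = park_offsets(parks, diffs)
print(offsets)
```

A measurement taken at park i would then be corrected by subtracting offsets[i]. With many parks and noisy, slightly inconsistent pairwise differences, least squares spreads the disagreement across all the pairs.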
Now you might worry about what happens if a Yankee pitcher changes his arm slot in the middle of the year, so that when he goes on the road to pitch in the Metrodome he has a lower release point than when he pitched in Fenway earlier in the year. Isn't he going to mess up the Metrodome-Fenway correction? It turns out that the samples we have are large enough (over 400,000 pitches tracked so far this year) that noise like that is drowned out, as long as you properly weight the players when you are finding the mean difference between two parks.
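The interview doesn't say which weights are used. Here is one reasonable choice, stated as an assumption: weight each pitcher's difference by n1*n2/(n1+n2). That is the inverse variance of a difference of two means if every pitch has the same scatter, so a pitcher with only a handful of pitches in one park barely moves the result. The numbers are hypothetical.

```python
def weighted_park_difference(pitchers):
    """Weighted mean of per-pitcher differences between two parks.

    pitchers: list of (mean_at_park_1, n_at_park_1, mean_at_park_2, n_at_park_2).
    Weight n1*n2/(n1+n2) assumes every pitch has the same scatter.
    """
    num = den = 0.0
    for m1, n1, m2, n2 in pitchers:
        if n1 == 0 or n2 == 0:
            continue  # a pitcher only tells us something if he pitched in both parks
        w = n1 * n2 / (n1 + n2)
        num += w * (m1 - m2)
        den += w
    return num / den if den else float("nan")

# Hypothetical release heights (ft). The third pitcher changed his arm slot,
# but he has only 10 pitches in park 1, so he gets little weight.
sample = [
    (6.10, 300, 5.90, 250),
    (5.80, 400, 5.62, 380),
    (6.40, 10, 5.50, 200),
]
result = weighted_park_difference(sample)
print(round(result, 3))
```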
You mentioned Great American Ballpark before, so here is a before-and-after plot to show what is going on.

Obviously it is pretty easy to see what is wrong with Great American Ballpark in the uncorrected data. Once the corrections are made, you can see the improvement. The fastball/change up cluster is in the upper left, and that matches up perfectly, but Harang's slider (the lower right cluster) is still slightly off. This happens when a radial correction like the one here is needed: fastballs tend to be spot on, but off speed pitches are not perfect. If you look really closely, you can see that not only did the home data get moved but the road data got cleaned up as well. Before corrections the fastball/change up cluster is very circular, but notice how after correction the road data is elliptical, which matches up perfectly with his release point variation. As Harang slightly lowers his arm slot, his release point also moves out horizontally, and you get a nice ellipse. For most parks small adjustments like that are all that is needed, but for certain parks radial adjustments are needed.
5) Pitch classification seems to be so inaccurate as to be basically useless. Why is that, and what needs to change to increase its accuracy?
If you are referring to the pitch classification that MLBAM does with Gameday, I think useless is a bit of a harsh word, but you are basically correct. The first problem with MLBAM's algorithm is that it uses uncorrected data, so it really doesn't have a chance. If you are in a park like Great American Ballpark, where the horizontal movement is off by so much, fastballs are going to look like cutters, change ups are going to look like sliders, and so on. Second, MLBAM's algorithm needs to make a decision on the pitch type in less than a second, so it needs to be very fast. The algorithm I am using takes about ten seconds per pitch to decide, so it does a better job, but it still isn't perfect.
This problem is really one of clustering, or grouping similar objects together. Computers are notoriously bad at clustering, but they are slowly getting better. Clustering is also something I didn't do much of in my past, so I have a rather poor clustering algorithm in use right now. When I get a chance I will sit down and really study the issue, and hopefully a better algorithm will be produced in the not too distant future.
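To show what clustering means here, the sketch below runs plain k-means on made-up pitch data. It is not MLBAM's neural network or the clustering algorithm used for the player cards, since neither is described in detail. The pitch types, feature values, and the choice of k-means itself are illustrative assumptions.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly take the point
    farthest from all centroids chosen so far."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical pitcher: columns are speed (mph), horizontal and vertical movement (in).
rng = np.random.default_rng(1)
types = {"fastball": (93, -6, 9), "slider": (84, 3, 1), "changeup": (84, -8, 4)}
raw = np.vstack([rng.normal(mu, 0.8, size=(100, 3)) for mu in types.values()])
# Standardize so mph and inches are on comparable scales before measuring distance.
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)
labels, _ = kmeans(X, k=3)
```

Note that the features here come from one park. Feeding in uncorrected data from a park with shifted movement would move whole clusters, which is exactly the failure described above.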
That said, no algorithm will ever be perfect. Let me give you an example. Randy Johnson has been throwing a split-fingered fastball for several years now. Normally, a splitter will move down and away in comparison to a pitcher's fastball. Johnson's splitter, however, just moves down in comparison to his fastball and is a full six MPH slower. Every clustering algorithm in the world is going to call that pitch a change up, not a splitter. So should you hand-adjust it and call it a splitter, or continue to call it a change up because that is how it moves to a hitter? That is a tough question, and I don't have a perfect answer for it yet.
6) How does release point correlate to results? In other words, are the pitchers who maintain a more consistent release point usually better pitchers?
Right now, with the samples we have, I can't answer that question. I have begun to look at release point, and I can tell you who has a consistent release point, who doesn't, and who throws his curveball from a different release point than his fastball, but so far that hasn't correlated well with results. When I try to correlate the difference in release point for curveballs and sliders with results, I get a very small correlation at best. It could be that once you get within, say, half a foot, the batter can't tell the difference, so being more consistent than that doesn't matter. If that is the case, a more advanced study than simple correlations will be needed. For change ups, release point seems to matter more, but PITCHf/x can only look at the release point, not the arm action. It is possible those two things are highly correlated, but maybe not.
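A minimal sketch of the kind of measurements and correlation described above. The spread metric (RMS distance from the mean release point), the gap definition, and all numbers are assumptions for illustration, not the author's definitions or data.

```python
import numpy as np

def release_spread(x, z):
    """Consistency metric (an assumption): RMS distance, in feet, of release
    points from their own centroid. Smaller means more repeatable."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return float(np.sqrt(np.mean((x - x.mean()) ** 2 + (z - z.mean()) ** 2)))

def release_gap(x_fb, z_fb, x_cb, z_cb):
    """Distance (ft) between a pitcher's mean fastball and mean curveball release points."""
    return float(np.hypot(np.mean(x_cb) - np.mean(x_fb), np.mean(z_cb) - np.mean(z_fb)))

def pearson_r(a, b):
    """Plain Pearson correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical per-pitcher values, made up for illustration only.
gaps = [0.05, 0.12, 0.20, 0.31, 0.45]    # curveball-vs-fastball release gap (ft)
whiff = [0.30, 0.27, 0.31, 0.26, 0.28]   # curveball whiff rate
print(round(pearson_r(gaps, whiff), 2))
```

A linear correlation like this can't see a threshold effect such as "anything within half a foot looks the same to the batter". That is one reason a more advanced study could be needed.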
7) Here in NY, all pitching questions lead to Phil Hughes, and this one is no different. Let me quote you here:
Because his release point is so steady, you would think that comes from a very repeatable delivery and, indeed, that is reportedly one of his best traits. In fact, you have to wonder if his release point is too consistent. The way he is throwing right now, the batter knows exactly where the ball is coming from. That might be helping him pick up the ball earlier and give him more time to determine the pitch type. This is something that needs future investigation, so consider it a theory right now. (Source)
That's an interesting theory. What other pitchers have you found to have similar repeatability?
In that article I showed the distribution of all pitchers and where Hughes falls in it, and he is almost three standard deviations from the mean on the very repeatable side. I did some checking, and almost all of the pitchers who are close to him are relievers who generally throw two pitches, and none of them are really all that good, with the exception of Troy Percival. The starters who are close to him are Micah Owings, who mostly throws two pitches as well, and Chad Gaudin, whom my classification algorithm has all messed up. Gaudin actually throws a four seamer, two seamer, change up and slider, so the fact that he is very repeatable is a little surprising. Maybe his move to the pen has helped him out on that front.
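"Almost three standard deviations from the mean" is a z-score. A minimal sketch with made-up numbers: the population is a hypothetical set of release-point spreads, where smaller means more repeatable. That is why a very repeatable pitcher comes out strongly negative.

```python
import numpy as np

def z_score(value, population):
    """How many sample standard deviations `value` sits from the population mean."""
    pop = np.asarray(population, dtype=float)
    return float((value - pop.mean()) / pop.std(ddof=1))

# Hypothetical release-point spreads (ft) for a handful of pitchers.
league_spreads = [0.18, 0.22, 0.25, 0.20, 0.30, 0.27, 0.24, 0.21, 0.26, 0.23]
pitcher_of_interest = 0.12
z_example = z_score(pitcher_of_interest, league_spreads)
print(round(z_example, 2))
```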
As far as the theory goes, it is something I actually feel pretty strongly about where Hughes is concerned. In 2008 so far he basically throws his fastball and curve over 90% of the time. His curve has some crazy movement, but he doesn't seem to get nearly as many swings and misses with that pitch as one would expect. I am fairly confident that major league hitters are picking up his curve early and not biting. The fact that his curve isn't in the same horizontal or vertical plane as his fastball probably doesn't help, but the very repeatable delivery could also be a problem. It is interesting that he blew away minor league hitters with his stuff, though. It is very possible that good deception isn't needed to get minor league hitters out but is vital in the show. If that is the case, then sending Hughes back to AAA might not really help all that much. He might just need time to learn in the majors, but the question is, with the Yankees fighting for the postseason, do they have that time to give him when he gets healthy?
Hughes has all the talent in the world, but he is not there yet. He might become a superstar, or he might never put it together. Right now he looks like a pitcher who has several good pitches that don't mesh well together. I really feel that throwing more sliders in particular would help him out tremendously, but the Yankees seem to think that might damage his arm, which appears somewhat fragile already. They might be right, but I just don't think he will be able to get MLB hitters out with just his fastball and curve in their current form.
8) What else can you tell us about the Yankees from looking at the data so far this year? Where does Joba sit in terms of movement on his fastball, slider, and curve? Any thoughts on Moose's unexpected success compared to the end of last season? Andy Pettitte has expressed some interest in coming back next year to open the new Stadium. Going forward, does the data give us any hints as to what to expect from him?
Joba looks amazing. Personally, I think the rotation is the best place for him, as his fastball is still a plus-plus pitch even when he has to pace himself. It may not be 99, but even 96 is very effective. His slider works extremely well off his fastball because it is in almost the same horizontal plane but then falls off the table (rather than sliding off the table like the majority of good sliders). His curve still needs some work, and his change up looks very inconsistent to me (and to my classification algorithm, which can't isolate it and calls all of his change ups sliders), but honestly I think he could be a very effective starter with just his fastball and slider. If I were the Yankees, I would have him work on his change up more than his curve. He doesn't need any more weapons against right-handed batters, and a solid change up would be a nice complement to his arsenal against lefties, more than what his curve projects to be, at least.
Mussina's fastball is down to under 87 MPH on average, and he is throwing it less often than he did a few years ago. He is throwing the entire kitchen sink out there, though, with a change up that is over 15 MPH slower than his fastball. He has a slider, curve and slurve, which are clearly distinct, but my algorithm requires every pitch to be labeled one or the other, so Mussina has two curveball clusters on his player card. Jake Peavy does something similar, and there he is listed as having two sliders. Pitchers can be successful throwing that softly, but the margin for error is significantly lower. Also, Nate Silver has found that throwing hard is useful in the postseason, so it is possible that as the competition gets tougher Mussina will get worse. If the Yankees bring him back next year, it should be as a 5th starter with someone waiting to take over if he can't cut it.
I am higher on Pettitte than on Mussina going forward. Pettitte is still getting it up there near 90 MPH and still has some very nice off speed offerings. He is also a few years younger and could probably afford to lose another MPH or two on his fastball, whereas Mussina would be in the batting practice fastball range if he lost a couple more MPH. If the price was right I wouldn't have any issues bringing Pettitte back, but the free agent market for starting pitching is very strong this year, and if a few extra million brings you a Ben Sheets or CC Sabathia, then the decision is clear.
9) You wrote three very insightful articles for The Hardball Times in which you applied PITCHf/x data to reach your conclusions. These articles were:
* Do relief pitchers suffer from pitching back-to-back days?
* How fatigue affects a pitcher's fastball
* Preliminary aging curve for fastball speed
How comfortable are you with the conclusions you've drawn on these topics? Do you plan any long-term follow-up?
The study of relief pitchers working back-to-back days is the one I am least comfortable with. That was done entirely with 2007 data, when the coverage was spotty. That study needs updating with 2008 data in the worst way, but I am not sure how much interest there would be in that update as a Hardball Times article. I am considering updating things like that on my blog, where the more hardcore stats people visit. In my Hardball Times articles I am really trying to make the math and explanation as easy as I can to appeal to a broader audience. I was never trained as a writer, so sometimes I get a little terse, but hopefully my writing style is getting better.
The aging curves I am fairly confident in. The spotty coverage for 2007 is somewhat covered up by just looking at averages instead of needing back-to-back appearances. When I started the study I was concerned about a lack of statistics, but as you can see the error bars there are extremely small, so the statistics are there for things like that. The biggest concern with that study is that most of the data from 2007 is from the second half of the year and all the data from 2008 is from the first half. As long as the different age groups didn't wear down differently as the year went on, the study should be fine. In fact, it isn't clear whether younger pitchers who aren't used to throwing as many innings would suffer more than older vets who might not have as much left in the tank. How different age groups wear down as the year goes on is definitely on my list of things to do, but I want to wait until the end of 2008 to do that. Once that is done, though, the results can be applied retroactively to the 2007 data to fix any issues there, and a follow-up study will be done that covers not only fastballs but other pitches.
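A minimal sketch of the "averages with error bars" idea. It assumes one average fastball speed per pitcher-season and uses the usual standard error of the mean. The input format and the sample rows are hypothetical, not the study's data.

```python
import math
from collections import defaultdict

def aging_curve(rows):
    """rows: iterable of (age, avg_fastball_mph), one per pitcher-season.
    Returns {age: (mean_mph, standard_error, n)}."""
    by_age = defaultdict(list)
    for age, mph in rows:
        by_age[age].append(mph)
    curve = {}
    for age, speeds in sorted(by_age.items()):
        n = len(speeds)
        mean = sum(speeds) / n
        if n > 1:
            sd = math.sqrt(sum((s - mean) ** 2 for s in speeds) / (n - 1))
            se = sd / math.sqrt(n)  # error bar shrinks as 1/sqrt(n)
        else:
            se = float("nan")       # a single pitcher gives no error estimate
        curve[age] = (mean, se, n)
    return curve

# Hypothetical pitcher-seasons.
curve = aging_curve([(25, 92.0), (25, 94.0), (30, 90.0)])
print(curve)
```

The shrinking 1/sqrt(n) error bar is why a few hundred pitchers per age group can give tight curves. It does nothing to fix the first-half/second-half mismatch, which is a bias rather than noise.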
The fatigue study is rock solid in my opinion. Because I am comparing back to whatever fastball a pitcher started the game with, you don't even have to believe in my data corrections, just that the data is precise as the game goes on, which it pretty clearly is. I was shocked, to say the least, when the results came back showing that the average pitcher doesn't lose much on his fastball at all, but then I remembered all the times I heard a broadcaster say something like "He still is hitting 94 on the gun in the 8th inning." It just turns out that that pretty much is the case for every pitcher, and the fact that he still has good fastball speed shouldn't be used as proof the starter should still be in the game. Right now, the most interesting thing to me is the result that it takes starters a few pitches to get warmed up. Starters have to walk a fine line between getting batters out and saving themselves to go deep in the game. If you know your starter really well, you should be able to optimize how much warm-up he needs to maximize his stuff in the game. Also, if you are hitting against a guy like Ben Sheets in the first inning, you should be much more willing to hack away, as his fastball is at its worst in the first 10 or 20 pitches. The old adage of getting to a starter early because you might not get to him late seems to apply here.
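The key design point of the fatigue study is that each pitcher is compared only to himself within a game, so park offsets cancel out. Below is a minimal sketch of that comparison. The baseline size (first 5 fastballs) and the 15-pitch bins are arbitrary assumptions, and averaging these per-game results over many starts is left out.

```python
from collections import defaultdict

def velocity_by_pitch_count(fastballs, baseline_n=5, bin_size=15):
    """Mean fastball speed change, by pitch-count bin, relative to the game's opening fastballs.

    fastballs: list of (pitch_number, speed_mph) for one starter's fastballs,
        in order, where pitch_number counts all pitches thrown in the game.
    baseline_n, bin_size: arbitrary choices for this sketch.
    """
    if not fastballs:
        return {}
    opening = fastballs[:baseline_n]
    baseline = sum(speed for _, speed in opening) / len(opening)
    bins = defaultdict(list)
    for pitch_no, speed in fastballs:
        start = (pitch_no - 1) // bin_size * bin_size + 1  # 1-15 -> 1, 16-30 -> 16, ...
        bins[start].append(speed - baseline)
    return {start: sum(d) / len(d) for start, d in sorted(bins.items())}

# Hypothetical start: slightly faster once warmed up, slightly slower late.
game = [(1, 92.0), (2, 92.0), (16, 93.0), (31, 91.0)]
print(velocity_by_pitch_count(game, baseline_n=2))
```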
10) There seems to be an endless supply of things we could learn from this data. Could you elaborate on any of these possibilities:
* How are pitches affected by environmental conditions, e.g., temperature, humidity, altitude, day/night, etc.?
* Why do some pitchers decline more rapidly than others with the same skill set?
* How much can we learn about projecting future performance?
* What can we learn about hitting tendencies by looking at pitch data?
* Can it be applied to umpiring tendencies as well? For example, what percentage of balls/strikes do umpires get correct? Can umpires be rated effectively using this information?
* What can we learn about balls in play from pitching data, e.g., inducing weaker contact, decreasing/increasing the ability to pull the ball, etc.?
* Is there anything to learn about the efficacy of pitch counts in relation to the types of pitches thrown?
Some environmental issues, like altitude and temperature, I have a good handle on and adjust for. Both the drag and the spin-induced (Magnus) force on a baseball are proportional to the air density, and if you know the temperature and the altitude you can find the air density rather easily. So correcting for Coors or, to a lesser extent, the heat in Atlanta is something that should be done by people looking at the data, and it isn't that hard to do. Humidity is another issue, because I can find no reliable data for the humidity in the park during a game. If you know of a resource that has this, please let me know. Humidity should work similarly to temperature and altitude through the air density, but it will also affect the baseball directly: if the ball swells or contracts, its size and surface will change. That should affect its flight as well and is something I would like to study. As for how temperature affects the pitcher's arm and increases or decreases the speed of his fastball, that is something several of us have talked about studying. You would like to just correlate the temperature of the game with fastball speed, but because the cold days come at the beginning and end of the year, pitcher wear patterns might also be affecting the data. So to properly study this you will first have to tease the two effects apart. Again, this is something I would like to do, but the end of the year would probably be a better time to look at it.
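The statement that air density follows easily from temperature and altitude can be sketched as follows. This is an approximation under stated assumptions: dry air (humidity ignored, as in the answer), pressure taken from the standard-atmosphere barometric formula rather than game-day readings, and the ideal gas law with the actual game temperature. It is not necessarily the adjustment the author uses.

```python
import math

# Standard-atmosphere constants (SI units).
P0 = 101325.0        # sea-level pressure, Pa
T0 = 288.15          # sea-level standard temperature, K
LAPSE = 0.0065       # temperature lapse rate, K/m
G = 9.80665          # gravity, m/s^2
M_AIR = 0.0289644    # molar mass of dry air, kg/mol
R = 8.31446          # universal gas constant, J/(mol*K)

def air_density(altitude_m, temp_c):
    """Dry-air density (kg/m^3), ignoring humidity and day-to-day pressure changes.

    Pressure comes from the standard-atmosphere barometric formula; density
    then uses the ideal gas law with the actual game-time temperature.
    """
    p = P0 * (1 - LAPSE * altitude_m / T0) ** (G * M_AIR / (R * LAPSE))
    return p * M_AIR / (R * (temp_c + 273.15))

sea_level = air_density(0, 20)
coors_like = air_density(1600, 20)   # roughly Denver's elevation (about a mile)
print(f"{sea_level:.3f} kg/m^3, {coors_like:.3f} kg/m^3, ratio {coors_like / sea_level:.2f}")
```

With these inputs the Coors-like case comes out roughly 18% less dense than sea level at the same temperature. Because drag and spin-induced movement scale with density, both would shrink by about that fraction.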
As for pitchers with the same skill set declining differently, that is something I am very interested in. I have developed some simple similarity scores for pitchers that I am hoping to use to study that and other things. Control is very hard to quantify in this data right now, but I suspect that control has a lot to do with different decline rates.
Predicting future performance should go hand in hand with that. Similarity scores are a good first step, but a mechanism like PECOTA will eventually be needed. Right now my understanding is that Nate is leveraging rate stats like K/G and K/BB heavily and has had some success with that, but sometimes you look at the PECOTA comps for a pitcher and scratch your head, wondering how a soft tosser gets matched up with a fireballer. I believe the accuracy could be improved if pitcher types were included, but that probably should be just one factor in predicting future performance.
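The similarity scores aren't described in detail, so the following is only a hypothetical sketch of the general idea: a scaled distance between pitch profiles, turned into a score between 0 and 1. The feature names, scales, and pitchers are invented. Including velocity this way is one simple route to keeping a soft tosser off a fireballer's comp list.

```python
import math

def similarity(a, b, scales):
    """Hypothetical similarity score in (0, 1]; 1 means identical profiles.

    a, b: dicts of pitch-profile features. scales: typical spread of each feature,
    so a 1-mph gap and a 1-inch gap are not treated as equally important.
    """
    d2 = sum(((a[f] - b[f]) / s) ** 2 for f, s in scales.items())
    return 1.0 / (1.0 + math.sqrt(d2))

def closest_comps(target, pool, scales, n=3):
    """Return the n most similar (name, score) pairs, best first."""
    scored = [(name, similarity(target, prof, scales)) for name, prof in pool.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:n]

scales = {"fb_mph": 2.5, "fb_hmov_in": 3.0, "fb_vmov_in": 3.0}
target = {"fb_mph": 95.0, "fb_hmov_in": -7.0, "fb_vmov_in": 10.0}
pool = {
    "Pitcher A (hypothetical)": {"fb_mph": 96.0, "fb_hmov_in": -6.0, "fb_vmov_in": 9.5},
    "Pitcher B (hypothetical)": {"fb_mph": 86.5, "fb_hmov_in": -7.5, "fb_vmov_in": 10.0},
    "Pitcher C (hypothetical)": {"fb_mph": 93.0, "fb_hmov_in": -3.0, "fb_vmov_in": 6.0},
}
comps = closest_comps(target, pool, scales, n=2)
print(comps)
```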
As for the umpires, yes, the data can be used for this, but I have chosen not to publish any results looking at umpires or strike zones or things like that. MLBAM is giving us this data for free right now, and they would prefer it to be used for studying the pitcher/batter relationship, not the pitcher/umpire relationship. They would be well within their rights to just turn the spigot off, with no more PITCHf/x data for anyone, so I encourage everyone to limit their published studies on the topic.
That said, I have looked at this privately, and I can tell you that the differences between umpires are very small and they are extremely accurate. I would also caution you, when local broadcasts or ESPN show something like K-ZONE with the ball off the plate but called a strike, that these systems only check the ball as it crosses the front of home plate. MLB rules state that if any part of the ball passes over any part of the plate it is a strike, so if the ball is close and moving toward the plate it can easily cross the plate at some point other than the front. This is especially true of sidearmers, who come at hitters from a wide angle.
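The front-edge point can be made concrete with a little geometry. This sketch simplifies under stated assumptions. It checks only the horizontal dimension (height is ignored), treats the ball's path over the plate's 17-inch depth as a straight line, and uses a ball radius of about 1.45 inches. Home plate is 17 inches wide at the front, with sides running straight back 8.5 inches before tapering to a point 17 inches behind the front edge.

```python
BALL_RADIUS_IN = 1.45  # roughly a 2.9-inch-diameter baseball

def plate_half_width(depth_in):
    """Half-width (in) of home plate at a given depth behind its front edge."""
    if depth_in < 0 or depth_in > 17:
        return None
    # Straight sides for the first 8.5 in, then 45-degree edges to the back point.
    return 8.5 if depth_in <= 8.5 else 17.0 - depth_in

def touches_plate(x_front_in, dx_per_in, steps=171):
    """True if the ball's edge passes over any part of the plate.

    x_front_in: ball center's horizontal position (in, 0 = middle of plate)
        as it crosses the front edge.
    dx_per_in: horizontal drift per inch of forward travel (straight-line
        approximation over the plate's depth). Height is not checked.
    """
    for i in range(steps):
        depth = 17.0 * i / (steps - 1)
        x = x_front_in + dx_per_in * depth
        if abs(x) - BALL_RADIUS_IN <= plate_half_width(depth):
            return True
    return False

# Front-edge-only check: center 10.2 in from the middle misses by 0.25 in.
print(touches_plate(10.2, 0.0))    # straight path: never over the plate
print(touches_plate(10.2, -0.05))  # slight inward angle: crosses the plate farther back
```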
HITf/x is something that is really needed before looking at balls in play, in my opinion. You can do things like look at ground ball/fly ball ratios for different pitches, and that can be useful, but there are a lot of fly balls that are easy pop ups and a lot of grounders that get down the line for a double. HITf/x will really simplify things in that regard.
As for how different pitches affect the wear on a pitcher, that is something I think will be a crowning achievement with this data. A near-flawless pitch identification algorithm will be needed, along with long stretches of perfectly recorded data. The roughly 98% coverage they have right now is great, but missing that 2% is a big issue for something like this. Something like an adjusted Pitcher Abuse Points will come; it is only a matter of time. When it does, the results should be fascinating and potentially game changing. If a change up really should only count as, say, 0.8 of a pitch and a slider as 1.3 pitches, that would be extremely useful for a team to know. Also, warm-up pitches at the start of an inning could possibly be looked at and rated.
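Here is a sketch of how pitch-type weights might feed a workload measure. The 0.8 and 1.3 values are the illustrative numbers from the answer, not measured values. The cubic penalty above 100 pitches follows the general shape of Baseball Prospectus's Pitcher Abuse Points. The function names and the threshold are assumptions.

```python
# Hypothetical per-pitch stress weights (illustrative numbers, not estimates).
PITCH_WEIGHTS = {"changeup": 0.8, "slider": 1.3}

def effective_pitch_count(pitch_types, weights=PITCH_WEIGHTS):
    """Sum of per-pitch weights; pitch types not listed count as 1.0."""
    return sum(weights.get(p, 1.0) for p in pitch_types)

def adjusted_abuse_points(pitch_types, threshold=100):
    """Cubic penalty on the weighted count above a threshold, in the spirit of
    Pitcher Abuse Points (the pitch-type weighting is a hypothetical extension)."""
    excess = max(0.0, effective_pitch_count(pitch_types) - threshold)
    return excess ** 3

# A hypothetical 105-pitch start: 60 fastballs, 25 sliders, 20 change ups.
start = ["fastball"] * 60 + ["slider"] * 25 + ["changeup"] * 20
print(effective_pitch_count(start), adjusted_abuse_points(start))
```

In this made-up start, the slider-heavy mix pushes the weighted count above the raw 105, so it is penalized more than 105 plain fastballs would be.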
11) And finally, HITf/x: not-so-distant future or a long way off?
I talked to the guy working on HITf/x about a month ago, and it still seems further off than many think. A six-inch error over the 60 feet to the mound is really hard to see when ESPN shows a ball's flight path, but if you move that out to 360 feet in the outfield, the error gets magnified. Also, right now they need the ball to land in play to measure it, and what everyone would really love to do with this data is study home runs, because chicks dig the long ball.
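To put a rough number on that magnification, assume the error comes from a small error in the measured direction of the ball, so it grows in proportion to distance. This is a simplification; real tracking errors need not scale this way.

$$\text{error at 360 ft} \approx 6\ \text{in} \times \frac{360\ \text{ft}}{60\ \text{ft}} = 36\ \text{in} = 3\ \text{ft}$$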