There’s not much information about a major league game that can’t be
found online from one of a few sites these days, but minor league data is stuck
back in the dark ages, so I’ve scraped everything MiLB tracks from every minor
league game over the past 5 years to create the most complete minor league stat
page to date.
The Problem
Thanks to PITCHf/x, Statcast, MLBAM, and other stat tracking companies,
baseball fans can learn every piece
of information about a major league baseball game, from how fast pitches were
thrown (or hit), to an outfielder’s reaction time and route efficiency on a fly
ball to the gap. Sites like Fangraphs and Baseball-Reference offer park and
league adjusted stats to compare and analyze major league players, and further break
down performances by every split you can imagine. However, when it comes to the
minor league side of the game, unless you go to the game with your own
calibrated radar gun, you’re not even guaranteed to get reliable pitch
velocities, and no tracked pitch information (type/speed/location) is available
for free online (possibly at all?). Also, Baseball-Reference has a few splits,
but they only track the basic stats for those splits, and you can’t compare players
by split. Similarly, while Fangraphs posts FIP for minor league pitchers and a
league-adjusted wRC+ for minor league batters, they don’t share the constants
they use, or offer any minor league splits. Statcorner and Minor League Central
offer a little more batted ball information (although Minor League Central
hasn’t updated since 6/18/2015), and you can get spray charts from MLBFarm, but
no site brings it all together… until now. Introducing the Astromets Mind Minor League Stat
Page, which is my attempt at bringing the best from the aforementioned stat
sites into two Tableau worksheets – one for batters, one for pitchers.
With the help of the pitchRx package for R, I scraped all of the
information available from MLB for games played between
2011 and 2015, from AAA down to Rookie ball. The MLB site in that link is an
XML version of Gameday, so I scraped everything available from MiLB Gameday,
which includes a pitch log, play log, ball in play log, and other less
important information. I then used that database to create a Minor League Guts
page, which has the linear weights and 1-year regressed park factors I used to
create the stat worksheets. I tracked a dozen splits and all of the stats
available and applicable to those split breakdowns (full glossary of stats used
here), and gave each one its own tab in the worksheets. The splits I currently
include in the worksheet are: Month, L/R, Home/Away, Batted Ball, Pitch Count
(currently the results of PAs ending in a #-# count), Outs, Base State, Inning,
Times Faced, Opponent, Field (opposite, pull, center), and Position. For
comparison, the Baseball-Reference minor league splits are: Home/Away, L/R,
Month, a couple of base states, and younger/older. I also included a tab that
allows you to
pull up player Gamelogs, a ‘Season Totals’ tab, and spray charts.
The Gamelog tab has a mostly hidden column that you can ignore; it is only
there so that when you click on any stat cell for any game, a link to the MiLB
Boxscore for that game will appear. The batters worksheet has 3 tabs dedicated
to spray charts, while the pitchers worksheet has only one (although I’ll
probably add a spray chart comparison tab to that worksheet in a future
update). As the name suggests, the spray chart comparison tab (currently the
default opening display for the batters worksheet) allows you to compare the
spray charts of any two players, or of one player over multiple seasons (hat tip
to Bill Petti, who created the original spray chart comparison worksheet for
major league players). I also included a defense spray chart in the batters
worksheet, which tracks all plays fielded by a player.
Since I only calculated 1-year regressed park factors, I created two versions
of the +/- stats, one that is only league-adjusted, and the other that is both
league and park adjusted – stats beginning with a lowercase ‘p’ have the park
adjustment (pERA-, pFIP-, pxFIP-, pwRC+). Finally, I included league average
rate stats that will appear when you hover over a stat (broken down by split
where appropriate), which is something I’ve only ever seen at Statcorner.
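To make the league-only versus park-adjusted distinction concrete, here is a minimal Python sketch. It is not the exact calculation behind the worksheets. It uses the commonly published form of FIP, where the constant is chosen so that league FIP equals league ERA, and it assumes a simplified park adjustment that divides the player's stat by a park factor expressed as a ratio (1.00 = neutral). The function names and the `park_factor` convention are assumptions made for illustration.

```python
def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """League constant that makes league-wide FIP equal league-wide ERA."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip


def fip(hr, bb, hbp, k, ip, constant):
    """Fielding Independent Pitching in its commonly published form."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def minus_stat(value, league_value, park_factor=1.0):
    """Index a 'lower is better' stat so that 100 is league average.

    park_factor=1.0 gives the league-only version (e.g. FIP-).
    Passing a real park factor gives a park-adjusted version (e.g. pFIP-).
    Dividing by the park factor is an assumed simplification, not
    necessarily the adjustment used in the worksheets.
    """
    return 100.0 * (value / park_factor) / league_value
```

With this convention, a pitcher whose FIP matches the league average gets a FIP- of 100, and the same pitcher in a park that inflates run scoring by 10% (`park_factor=1.10`) gets a pFIP- of about 91.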
Disclaimers about the data
The standard data is reliable for all leagues, but ball-in-play and
pitch-by-pitch
data appears to be less reliable for leagues/parks with no data stringers. As
far as I know, if a park has a data stringer who works the games, then there
will be a live MiLB Gameday link available for their home games. If not, the
MiLB.com page for that game will only have links to a box score or game log (example),
and the official scorer phones the game updates in at the end of each inning or
pitching substitution. The data stringer inputs the pitch-by-pitch information,
and they are responsible for marking where a ball was fielded, which is used in
the minor league spray charts we have available, but I’m not sure how much of
that detailed information is tracked and reported by the scorer, and whether
the information reported is consistent for all levels. It’s clear that in some
leagues/parks (like the rookie level GCL/AZL), the pitch-by-pitch information
is not tracked at all, as all walks are 4 balls, all strikeouts are 3 strikes,
and all other plays end on 1 pitch. I didn’t treat these leagues any
differently when calculating stats, so there will still be pitch information
that shows up in the worksheets, but the pitch counts or ball in play
percentages should stick out as looking wrong. I don’t expect anybody will be
looking too closely at splits from rookie ball anyway, but I’m giving this
disclaimer in case something looks funny about pitch data from the A/A+ level.
The example link above brings you to a game where the St. Lucie Mets were the
home team, and the accompanying components link shows that they did not track
the information at the pitch-by-pitch level.
You can still get some pitch information on St. Lucie players, as there are
teams in the FSL that have MiLB Gameday, so the information will show up in
individual games of the gamelog, but you should otherwise just ignore the pitch
information from the FSL. I should be able to create a fix to exclude that
unusable pitch log information from the dataset, but that’s just one of a few
minor fixes I plan to make for future updates. You shouldn’t have to worry
about that problem for AA/AAA games, and that’s when that information is of
most interest anyway.
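One way such a fix could work is to flag games whose pitch counts match the placeholder pattern described above: every walk logged as 4 pitches, every strikeout as 3, and every other play as 1. The Python sketch below is only an illustration of that heuristic. The record layout (`event`, `pitches`), the event labels, and the `min_pa` threshold are hypothetical and not taken from the Gameday data itself.

```python
from typing import Iterable, Mapping

# Pitch counts that show up when nobody is tracking pitch-by-pitch data.
# The event labels are assumed for illustration.
PLACEHOLDER_PITCHES = {"Walk": 4, "Strikeout": 3}
DEFAULT_PLACEHOLDER = 1  # every other play "ends on 1 pitch"


def looks_untracked(plate_appearances: Iterable[Mapping], min_pa: int = 20) -> bool:
    """Return True if every plate appearance has a placeholder pitch count.

    Games with fewer than min_pa plate appearances are never flagged,
    since a short sample could match the pattern by coincidence.
    """
    pas = list(plate_appearances)
    if len(pas) < min_pa:
        return False
    return all(
        pa["pitches"] == PLACEHOLDER_PITCHES.get(pa["event"], DEFAULT_PLACEHOLDER)
        for pa in pas
    )
```

Games flagged this way could then be left out of the pitch-level splits while still counting toward the standard stats.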
Another potential problem with the pitch log or ball in play information is
human or
CPU error. When I tracked the pitches of Thor and Matz in the minors with
MiLB.tv, I’d occasionally notice some misclassified pitch result (called strike
instead of swinging) or missing pitches in the Gameday pitch log. Also, some
ball in play location information is y-shifted for some reason, which I suspect
is a CPU error. I know MLBFarm is using the same data and corrections for their
spray charts as I am because they have the same occasionally shifted data
points in the same places – for example, the default Defense Spray Chart is for
Gavin Cecchini’s 2015, and it shows that he committed an error on a groundball
in shallow CF, but I’ve posted the GIF for that error (see all of them here),
and you can see that he was actually near 2B.
Future Efforts
I still have some minor updates to make to the worksheets, but they should be
otherwise ready for public consumption, which should give you something fresh
to look at until Spring Training games start in a couple of weeks:
- I’d like to add some more player information, for example DOB and positions
  played, and I forgot to track games played/started.
- A few times a player would appear on the Gameday XML page with a
  pitcher/batter ID but no name listed, so I need to create a master list of IDs
  and names that can be used to fill in the missing names.
- I need to improve the ‘Fielded By’ list for the ‘Defense Spray Chart’ tab,
  because right now a few players are missing a little information. I had to
  extract fielder information from the Gameday at-bat description, and getting
  the full name was sometimes tricky for players listed with initials instead of
  a first name. It shouldn’t be a hard fix, but it will be time consuming (for
  my computer), and fixing it wasn’t high on my priority list before sharing
  these workbooks.
- I’d also like to fix the hover response to only show a league average rate
  stat if you are hovering over a cell in that rate stat’s column, because right
  now it will show you all the rate stats if you hover over any cell, and that
  can lead to a big pop-up box (unless I hear that people like it as is).
- Lastly, the list of Gameday IDs pitchRx had available for 2011-2015 appears to
  be missing a few games in a few leagues per year, so I’d like to scrape the
  missing data to complete my database. However, I’m probably not going to
  track down those few missing games before next offseason, because they are
  such a small fraction of the total that it just wouldn’t be worth my time
  right now.
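The master list of IDs and names mentioned in the list above could be built from the scraped records themselves, since most appearances of a player do include a name. The Python sketch below shows one way to do that. The record layout (`id`, `name`) and both function names are hypothetical, and it assumes the first non-empty name seen for an ID is the right one.

```python
def build_name_lookup(records):
    """Map each player ID to the first non-empty name seen for it."""
    lookup = {}
    for rec in records:
        if rec.get("name"):
            lookup.setdefault(rec["id"], rec["name"])
    return lookup


def fill_missing_names(records, lookup):
    """Return copies of the records with blank names filled from the lookup.

    IDs that never appear with a name stay unnamed (name is None).
    """
    return [
        {**rec, "name": rec.get("name") or lookup.get(rec["id"])}
        for rec in records
    ]
```

Any ID that still has no name after this pass would need to be looked up by hand or from another source.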
Wrapping things up for now
Naturally, if you have an idea for other stats or splits that you’d like to see included in the worksheets, leave a comment below or send it to @Astromets31. You may have noticed that I also have a Winter League stat page, which will track the same stats from the ABL, DWL, MPL, PRL, and VWL over the past 5 years, but the page needs to be updated, and the pitchers worksheet still needs to be created. I’d also like to create a similar worksheet for the AFL data, except pitch information is available for those games, so I will be able to include even more stats and splits – not sure if I will get to the AFL workbooks before the season starts though. As for the 2016 minor league season, since Tableau Public doesn’t allow you to auto-update the data connections, I may have to create a separate worksheet with only 2016 stats (so it’s not such a huge update every time) and/or just not update the worksheets that often.