Comparing Run Environments From the Majors to Rookie Ball | Astromets Mind

Tuesday, January 19, 2016



Average batter age by league for the 2015 season

Before we start evaluating prospects, we should get a sense of the offensive environments in which they played.

            Thanks to Fangraphs, we never have to guess how much a player’s performance is influenced by the league’s run environment; we just compare their wRC+ or ERA- (or another +/- stat) to the league average of 100. But Fangraphs only publishes advanced pitching statistics for the major leagues, so let’s take a step back and look at each league’s overall run environment before inspecting or comparing minor league pitchers’ stat lines. We really should be investigating league- and park-specific run environments regardless, but league-level data has to be the starting point when no league-adjusted stats are readily available.
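As a quick illustration of how a +/- stat reads, here is the basic scaling in a minimal sketch (simplified: Fangraphs’ real ERA- also folds in a park adjustment, which is omitted here):

```python
def era_minus(era: float, lg_era: float) -> float:
    """Scale a pitcher's ERA to the league run environment.

    100 is exactly league average; each point below 100 is one percent
    better than average. (Simplified sketch: the published ERA- also
    applies a park factor, omitted here.)
    """
    return 100.0 * era / lg_era

# The same 3.50 ERA reads very differently in a 4.00 league vs. a 4.73 league:
print(era_minus(3.50, 4.00))            # 87.5
print(round(era_minus(3.50, 4.73), 1))  # 74.0
```

This is exactly why raw minor league stat lines mislead: the denominator changes with every league.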
            Basic league batting, pitching, and fielding stats for the 2015 season are available for all leagues on Baseball-Reference here, but looking at raw count totals is not very insightful. So I converted the raw count totals to per-game rate stats, added a few missing stats, and created bar graphs so we’re not just stuck staring at numbers. To avoid overloading this page with graphs, I uploaded all of that data into the interactive Tableau Public worksheet below. Let me know if the worksheet is giving you problems or slowing things down too much; otherwise, I’m looking forward to incorporating this feature more in the future.
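The conversion itself is trivial; here is a minimal sketch (the totals below are made-up placeholders for illustration, not actual league numbers):

```python
def per_game_rates(totals: dict, games: int) -> dict:
    """Convert raw season count totals (as found on a league's
    Baseball-Reference page) into per-game rate stats."""
    return {stat: total / games for stat, total in totals.items()}

# Placeholder totals, for illustration only:
league_totals = {"R": 700, "H": 1300, "HR": 140}
rates = per_game_rates(league_totals, games=140)
print(rates["R"])   # 5.0
print(rates["HR"])  # 1.0
```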
[Embedded interactive Tableau worksheet: 2015 run environments by league]
            With 19 leagues and 20 stats to explore, I'll mostly leave it to you to find what you're looking for, but let's look at a few patterns, starting with the R/G table, which should be the default setting for the worksheet. There’s a difference of exactly two runs per game between the highest-scoring league (the Rookie-level Pioneer League at 5.73 R/G) and the lowest-scoring league (the A+ Florida State League at 3.73 R/G), and the league sitting exactly between them (the AAA Pacific Coast League at 4.73 R/G) is known for its high-scoring environment. Among the full-season leagues, the PCL had the second-highest run-scoring environment; only the California League (4.9 R/G) scored more often. The South Atlantic League and Texas League scored about as often as the AL, and the Southern League was closest to the NL. Switch to the H/G table and three leagues stick out as above average (PCL, California, and Pioneer), while the GCL sticks out as below average. There isn’t much variation in doubles per game across the leagues, but triples per game tend to decrease at higher levels of the minors while home runs tend to increase. The Florida State League is a pitcher’s haven because it’s where home runs go to die, which leads to announcers getting excited about deep fly outs far too often. The FSL was also the only league with a lower BB% than the major leagues, while the California League was the only full-season league with a higher K% than the major leagues. I also included 'AB/XBH' and 'PA/HR' options if you prefer to look at the rates that way.
            The above data will give you an idea of the frequency of events across the affiliated leagues, but what if you want to know the value of those events? Again, Fangraphs publishes this data for the major leagues (see their guts page), but we’re SOL on the minor league side. Fortunately, linear weights are pretty easy to calculate if you have “tidy” play-by-play data; unfortunately, “tidy” play-by-play data isn’t easy to find for the minor leagues. I ended up using the pitchRx package in RStudio to scrape the necessary minor league data from MLB, but that site appears to be synced from the minor league gameday/gamelog pages, so there is occasionally some missing info. Still, there was enough to replicate most of the guts page for each league, and while not perfect, the linear weights below will calculate wOBA to within a few points of the Fangraphs-reported wOBA. I didn’t include an average wOBA option in the Tableau worksheet above, but you can get that info from the first column of the table below.
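For reference, here is the arithmetic those linear weights feed into, a sketch using the standard Fangraphs wOBA formula. The weights below are illustrative values in the neighborhood of recent MLB guts numbers, not the league-specific weights from the table:

```python
def woba(s: dict, w: dict) -> float:
    """wOBA = weighted sum of positive offensive events over the
    standard Fangraphs denominator (AB + BB - IBB + SF + HBP).
    Unintentional walks (BB - IBB) get the uBB weight."""
    num = (w["uBB"] * (s["BB"] - s["IBB"]) + w["HBP"] * s["HBP"]
           + w["1B"] * s["1B"] + w["2B"] * s["2B"]
           + w["3B"] * s["3B"] + w["HR"] * s["HR"])
    den = s["AB"] + s["BB"] - s["IBB"] + s["SF"] + s["HBP"]
    return num / den

# Illustrative weights (roughly MLB-average scale, not from the table below):
w = {"uBB": 0.69, "HBP": 0.72, "1B": 0.89, "2B": 1.27, "3B": 1.62, "HR": 2.10}
# A hypothetical stat line:
s = {"AB": 500, "BB": 50, "IBB": 0, "HBP": 5, "SF": 5,
     "1B": 90, "2B": 20, "3B": 2, "HR": 15}
print(round(woba(s, w), 3))  # 0.318
```

Swap in a league’s own weights from the table and you can compute a league-adjusted wOBA for any stat line on that league’s Baseball-Reference page.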

[Table: average wOBA and linear weights by league, 2015 season]
            The final column, labeled “Delta Mean,” is the average absolute difference between the Fangraphs wOBA and the wOBA calculated with these linear weights (only position players were included in the comparison). As you can see, except for the GCL and AZL, these weights will generally get you within .005 of the Fangraphs-reported wOBA if you choose to use them (say, with the splits info from the www.baseball-reference.com minors pages). I think the only reason there is a difference at all is that Fangraphs scales their weights so that the runSB value is always 0.200, but the stolen base info I scraped didn’t match league totals (the other stats were fine), so I had to scale my weights to the Fangraphs league average instead.
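That scaling step amounts to one multiplier applied uniformly to every weight. A minimal sketch (the function name and numbers are hypothetical):

```python
def rescale_weights(raw_w: dict, raw_lg_woba: float,
                    target_lg_woba: float) -> dict:
    """Multiply every raw linear weight by a single factor so that the
    league-average wOBA computed from them lands on a target value
    (here, the Fangraphs-reported league average, rather than the usual
    Fangraphs convention of pinning runSB at 0.200)."""
    scale = target_lg_woba / raw_lg_woba
    return {event: weight * scale for event, weight in raw_w.items()}

# Hypothetical numbers: raw weights imply a .300 league wOBA, target is .320
scaled = rescale_weights({"1B": 0.45, "HR": 1.40}, 0.300, 0.320)
print(round(scaled["1B"], 2))  # 0.48
```

Because every event is scaled by the same factor, relative values between events (a homer vs. a single, say) are unchanged; only the overall level shifts to match the target average.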
Looking ahead, I’d like to come back to this topic to look at how run environments have changed across the minors over the past few seasons, and to investigate run environments at the ballpark level. If you’re interested in getting a better idea of how individual ballparks play now, Minor League Central has published one-year park factors for the 2011-2014 seasons, and will hopefully update them with 2015 numbers soon. Assuming the scraped pitchRx data adds up correctly, I'd like to take those park factors one step further by looking at L/R splits too.



