A poster mentioned giving Barstool Rodeo a guest column last week, and after a few emails back and forth, here's his first installment. It's impressive, to say the least. I had to read it a few times to fully grasp some of the numbers, as math isn't my strongest subject, but it's interesting and also confirms what was already suspected. I'll pin it to the top. If you have questions, ask them, and I'm sure he'll be happy to explain.
Note: Stats were compiled before the Vandy game last night.
Over the course of SEC play it has become increasingly evident that Ole Miss has trouble driving in
runs. What makes this especially interesting is that Ole Miss ranks near the top of almost all offensive
statistics outside of run production. The Rebels have been so bad with runners on base that it almost
seems like an anomaly. This is probably not news to any of you. I wanted to take it a step further,
however. I sat down the other night with the goal of using sabermetrics to put a number on how much
the Rebel offense was underachieving. The results are interesting and very telling.
Model
If you’re not familiar with sabermetrics, it was the approach at the heart of Moneyball. It is basically
a form of baseball-related math that nerds created to establish a niche in athletics and help general
managers accurately measure and predict production. In this case I referred to multiple models that use
basic offensive stats to predict how many runs a team or player should account for. I won’t go into all of
the math here to keep this from sounding like a math lecture, but I’ll make a forum post explaining the
math for anyone interested. The model I focused on primarily was Furtado’s extrapolated runs formula
with an added equation that accounts for unearned runs. It’s a linear model that uses the following
stats: at-bats, hits, total bases, walks, hit by pitch, grounded into double play, sacrifice flies, sacrifice
hits, stolen bases, caught stealing, strikeouts, singles, doubles, triples, home runs, league fielding
percentage, and outs in play.
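For anyone who wants to see the shape of the model, here is a rough Python sketch of the base extrapolated runs formula, using the linear weights as Furtado published them. The added unearned-runs equation mentioned above is not spelled out in this post, so it is left out here. The BattingLine class and its field names are made up for illustration, and intentional walks are an input the published formula uses even though they are not in the stat list above.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical container for a team's or player's counting stats."""
    ab: int = 0
    h: int = 0
    doubles: int = 0
    triples: int = 0
    hr: int = 0
    bb: int = 0    # total walks, including intentional ones
    ibb: int = 0   # intentional walks
    hbp: int = 0
    so: int = 0
    sb: int = 0
    cs: int = 0
    gidp: int = 0
    sf: int = 0
    sh: int = 0


def extrapolated_runs(s: BattingLine) -> float:
    """Base extrapolated runs (no unearned-run adjustment)."""
    singles = s.h - s.doubles - s.triples - s.hr
    outs_in_play = s.ab - s.h - s.so  # at-bats that ended in a non-strikeout out
    return (
        0.50 * singles
        + 0.72 * s.doubles
        + 1.04 * s.triples
        + 1.44 * s.hr
        + 0.34 * (s.hbp + s.bb - s.ibb)
        + 0.25 * s.ibb
        + 0.18 * s.sb
        - 0.32 * s.cs
        - 0.090 * outs_in_play
        - 0.098 * s.so
        - 0.37 * s.gidp
        + 0.37 * s.sf
        + 0.04 * s.sh
    )
```

Because the model is linear, each event simply adds or subtracts a fixed fraction of a run, which is why it can be run on a single player, a team, or the whole league's combined totals.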
Results
Of the models I tried, this one proved to be the most accurate. When run on the combined SEC stats (excluding Ole Miss)
it predicted runs scored within 0.12%. It showed 1372.31 theoretical runs to 1374 actual runs. When
applied to individual SEC teams this is how far off their actual runs were from their “expected” runs
(conference play only):
6 teams within 5% of expected runs.
3 teams between 5% and 7% of expected runs.
2 teams between 7% and 10% of expected runs.
1 team with a 15.94% difference in expected runs and actual runs.
Can you guess who the last team is? It’s Ole Miss, and they aren’t 15.94% in the right direction. What
this means is that the Rebels have only scored 84% of the runs they should have theoretically scored in
SEC play this year based on their offensive numbers. They have only scored 120 runs in SEC play while
sabermetrics show they should have scored closer to 140.78 runs. This is over three times the average
difference among all other SEC teams: 4.64%.
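The percentages above are simple ratios, and a short Python sketch shows how they could be checked. The function names are made up for illustration, and the shortfall is measured against expected runs, which is an assumption about how the figures were computed. Using the rounded run totals quoted here, the Ole Miss shortfall comes out close to, but not exactly at, the 15.94% in the post; that gap may come from inputs not shown here.

```python
def pct_error(actual: float, expected: float) -> float:
    """Absolute gap between expected and actual runs, as a percent of actual."""
    return 100.0 * abs(actual - expected) / actual


def pct_shortfall(actual: float, expected: float) -> float:
    """How far actual runs fall below expected runs, as a percent of expected."""
    return 100.0 * (expected - actual) / expected


# League-wide check: 1372.31 theoretical runs against 1374 actual runs.
league_error = pct_error(actual=1374, expected=1372.31)

# Ole Miss: 120 actual runs against 140.78 expected runs.
ole_miss_shortfall = pct_shortfall(actual=120, expected=140.78)

# How many times larger the quoted Ole Miss gap is than the average of the other teams.
gap_multiple = 15.94 / 4.64

print(f"league error: {league_error:.2f}%")
print(f"Ole Miss shortfall: {ole_miss_shortfall:.2f}%")
print(f"multiple of average gap: {gap_multiple:.2f}")
```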
Reason
The hardest part of the analysis is understanding why this is (especially without split stats). Batting
with runners in scoring position is an obvious problem, but what makes the Rebels worse when they
have runners on? Is it approach, pressure, bad luck, or something else? Usually a number that differs
so much from expected runs is unsustainable, and in Ole Miss’ case this would be a sign of future
improvement. If only the SEC played 162 games. What makes this more interesting is that Ole Miss
leads the league in “outs in play” to strikeout ratio (OIP:K). Ole Miss is well above the league average
(2.6:1) with 3.7 outs occurring in the field to every 1 strikeout (3.7:1). They are also about average with
32% of balls in play falling for hits. It would seem logical that a team with the highest ability to put balls
in play would be good at moving base runners. For whatever reason, this hasn’t been the case.
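As a rough illustration of the OIP:K ratio, here is a small Python sketch. It assumes outs in play means at-bats minus hits minus strikeouts, which is how the extrapolated runs formula treats them; the post does not give its own definition, and the stat line in the example is invented.

```python
def outs_in_play(ab: int, h: int, so: int) -> int:
    """At-bats that ended in an out other than a strikeout (assumed definition)."""
    return ab - h - so


def oip_to_k(ab: int, h: int, so: int) -> float:
    """Outs in play per strikeout."""
    return outs_in_play(ab, h, so) / so


# Invented stat line, not Ole Miss data: 1000 AB, 300 H, 150 K.
ratio = oip_to_k(ab=1000, h=300, so=150)
print(f"OIP:K = {ratio:.1f}:1")
```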
Individual Player Production
I also looked at expected runs for individual player production using the extrapolated runs formula that
does not account for unearned runs. The projected runs each player is accountable for might be hard to
understand at first because, unlike team extrapolated runs, which predict actual runs scored, the
projected number does not correspond to any basic offensive stat. It is most easily explained as a
prediction of that player’s runs and RBIs averaged together. This is due to the value of each run or RBI
being shared between two players. The math here is kind of spotty, so I won’t elaborate on it much. It is
a good measure of how valuable each player is to the team, however. I also think it is worth noting that
this test shows Alex Yarbrough to be the most underproducing player on the team. This just means that
he hasn’t been scoring or driving in the number of runs that he theoretically should be. This is not
necessarily his fault, but being nearly 8 runs under his expected production in SEC play is a surprising stat.
Adjusting Wins
What originally led me to find expected runs was Pythagorean expectation. This is a simple formula
that uses games played, runs, and runs allowed to accurately estimate a team’s record. It can usually
predict an MLB team’s record over 162 games to within 1 or 2 games. The main purpose of this model
is to determine if a team is “lucky” or “unlucky.” However, I wanted to use it for fun to see where Ole
Miss could theoretically stand. Ole Miss currently ranks 8th in expected wins, but when their expected
runs are used in place of their actual runs they move all the way up to 4th.
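For the curious, the Pythagorean idea can be sketched in a few lines of Python. This uses the classic exponent of 2; the post does not say which exponent was used, so treat that as an assumption. The games played and runs allowed in the example are placeholders, not Ole Miss's real totals, and only the 120 and 140.78 run figures come from the numbers above.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(games: int, runs_scored: float, runs_allowed: float,
                  exponent: float = 2.0) -> float:
    """Expected wins over a given number of games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Placeholder inputs for illustration only.
GAMES = 27
RUNS_ALLOWED = 130

with_actual_runs = expected_wins(GAMES, 120, RUNS_ALLOWED)
with_expected_runs = expected_wins(GAMES, 140.78, RUNS_ALLOWED)

print(f"expected wins using actual runs:   {with_actual_runs:.1f}")
print(f"expected wins using expected runs: {with_expected_runs:.1f}")
```

Swapping expected runs in for actual runs is what moves a team up or down the expected-wins rankings: the runs allowed stay the same, so any extra runs scored translate directly into a higher expected winning percentage.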