OOTP Statistical Analysis/Nerding Thread

Travis7401

Douglass Tagg
Community Liaison
#1
Below I've pasted an image showing a slightly incomplete WAR analysis for the 2048 WBL. Overlaid on the data are bands representing the mean (roughly 1.5 WAR) and bands out to 3 standard deviations in either direction from the mean. For those of you familiar with how the 20-80 scale works, each 10 points away from 50 is supposed to represent one standard deviation away from the average player.
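A minimal sketch of the 20-80 conversion described above, assuming the scale is simply 50 plus 10 points per standard deviation from the league mean. The WAR values here are made up for illustration and are not the 2048 WBL data; the function name `to_20_80` is also just a placeholder.

```python
from statistics import mean, pstdev

def to_20_80(value, league_mean, league_sd):
    """Map a raw stat onto the 20-80 scale: 50 is average, 10 points per standard deviation."""
    return 50 + 10 * (value - league_mean) / league_sd

# Hypothetical WAR values (not the actual league data), chosen to average 1.5.
war = [-0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.5]
mu = mean(war)        # league mean
sigma = pstdev(war)   # population standard deviation of the sample

scores = [round(to_20_80(w, mu, sigma)) for w in war]
print(scores)
```

A player exactly at the mean lands on 50, and a player two standard deviations above it lands on 70, regardless of which raw stat is fed in.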

Typically we find that a normal distribution fit is a pretty good representation of this type of sports data. In a normal distribution you will have about 68% of the data points within the 40-60 range. About 95% of the data will lie between the 30 and 70 range and about 99.7% will lie within the 20-80 bands.
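For anyone who wants to check the band coverage rather than take it on faith, here is a short sketch using Python's standard library. It assumes the 20-80 scale is treated as a normal distribution with mean 50 and standard deviation 10; `share_between` is a helper name made up for this example.

```python
from statistics import NormalDist

# The 20-80 scale modeled as a normal distribution (an assumption of this sketch).
scale = NormalDist(mu=50, sigma=10)

def share_between(low, high):
    """Fraction of a normal population expected between two scale values."""
    return scale.cdf(high) - scale.cdf(low)

for low, high in [(40, 60), (30, 70), (20, 80)]:
    print(f"{low}-{high}: {share_between(low, high):.1%}")
# 40-60: 68.3%
# 30-70: 95.4%
# 20-80: 99.7%
```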

In order to see how well the actual data fits with the normal distribution, you have to conduct what is called a frequency analysis by making a histogram based on these "bins" (I usually prefer bins of half standard deviations to be able to visualize a bit better) and then overlaying the ideal normal distribution curve over the top of those histograms. I had this done last night, but EXCEL crashed on me so for now you'll just have to visually estimate how much data lies between each set of bands.
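The frequency analysis described above can be sketched without a spreadsheet. This is only an illustration under stated assumptions: the scores are hypothetical, the bins are half standard deviations (5 points) across 20-80, and anything outside 20-80 is simply not counted.

```python
from statistics import NormalDist

def half_sd_bins(low=20, high=80, width=5):
    """Bin edges every half standard deviation (5 points) on the 20-80 scale."""
    return list(range(low, high + 1, width))

def observed_vs_expected(scores, edges):
    """Count scores per bin and pair each count with the normal-curve expectation."""
    scale = NormalDist(mu=50, sigma=10)
    n = len(scores)
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(1 for s in scores if lo <= s < hi)
        expected = n * (scale.cdf(hi) - scale.cdf(lo))
        rows.append((lo, hi, observed, expected))
    return rows

# Hypothetical 20-80 scores, not the league data.
scores = [33, 41, 44, 46, 47, 48, 52, 53, 57, 61, 66, 74]
edges = half_sd_bins()
rows = observed_vs_expected(scores, edges)
for lo, hi, obs, exp in rows:
    print(f"{lo}-{hi}: observed {obs}, expected {exp:.1f}")
```

Plotting the observed column as bars and the expected column as a line gives the histogram-with-overlay view, and the same table makes over- and under-represented bins easy to spot by eye.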

In our case I think a normal distribution fits pretty well. We have 220 total entries and most of our data should be located between the 40 and 60 bands, and that definitely looks to be the case. The 40-50 "bin" looks to have more frequency than 50-60 "bin" to me.

Now 95% of our data is supposed to also be between the 30 and 70 bands. This is where you can see our distribution is obviously skewed as we have about 20-25 entries above the 70 band and only one entry below the 30 band. Still, we are relatively close as I'd say about 90% of the data lies between the 30 and 70 bands (20-25 points out of 220 are outside). Still a pretty good fit compared to a normal distribution.

A normal distribution would predict roughly 99.7% of the data being between the 20 and 80 bands. In our case, we only have one entry outside (Carrier) so 99.5% of the data fits between 20 and 80. Still a normal distribution is a good fit.
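To put rough numbers on the band comparisons above, this sketch computes how many of the 220 entries a perfectly normal population would leave outside each band. The observed figures are the visual estimates quoted in the posts, not recounted data, and `expected_outside` is a name invented for the example.

```python
from statistics import NormalDist

scale = NormalDist(mu=50, sigma=10)
n = 220  # total entries cited in the post

def expected_outside(low, high, total=n):
    """Expected number of players outside a band if the population were exactly normal."""
    inside = scale.cdf(high) - scale.cdf(low)
    return total * (1 - inside)

# Observed values are the rough visual estimates from the post.
observed = {(30, 70): "roughly 20-25", (20, 80): "1"}
for (low, high), seen in observed.items():
    print(f"outside {low}-{high}: expected {expected_outside(low, high):.1f}, observed {seen}")
```

Under these assumptions a normal population of 220 would put about 10 players outside 30-70 and fewer than one outside 20-80, so the extra players above 70 are where the skew shows up.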

I'll do a more thorough frequency analysis later, but my conclusion is that a normal distribution is a good approximation for our player population. We appear to have a slight over-representation in the 40-50 and 70-80 range and slight under representation in the 20-30 range. This means we've got more slightly below average players in the league and more all star type players in the league than would be predicted by a normal distribution.

Still, you can see how the OOTP "Overall" ratings fail to describe this population. Based on performance which is pretty well described by a normal distribution, the vast majority of players should have "Overall" ratings in the 2-3 star range (roughly the 40-60 range) while about 90% of the players in our case should have ratings in the 1-4 star range (roughly 30-70). We should have slight over representation in the 4-5 star range and under representation in the 0-1 star range. If you look at OOTP overall ratings for our player population you'll find a ton of 0.5 and 1 star rated players that perform much closer to the league average than their hamfisted fuckshit "overall" STARRZZZZ would suggest. As such, I would pretty much ignore the STARRZZZZ rating as a predictor of player performance. Your 1 star guys could very well be average players to even above average players.

So @OU11, this is what I'm tombout when I'm mad that the OOTP ratings don't reflect a normal distribution. I'm not trying to "force" the players into a distribution of my choosing, it is clear that the statistical data supports a normal distribution for our population pretty damn well. I'll Show you the OOTP "Overall" rating distribution as a comparison and you can see it looks nothing like the distribution of the actual WAR statistics, which is why I view OOTP's "Overall" rating as largely worthless (regardless of whether we display it as STARZZZ or 20-80 scale).

 

OU11

Pleighboi
Utopia Moderator
#2
Overall ratings will never be based on stats though. It just means we have a lot of guys playing above their ratings.
 

Travis7401

Douglass Tagg
Community Liaison
#7
Overall ratings will never be based on stats though. It just means we have a lot of guys playing above their ratings.
That makes for an interesting discussion that is perfect for this thread. Ratings are inherently subjective (the game does a great job of capturing this btw) and statistics inherently have a lot of variation even over a season-long sample size. So which one should matter to us as GMs? The answer is both.

In real life sports, scouted "ratings" for players tend to be based on either measurables, subjective opinion of the scout, or statistical analysis of player performance. This means the scout's "ratings" for a player already include many elements based on statistical performance. The measurables are the most objective, but often have the least value (how does sprint speed on a track translate to player performance on the field?). The subjective scout opinions basically depend entirely on how good the scout is at evaluating talent (and most aren't as good as they think they are). And the statistical based ratings are inherently based on too small sample sizes and therefore subject to a fair amount of variability (plus player skills change through career so you are looking at trying to hit a moving target with a forever too small sample size).

In OOTP there are some hidden actual ratings behind the scenes that we can't see. We can see the OSA and Scout OPINIONS of what the ratings for that player are... and we can see what the actual player performance shows. Neither gives us a complete picture of the objective value of our player because we know the OSA/Scouts can be wrong and we know statistics have a lot of variation even over a season. I think this realistic incomplete information aspect of games like OOTP and Football Manager is what separates these games from the Trog sports games EA produces where any statistical variation is based on glitches and poor sim technology. In OOTP/Football Manager, If there is disagreement in the ratings and statistics you have to ask yourself if the ratings might be incorrect or if this is just a statistical anomaly where the player is currently under/over performing. It could even be both.

One thing I really want to explore more in this thread is comparison of statistics and player ratings at the same "snapshot in time" through the course of several years to see what sort of conclusions we can come up with. I've got access to the 2047 and 2048 statistics at the moment, but I can't easily compare them to the player ratings at those snapshots in time because the player ratings have changed since then. My first complete analysis will be based on the 2049 Player stats vs the 2049 and 2050 OSA ratings (to provide a snapshot at start/end of season).
 

osick87

Well-Known Member
Community Liaison
#8
Hmm, I wonder if there is an ability to "mod" your own stats into OOTP so I don't have to make an external spreadsheet.
 

OU11

Pleighboi
Utopia Moderator
#10
Players have past scouting reports you can reference. It would be more stagnant for 25+ year olds since they rarely improve
 

Travis7401

Douglass Tagg
Community Liaison
#11
Players have past scouting reports you can reference. It would be more stagnant for 25+ year olds since they rarely improve
Yeah, I can do this on an individual basis, but there doesn't appear to be an easy way to do it league wide in one table like you can for the current ratings. I just last night figured out how I can change splits to show just 2048 and 2047 ratings for the whole league table. I'm sure there might be a way, but we are moving fast enough that I'll have my snapshots at the beginning/end of one season soon enough. I can already compare the 2048 stats to the 2049 OSA ratings if I get excited.
 

OU11

Pleighboi
Utopia Moderator
#12
Also that's why I want the stars to just be how good they are at a certain position. I need to know how they figure out that single number, maybe there's another 12 year old post over there explaining that.

Still want you to break down the curve based on position, I want to see if my idea that certain positions are skewed more than others is correct.
 

OU11

Pleighboi
Utopia Moderator
#13
So @OU11, this is what I'm tombout when I'm mad that the OOTP ratings don't reflect a normal distribution. I'm not trying to "force" the players into a distribution of my choosing, it is clear that the statistical data supports a normal distribution for our population pretty damn well. I'll Show you the OOTP "Overall" rating distribution as a comparison and you can see it looks nothing like the distribution of the actual WAR statistics, which is why I view OOTP's "Overall" rating as largely worthless (regardless of whether we display it as STARZZZ or 20-80 scale).
As far as this goes, there are so many variables but one really big part that will skew is you'll have all the rookies that are 3*+ playing at a very low level. I think the overall rating is useful in terms of "this is how good he could be" rather than "this is how he is going to play". I don't care if how good they "could" be is normally distributed, since the game will automatically bell curve the stats somewhat. The league has totals that it conforms the league to, it isn't an absolute deal but generally the totals are very close to what OOTP says they should be.

There is a setting I can turn on where it creates artificial news stories like "mound has been lowered to allow more hitting" where these numbers change to change the environment.
 

OU11

Pleighboi
Utopia Moderator
#14
So I can do this... it changes the overall rating to AI evaluation which uses these 4 settings.

Dani Gut is a 4/5 right now and overall.

With those AI settings he is a 1/5 based on position and 2/5 overall

Each are more indicative of his production so far this year.
I'll add these in here, I think this would bell curve the stars more. Especially if we tinkered with the weights.
 

Travis7401

Douglass Tagg
Community Liaison
#16
Okay, so after evaluating the data some more, I wanted to get rid of a lot of players in the 0-0.5 range who simply didn't play enough to have a meaningful WAR value. I filtered the OOTP data for 2048 by qualifying number of plate appearances and came up with 138 batters in my data set. So we know these guys are typically starters and should therefore be better on average than the players who were WBL but didn't meet the qualifying # of plate appearances to generate meaningful statistics. We can discuss whether we want to exclude/include later, but I want to get this posted so you people (@OU11) can see what I mean about the distribution.
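The qualifying filter can be sketched as below. The threshold here assumes the MLB rule of 3.1 plate appearances per team game over a 162-game schedule; the league's actual qualifying rule and schedule length may differ. The field names `pa` and `war` and the sample rows are hypothetical.

```python
def qualifying_batters(players, team_games=162, pa_per_game=3.1):
    """Keep batters whose plate appearances reach the qualifying threshold."""
    threshold = team_games * pa_per_game  # about 502 PA under the assumed MLB rule
    return [p for p in players if p["pa"] >= threshold]

# Hypothetical rows, not the actual 2048 export.
players = [
    {"name": "Batter A", "pa": 640, "war": 4.2},
    {"name": "Batter B", "pa": 510, "war": 1.1},
    {"name": "Batter C", "pa": 120, "war": 0.3},
]
qualified = qualifying_batters(players)
print([p["name"] for p in qualified])
```

Because the filter keeps mostly starters, the filtered group should average better than the full player pool, which is worth remembering when comparing the two data sets.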

The orange line is a normal distribution as applied to a 20-80 scale. Each 10 points away from center = 1 standard deviation. The beauty of this system is that it is not merely a number: it describes an entire distribution and a player's relative place within that distribution, and is therefore much more valuable than the typical mong scales people use for misogyny and such (SHES A DIMMEEEEE). But I digress.

The orange is a normal distribution. The blue bar charts are the cumulative frequency distribution for 2048 WAR for all qualifying batters. As you can see, it isn't a perfect fit, but the general data trend is that the normal distribution adequately describes the data. The green bar charts are cumulative frequency distributions for the 2049 OOTP OSA "Overall" ratings. You can see clearly that these are not anywhere close to being normally distributed. Whether you display them as stars or 20-80 or 0-100 or 7-11 dongs doesn't really matter. On a different scale you'd end up with fewer/more bins in different locations, which would make comparison a little more difficult, but as you can see they do not resemble anything close to a normal distribution. They are probably closest to a uniform distribution, but they are pretty fuckshit regardless.
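One way to put a number on "closer to normal" versus "closer to uniform" is to compare cumulative curves at the bin edges and take the largest gap. This is a Kolmogorov-Smirnov-style distance evaluated only at the edges, not a full statistical test, and it is not necessarily how the charts in this thread were built. Both sample lists are hypothetical.

```python
from statistics import NormalDist

def empirical_cdf(scores, edges):
    """Share of scores at or below each bin edge."""
    n = len(scores)
    return [sum(1 for s in scores if s <= e) / n for e in edges]

def max_gap(scores, reference_cdf, edges):
    """Largest absolute gap between the empirical and reference cumulative curves."""
    emp = empirical_cdf(scores, edges)
    return max(abs(e - reference_cdf(x)) for e, x in zip(emp, edges))

def uniform_cdf(x, low=20, high=80):
    """Cumulative share for a flat distribution across the 20-80 scale."""
    return min(1.0, max(0.0, (x - low) / (high - low)))

edges = list(range(20, 81, 5))
normal_cdf = NormalDist(mu=50, sigma=10).cdf

# Hypothetical score sets: one clustered around 50, one spread evenly.
bell_like = [35, 42, 45, 47, 49, 50, 51, 53, 55, 58, 65]
flat_like = [22, 27, 33, 38, 44, 50, 56, 62, 67, 73, 78]

for name, data in [("bell_like", bell_like), ("flat_like", flat_like)]:
    print(name,
          "gap vs normal:", round(max_gap(data, normal_cdf, edges), 3),
          "gap vs uniform:", round(max_gap(data, uniform_cdf, edges), 3))
```

The smaller gap indicates which reference shape the data tracks more closely, which is the same judgment being made visually from the blue and green bars.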

 

Travis7401

Douglass Tagg
Community Liaison
#17
And since people were curious, here is the 2048 WAR (among all 138 qualifying batters) with each position labeled as a separate color. While there are definitely some positions that trend toward the top or bottom of the curve, the distribution of WAR is a lot more even than I think a lot of you would have expected.



btw, can we start testing Carrier for PEDs?
 

Gooksta

Well-Known Member
#20
There are too many factors to be any consistency imo... Like injuries and such
 

Travis7401

Douglass Tagg
Community Liaison
#21
Take out every position except one already. Just catchers for example.


Catchers range between 6.5 WAR and -0.7 WAR and the distribution follows the curve pretty well. Every position will have gaps due to sample size compared to the entire sample population. Where the gaps lie might tell you something or they might not.
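A per-position breakdown like the catcher example can be sketched with a simple grouping. The rows are hypothetical apart from the catcher extremes (6.5 and -0.7), which come from the post; the field names `pos` and `war` are assumptions.

```python
from collections import defaultdict

def war_by_position(players):
    """Group WAR values by position and report count, min, and max for each."""
    groups = defaultdict(list)
    for p in players:
        groups[p["pos"]].append(p["war"])
    return {pos: {"n": len(v), "min": min(v), "max": max(v)} for pos, v in groups.items()}

# Hypothetical rows; only the catcher range (6.5 to -0.7) comes from the post.
players = [
    {"pos": "C", "war": 6.5}, {"pos": "C", "war": 1.8}, {"pos": "C", "war": -0.7},
    {"pos": "SS", "war": 3.1}, {"pos": "SS", "war": 0.9},
]
summary = war_by_position(players)
for pos, stats in summary.items():
    print(pos, stats)
```

With only a dozen or so qualifying players per position, the count column is a reminder of how small each sample is before reading anything into the gaps.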
 

OU11

Pleighboi
Utopia Moderator
#22
When I get home I'm going to put 100% on last year's stats and see if the AI determines the same thing you do.

That's the only way I know how to test your numbers since they use a completely different scale.
 

Orlando

Well-Known Member
Utopia Moderator
#24
There are too many variables. All the scouts are different, vs OSA. The ratings by these groups change regularly. The rating changes based on position assigned as well.
 

Orlando

Well-Known Member
Utopia Moderator
#26
Not to mention a lot of players play out of position. I always wondered if that affected WAR. I assume the game is coded to base it off of which position the innings were played at, but that's not true for All Stars so who knows. A guy could play all year as a CF but since they are assigned as a RF they might not even be on the ballot.
 

Travis7401

Douglass Tagg
Community Liaison
#27
Here is a Google Spreadsheet with all the data I'm using. It is based on the "Sortable Stats" and filtered for batters only, qualifying plate appearances only, 2048 stats split, and OSA ratings. It is basically a combination of the Default view, the Batting stats 2 view, the Batting Ratings view, the Fielding Ratings view, and the Positional Ratings view.



Obviously that's ham-fisted so you can copy/paste as necessary.
 

Travis7401

Douglass Tagg
Community Liaison
#28
There are too many variables. All the scouts are different, vs OSA. The ratings by these groups change regularly. The rating changes based on position assigned as well.
This is why I'm saying I ignore them. They start out based on worthless math and then they are also constantly changing.

I'll do the Travis BTT scouting rating at some point and post it in this thread for anyone who is interested, with a comparison to the "Overall" rating of a player as determined by OSA (since that's the one thing we all have access to).
 

Gooksta

Well-Known Member
#30
Not to mention a lot of players play out of position. I always wondered if that affected WAR. I assume the game is coded to base it off of which position the innings were played at, but that's not true for All Stars so who knows. A guy could play all year as a CF but since they are assigned as a RF they might not even be on the ballot.
Also some guys will play better because they are on better teams.. or worse because they are on bad teams.. too many variables
 

Travis7401

Douglass Tagg
Community Liaison
#31
Baseball is among the most predictable sports when it comes to ratings vs statistics. Here is why the OSA OVR rating blows. R2 of 0.36 between a player's Overall Rating and their WAR. I could jerk off and spray cum around my office and the stains would look more correlated than this.
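For context on the R2 figure: for a simple linear fit, R2 is the squared Pearson correlation, so 0.36 corresponds to a correlation of about 0.6 and means roughly 36% of the variance in WAR is accounted for by the Overall rating. A minimal sketch of the calculation, with hypothetical numbers rather than the league export:

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical Overall ratings (20-80) and WAR values for illustration only.
overall = [40, 45, 50, 55, 60, 65, 70]
war = [1.9, 0.4, 2.5, 1.0, 3.8, 2.2, 4.1]
print(round(r_squared(overall, war), 2))
```

Running the same function on any candidate rating against WAR gives a like-for-like way to compare how well each one tracks performance.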

 

Travis7401

Douglass Tagg
Community Liaison
#32
Just wait until I blow ur minds with the BTT rating and how much better it is at predicting. Then @Gooksta can take his "ohhh too many variables" and "ohhh lineup protection" stuff and come at me.
 

Gooksta

Well-Known Member
#34
I am more interested in player types and roster construction themes.. Comparing stats to "overall" ratings, I guess I'm not seeing the point?!


Like @OU11 stacking up on power bats. @doh putting importance on defense..
 

Travis7401

Douglass Tagg
Community Liaison
#38
I wow people with pivot charts and then they leave me alone. In the middle of that stat ham-fisting this afternoon I also coded in and analysed a proposed bridge and then presented said data to ma'boss to make him happy before the weekend. God like mind, B.
 

Travis7401

Douglass Tagg
Community Liaison
#43
What are some batting stats you guys really care about when you evaluate players? Fielding stats as well. And then how about how you typically evaluate ur players "overall"
 

Travis7401

Douglass Tagg
Community Liaison
#47
In addition to stats you guys might be interested in, feel free to post any hypothesis you want. ie "Contact is the most important batting stat for a good average." I know we have all made some general assumptions about how the individual ratings are supposed to correlate to the statistics, but I'm interested in looking into this more with pivot tables.
 

Travis7401

Douglass Tagg
Community Liaison
#49
You should see if we are worse defensively than the mlb. Doh swears we are
The generally high average WAR values lead me to believe the batters are more prolific than in MLB. Basically we have too many "wins" in the system, if that makes sense. From what I understand about the statistic itself, it is really based on runs and then converted on the assumption that 10 runs = 1 win. In our league that might not be true, but it should be easy enough to figure out with a full year of data.
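A rough sketch of how the runs-per-win assumption could be checked from a season of team results. It fits a least-squares line through the origin relating run differential to wins above .500; real WAR implementations use more elaborate runs-per-win formulas, so treat this as a sanity check only. The team rows and field names are hypothetical.

```python
def war_from_runs(runs_above_replacement, runs_per_win=10.0):
    """Convert runs above replacement into wins using a runs-per-win rate."""
    return runs_above_replacement / runs_per_win

def estimate_runs_per_win(teams):
    """Least-squares fit through the origin: runs of differential per win above .500."""
    num = sum(t["run_diff"] ** 2 for t in teams)
    den = sum(t["run_diff"] * t["wins_above_500"] for t in teams)
    return num / den

# Hypothetical team seasons, constructed so that 10 runs equals exactly 1 win.
teams = [
    {"run_diff": 100, "wins_above_500": 10},
    {"run_diff": -50, "wins_above_500": -5},
    {"run_diff": 20, "wins_above_500": 2},
]
rpw = estimate_runs_per_win(teams)
print(rpw, war_from_runs(45, rpw), war_from_runs(45, 12))
```

If the fitted rate for the league came out well above 10, the same run totals would translate into fewer wins, which would pull the league's WAR values down.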