A New Expected Goal Model That is Better Than Corsi at Predicting Future Goals

It Doesn’t Have to be; it Just is.

Alan Ryder broke ground in 2004 when he published hockey’s first expected goal model titled “Shot Quality,” but it wasn’t until Dawson Sprigings and Asmae Toumi published their expected goal model in October of 2015 that expected goals ascended into hockey popularity. Today, expected goals have usurped Corsi (shot attempts) as the go-to underlying metric for analyzing teams and skaters, and NHL arenas have even featured expected goal data on the jumbotron at intermissions. I recently published an article describing the false dichotomy between expected goal models and subjective shot quality analysis, the advantages of mathematically computed models, and the need for both.

One of the key benchmarks of an expected goal model is its predictive power. In fact, the predictive power of expected goals in general has recently been called into question in an article by an analyst known as DragLikePull. He compared the correlation between 5-on-5 score-adjusted Corsi/expected goal shares in the first half of the season and 5-on-5 goal shares in the second half of the season and found that Corsi was better overall at predicting second-half goals than expected goals were. Based on these findings, he concluded that fans and analysts should discard expected goals at the team level and return to using shot attempts.

My research has led me to different conclusions on the predictive power of expected goals, but before I get into that, I want to address my issue with this line of thinking. A part of me wishes that we had stuck to calling expected goal models “Shot Quality” models instead, because I think that the term “Expected Goals” implies that these models are solely predictive in nature, which isn’t necessarily the case. Even if expected goal shares were completely useless for predicting future goals at the team level, expected goals would still be extremely useful for describing past events: they tell us which teams relied heavily on goaltending and shooting prowess, which teams were weighed down by poor shooting and goaltending, and even which shots the goaltender deserved most of the blame for. For that reason, I disagree with the premise that hockey fans should stop using expected goals at the team level if they are not as predictive as Corsi.

Fortunately, I don’t need to fight tooth-and-nail to support my case here, because I have found an expected goal model that is more predictive of future goals than Corsi is: My own. I built an expected goal model that I will use as the driving engine behind a WAR model that I have built and will be releasing shortly. Before I get into the results of my tests, I will give a brief overview of how I built my model and the variables that I chose to account for. If you’re not interested in this and just want to see the comparison between expected goals and Corsi, simply skip to the test results.

I trained my model using extreme gradient boosting, a hyper-efficient machine learning technique commonly used for regression and binary classification problems. In other words, I showed my computer a bunch of shots, told it which of them were goals, and then used extremely powerful software to teach it to predict the outcome of new shots. I accounted for the following variables in my model:

  • Shot distance and shot angle. (The two most important variables.)
  • Shot type.
  • The type of event which occurred most recently, the location and distance of this event, how recently it occurred, which team the perpetrator played for, and the speed at which distance changed since this event. (The inclusion of the last variable was inspired by Peter Tanner of Moneypuck.)
  • Whether the shooting team is at home.
  • Contextual variables such as the score, period, and seconds played in the game at the time the shot was taken.
  • Whether the shooter is shooting on their off-wing. (For example, a right-handed shooter shooting the puck from the left circle is shooting from the off-wing, and a left-handed shooter shooting from the same location is not.)
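To make the geometry behind the first and last bullets concrete, here is a minimal Python sketch. It is not the code behind the model, which this article does not show. It assumes shot coordinates in feet with the attacking net at x = 89, y = 0, and it assumes that positive y is the shooter's left when facing that net. The function names and the "L"/"R" handedness codes are hypothetical.

```python
import math

# Assumed net location in feet: attacking net on the +x side of the rink.
NET_X, NET_Y = 89.0, 0.0


def shot_distance(x, y):
    """Straight-line distance in feet from the shot location to the net."""
    return math.hypot(NET_X - x, NET_Y - y)


def shot_angle(x, y):
    """Absolute angle in degrees off the line through the middle of the net.

    0 means the shot was taken from straight in front of the net.
    """
    return abs(math.degrees(math.atan2(y - NET_Y, NET_X - x)))


def is_off_wing(shoots, y):
    """True when the shooter is on the side of the ice opposite their handedness.

    Assumption: positive y is the shooter's left when facing the net at +x,
    and `shoots` is "L" or "R".
    """
    if y == 0:
        return False  # dead centre: neither wing
    side = "L" if y > 0 else "R"
    return shoots != side


# A right-handed shooter on the left side is on the off-wing;
# a left-handed shooter from the same spot is not.
print(is_off_wing("R", 20.0), is_off_wing("L", 20.0))
```

Real play-by-play data may flip the sign of either axis depending on the period and the source, so the side convention should be checked before a flag like this is trusted.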

Additionally, I chose to make an adjustment for scorekeeper bias, an issue I’ve discussed in much greater depth in the past. The adjustment was quite rudimentary: for each team, using the past 3 seasons, I subtracted the average shot distance (by both teams) in all of that team’s away games from the average shot distance in all of their home games. The result was a scorekeeper bias adjustment factor, which I subtracted from the reported distance of every shot taken in that team’s home arena; the difference was “adjusted distance,” and I used it in my model in place of reported shot distance. To give a quick example of this:

The average reported distance of shots taken in games at Xcel Energy over the past 3 seasons was 2.54 feet further from the net than the average reported distance of shots taken in games where the Minnesota Wild were the away team. This gives an adjustment number of 2.54 feet. If the reported distance of a shot taken at Xcel Energy is 30 feet from the net, I subtract the adjustment number of 2.54 feet from the reported distance, and obtain an “adjusted distance” value of 27.46 feet from the net, which I use as the input for my model.
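The same arithmetic can be written as a short Python sketch. The function names are hypothetical, and the two average distances are placeholder values chosen only so that their gap matches the 2.54 feet quoted above; they are not real league data.

```python
def scorekeeper_adjustment(home_avg_distance, away_avg_distance):
    """Average reported shot distance in a team's home games minus the
    average in its away games (shots by both teams in each case)."""
    return home_avg_distance - away_avg_distance


def adjusted_distance(reported_distance, adjustment):
    """Reported distance with the home arena's adjustment subtracted."""
    return reported_distance - adjustment


# Placeholder averages (not real data); only their 2.54-foot gap matters.
adjustment = scorekeeper_adjustment(36.54, 34.00)

# A shot reported at 30 feet in that arena:
print(round(adjusted_distance(30.0, adjustment), 2))  # 27.46
```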

Most expected goal models are trained on a minimum of five full seasons of data and then “tested” on an out-of-sample season. I tried building a model using this approach, but I kept running into a major issue: My expected goal values never added up to actual goals. The total number of expected goals I calculated for a given season was typically somewhere between 100 and 300 below the number of actual goals scored in that season.

While the collective body of NHL shooters may perform better than the collective body of goaltenders over a given season, a massive discrepancy between the two of them persisting over multiple seasons suggests that expected goal values are too low. Indeed, my values were, and the reason for this is that the NHL reduced maximum goaltender pant sizes prior to the 2017–2018 season and reduced maximum pad sizes prior to the 2018–2019 season. These changes play a big role in the new high-scoring environment we’ve grown accustomed to over the past three seasons, and using an expected goal model built on old data prior to these changes holds goaltenders to an unfairly high standard and shooters to an unfairly low standard by underestimating goal probability.

In order to combat this issue, I chose to train my model on only the past three seasons. I began the training process using cross validation on the 2017–2018 through 2019–2020 seasons, testing the model on cross-validated samples using different parameters each time with the goal of finding parameters that would maximize area under the curve. The modeling technique which I used to train my model was similar to and inspired by the technique used by Evolvingwild in their expected goal model, the write-up for which was my introduction to the concept of extreme gradient boosting.

Unfortunately, I couldn’t just train the final model on every shot from the past 3 seasons and then test it at once, as this would lead to overfitting. (To put this more simply, if I showed my computer the shots that I was trying to test it on, it would become “too smart” and “cheat” in predicting goal probability by considering results it isn’t supposed to know.) In order to avoid overfitting, I removed a sample of 100 games from my training sample, trained my model on the other 3,524 games, then “tested” my model on the 100 games in question, and repeated this process until I had tested my model on every game and saved the results. In total, this technically means that I built 74 different models for the past three seasons: 37 for even strength and 37 for the power play. But because each model was trained on almost all the same data, and each model used the exact same parameters and variables, it’s easier and still mostly accurate to say that I just built two models: one for even strength, and one for the power play.
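The hold-out procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual training code: the article does not say how each block of 100 games was chosen, so the shuffle and its seed are assumptions, the game ids are placeholders, and `fit` and `predict` are hypothetical callables standing in for whatever model is being trained.

```python
import random


def make_game_folds(game_ids, fold_size=100, seed=0):
    """Shuffle game ids and cut them into held-out blocks of `fold_size` games.

    Assumption: blocks are drawn at random; the seed is arbitrary.
    """
    ids = list(game_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i:i + fold_size] for i in range(0, len(ids), fold_size)]


def out_of_fold_predictions(shots, folds, fit, predict):
    """Return one prediction per shot, each made by a model that never saw
    any shot from the same held-out block of games.

    `shots` is a list of dicts with a "game_id" key.
    """
    preds = [None] * len(shots)
    for held_out in folds:
        held = set(held_out)
        train = [s for s in shots if s["game_id"] not in held]
        model = fit(train)  # one model per held-out block
        for i, shot in enumerate(shots):
            if shot["game_id"] in held:
                preds[i] = predict(model, shot)
    return preds


# 100 held-out games + 3,524 training games = 3,624 games in total.
folds = make_game_folds(range(3624))
print(len(folds), len(folds[-1]))  # 37 24
```

With 3,624 games, blocks of 100 give 37 folds per game state, which lines up with the 37 even-strength and 37 power-play models; the last block holds only the remaining 24 games.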

I also built models for the 2013–2014 through 2016–2017 regular seasons using the same training and testing parameters as I used for the 2017–2018 through 2019–2020 seasons. In theory, I likely could have improved my results for these seasons by repeating the above process of cross validation for these seasons and acquiring different parameters, as these are different seasons with a larger total sample, but the main seasons that I care about are the past three, and any improvement from acquiring different parameters would be rather minimal, so I stuck with the same parameters. For these seasons, the gap between expected goals and actual goals was not a problem, so I just used all data from 2010–2011 through 2016–2017 and removed the test season from my training sample, building eight different models for the four seasons I cared about: four for even strength and four for the power play. I implemented the same scorekeeper bias adjustment for each of these seasons, but I used data from the three seasons prior to the test season to calculate it. I prefer this method intuitively and would have done this for the past three seasons as well, but scorekeeper bias has become a much smaller and more consistent (though still significant and problematic) issue in the past 3 seasons, and using older data would provide me with exaggerated adjustment values.

I spent some time working on an expected goal model for penalty shots and shootout shots, but I was struck by how poorly it performed in testing. After some consideration, I decided to build the most rudimentary model possible for these shots, by assigning them all the same expected goal value of 0.31 goals, which reflects the percentage of shootout and penalty shot attempts that became goals over the past three seasons. I am comfortable doing this because, unlike shots in other situations where external variables shape the opportunity available to shooters to score and goaltenders to save, variables such as the location, angle, and shot type in a shootout are all influenced almost exclusively by the shooter and the goaltender. (I didn’t bother to build an expected goal model for these shots prior to this season as they won’t factor into my analysis.)

I also worked on building expected goal models for shorthanded shots and shots taken with at least one goaltender missing, but I ultimately chose not to work extensively on these models or release the results because the sample size for these shots is quite small, and more importantly because my goal for this expected goal model was to empower a WAR model, and I did not plan to incorporate shorthanded offense or any plays created with an empty net into my model.

Now, for the Results:

I chose to test the results of my model using three target metrics. The first was area under the curve (AUC), a metric which I also used as the target metric for the cross validation process; you can read more about this metric here. The second target metric was that expected goals would roughly equal actual goals over the aggregated sample size of the past three seasons. (The third, the ability to predict future goals, is covered further down.) Here are my results for the past three seasons and the four before them as well:

As you can see, the models built before these three recent seasons performed a bit better in terms of AUC, which makes sense; a larger training sample generally leads to a stronger test performance. I could theoretically have improved AUC by training my model on more than three seasons, but doing so would mean significantly fewer expected goals than actual goals over the past three seasons, which I was not okay with; I was very happy that the total difference between expected goals and actual goals was only eight over the past 3 seasons and not comfortable with sacrificing this precision. Additionally, the AUC values for the past few seasons are still good, and roughly on par with or superior to AUC values that I’ve seen reported from other expected goal models, so I’m happy with where this model is at. I would like to see a higher AUC on the power play, but that’s something I’m happy to set aside as a goal for future models.
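For readers unfamiliar with the first two checks, here is a minimal Python sketch of both. The function names are hypothetical and the four shots are a toy example, not model output. The AUC here is computed directly from its definition, the chance that a randomly chosen goal was given a higher probability than a randomly chosen non-goal. That is fine for a handful of shots but far too slow for full seasons, where a library routine would be the practical choice.

```python
def auc(labels, scores):
    """Probability that a random goal outranks a random non-goal.

    `labels` are 1 for goals and 0 otherwise; `scores` are predicted goal
    probabilities. Ties count as half a win.
    """
    goals = [s for y, s in zip(labels, scores) if y == 1]
    non_goals = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((g > n) + 0.5 * (g == n) for g in goals for n in non_goals)
    return wins / (len(goals) * len(non_goals))


def goal_gap(labels, scores):
    """Total expected goals minus total actual goals over the sample."""
    return sum(scores) - sum(labels)


# Toy example: four shots, two of which were goals.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auc(labels, scores))  # 0.75
```

A model can score well on one check and poorly on the other. AUC only cares about ranking shots correctly, while the goal gap only cares about whether the probabilities sum to the right total, which is why both are reported.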

I also tested how well expected goal shares could predict future goal shares. I largely used the same method that draglikepull outlined in their article, calculating 5-on-5 goal shares, Corsi shares, and expected goal shares in the first half of the season, and comparing them to actual goal shares in the second half of the season. But my method differed in two ways: I did not apply a score-adjustment to any of the data, opting to compare the raw metrics to one another, and I also calculated a separate expected goal share value with all rebounds removed. (I defined rebounds as shots where the prior event was a shot by the same team that occurred no more than two seconds ago.)
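The rebound rule and the share calculation can be sketched in Python like this. The event dictionaries, their keys, and the event labels are hypothetical, and which prior event types count as a "shot" is an assumption, since the article does not list them.

```python
# Assumed labels for prior events that count as a shot (hypothetical).
SHOT_EVENTS = {"SHOT", "MISSED_SHOT"}


def is_rebound(shot, prior_event, max_gap_seconds=2):
    """True when the prior event was a shot by the same team that occurred
    no more than `max_gap_seconds` before this shot."""
    if prior_event is None:
        return False
    gap = shot["seconds"] - prior_event["seconds"]
    return (
        prior_event["type"] in SHOT_EVENTS
        and prior_event["team"] == shot["team"]
        and 0 <= gap <= max_gap_seconds
    )


def share(value_for, value_against):
    """Share of a total, e.g. xGF% = xGF / (xGF + xGA), as a fraction."""
    return value_for / (value_for + value_against)


def rebound_removed_xg(shots_with_prior):
    """Sum expected goals over (shot, prior_event) pairs, skipping rebounds."""
    return sum(s["xg"] for s, prior in shots_with_prior if not is_rebound(s, prior))


prior = {"type": "SHOT", "team": "MIN", "seconds": 100}
first = {"type": "SHOT", "team": "MIN", "seconds": 100, "xg": 0.05}
second = {"type": "SHOT", "team": "MIN", "seconds": 102, "xg": 0.30}
print(is_rebound(second, prior))  # True
```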

I used R² as the metric to compare each first-half share with goal share in the second half. Here were my results using data from every single team season from 2014–2015 through 2018–2019:

Image by Author. (RR xGF% = Rebound-Removed xGF%.)
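For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch of R² between a first-half metric and second-half goal share. It assumes R² means the squared Pearson correlation, which equals the R² of a one-variable linear regression with an intercept. The article does not spell out the exact computation, and the function name is hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Placeholder values (not real team data): one entry per team season.
first_half_metric = [1.0, 2.0, 3.0]
second_half_goal_share = [1.0, 3.0, 2.0]
print(r_squared(first_half_metric, second_half_goal_share))  # 0.25
```

In practice `x` would hold one first-half value per team season (goal share, Corsi share, or expected goal share) and `y` the matching second-half goal shares, with one R² computed per metric.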

As you can see by comparing my results to draglikepull’s (shown below), applying a score-adjustment does not change the predictive power of goal shares, but significantly increases the predictive power of Corsi.

Image by Draglikepull

However, it follows logically that a score-adjustment would also improve the predictive power of expected goals, and my expected goal values without a score-adjustment perform significantly better than Corsi does with a score-adjustment, and they blow unadjusted Corsi out of the water, so I am comfortable saying that they currently have more predictive power; especially the expected goal values with rebounds removed. If enough folks are interested, I am not opposed to testing score-adjusted metrics in the future.

As draglikepull’s numbers show, my expected goal model is not the only one that has pulled ahead of Corsi in the last five years; Natural Stat Trick’s has done the same. This is partially because the predictive power of Corsi has declined and partially because the predictive power of expected goals has improved. I have a theory for why each of these changes has occurred.

Goodhart’s Law states that “When a measure becomes a target, it ceases to be a good measure.” Corsi gained ground as a popular measure that NHL front offices used to improve their team, and that NHL player agents began using to make the case for their clients in the early portion of the 2010s, right around the same time that Corsi’s predictive power began to decline. I would not say that we’re quite at the point where Corsi is no longer a good measure, but it has indisputably declined, and I believe that is because it’s become a target. I suspect that the predictive power of expected goals has improved because the quality of data provided by the NHL’s Real-Time Scoring System has improved.

After doing some research, I decided to do one more test using the same exact process that I previously used, but using a random 41-game sample to predict the other 41 games in place of the first and second 41 games of the season. In order to select 41 random games, I used R’s set.seed() function and set the seed at 123 for choosing random games so that my process could be repeatable. (I also have the data for each random split saved, and would be willing to share it with anybody who is interested.) Here were my results:

Image by Author
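A seeded random split of this kind can be sketched in Python as follows. The original work used R's set.seed(123); Python's generator produces a different sequence from R's, so this will not reproduce the same splits, only the same idea. The function name and the game ids are hypothetical, and applying one seed per team season is an assumption.

```python
import random


def random_half_split(game_ids, seed=123):
    """Split a team's games into two random halves, repeatably."""
    ids = sorted(game_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    chosen = set(rng.sample(ids, len(ids) // 2))
    first_half = [g for g in ids if g in chosen]
    second_half = [g for g in ids if g not in chosen]
    return first_half, second_half


# Placeholder ids for one team's 82-game season.
half_a, half_b = random_half_split(range(1, 83))
print(len(half_a), len(half_b))  # 41 41
```

Each metric would then be computed on the first random half and compared with goal share in the other half, exactly as in the first-half versus second-half test.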

This method provided significantly different results, but one theme remained common: expected goals with rebound shots removed were by far the most predictive.

What does this mean? Going forward, should we only use expected goals with rebounds removed? No. Rebounds are real events that happened, and until the NHL decides that rebound goals no longer count, any descriptive metric of past events should include rebound shots. If you’re strictly looking to predict which team will be the best team in the future, it may be best to use a metric that excludes rebounds, but I don’t think that is how most people do or should actually use expected goal models. (I would also like to credit Peter Tanner of Moneypuck for bringing to my attention that expected goals with rebounds removed are more predictive.)

In the coming weeks, I plan to release expected goal data at the shooter, goaltender, and team level with and without rebounds for the past 7 seasons. Additionally, because I built this expected goal model in order to empower a WAR model that I have now completed, I plan to release the data that I’ve compiled in the process, which includes non-prior-informed RAPM for the past 7 years, prior-informed RAPM for the past 6 years, regressed shooting/goaltending components for the past 3 years, and total WAR for the past 3 years, all of which are built using the data from this expected goal model.

For the upcoming 2020–2021 NHL season, as soon as games begin being played, I plan to make expected goal data available to the public both with and without rebounds. The model that I use will be trained on the entirety of the past three seasons using the same parameters I used for this model. I will also make data available for shots taken shorthanded and with an empty net.