Your last-minute hockey fix before the season starts

I recently built a projection model for the 2021 NHL season. A high-level overview of this model can be found here, but the gist is this: I used regression to determine how good each NHL player was at each thing they do and how much they would do each thing for their team, aggregated these values at the team level to determine how well each team would do these things, and then simulated the season 10,000 times to determine the most likely outcomes. …
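To make the team-level aggregation step concrete, here is a minimal sketch of one plausible approach, not necessarily the one used in the model: each player's estimated on-ice impact is weighted by the share of the team's even-strength minutes they are projected to be on the ice, and the weighted impacts are added to a league-average baseline. The `Player` fields, the `team_rates` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    xgf_impact: float  # hypothetical: change in team xGF/60 while this skater is on the ice
    xga_impact: float  # hypothetical: change in team xGA/60 while on the ice (negative is better)
    toi_share: float   # hypothetical: fraction of the team's even-strength minutes this skater is on the ice


def team_rates(players, league_xg60):
    """Return projected (xGF/60, xGA/60) for a team.

    Five skaters share the ice at 5v5, so toi_share values across a full
    roster sum to roughly 5; each on-ice impact is added in proportion to it.
    """
    xgf = league_xg60 + sum(p.xgf_impact * p.toi_share for p in players)
    xga = league_xg60 + sum(p.xga_impact * p.toi_share for p in players)
    return xgf, xga


# Usage with made-up numbers (a real roster would list every projected skater)
roster = [
    Player("Skater A", xgf_impact=0.30, xga_impact=-0.10, toi_share=0.5),
    Player("Skater B", xgf_impact=-0.20, xga_impact=0.10, toi_share=0.5),
]
print(team_rates(roster, league_xg60=2.5))
```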

What you need to know about the newest game simulation model from TopDownHockey

A Forecast for the 2021 San Jose Sharks. (Image by Author)

I simulated the 2021 NHL season 10,000 times in order to determine the probability of each outcome. I’ve begun sharing the results of my work on Twitter, and I plan to write a full season preview soon, but before I do, it’s essential that I provide an overview of what I did so that readers can analyze the process and decide for themselves what the strengths and weaknesses of the model are.
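A Monte Carlo simulation like the one described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the simulator behind this model: each game's home-win probability is supplied directly (in a full model it would come from the team-level ratings), every game awards two points to the winner and none to the loser, and the only output is how often each team finishes first. The function name, schedule, probabilities, and seed are hypothetical.

```python
import random
from collections import Counter


def simulate_season(schedule, win_prob, n_sims=10_000, seed=0):
    """Estimate how often each team finishes first in points.

    schedule: list of (home, away) tuples
    win_prob: dict mapping (home, away) -> probability the home team wins
    Simplification (assumption): the winner gets 2 points and the loser 0;
    the NHL's overtime/shootout loser point is ignored.
    """
    rng = random.Random(seed)
    teams = {team for game in schedule for team in game}
    first_place = Counter()
    for _ in range(n_sims):
        points = dict.fromkeys(teams, 0)
        for home, away in schedule:
            winner = home if rng.random() < win_prob[(home, away)] else away
            points[winner] += 2
        best = max(points.values())
        leaders = [t for t, p in points.items() if p == best]
        # Split credit evenly among teams tied for first
        for t in leaders:
            first_place[t] += 1 / len(leaders)
    return {t: first_place[t] / n_sims for t in sorted(teams)}


# Usage with a tiny made-up schedule
schedule = [("SJS", "VGK"), ("VGK", "SJS"), ("SJS", "LAK"), ("LAK", "VGK")]
win_prob = {("SJS", "VGK"): 0.45, ("VGK", "SJS"): 0.62, ("SJS", "LAK"): 0.55, ("LAK", "VGK"): 0.40}
print(simulate_season(schedule, win_prob))
```

Running many seasons and counting outcomes is what turns single-game probabilities into season-level probabilities such as playoff odds.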

I began by using extreme gradient boosting to build an expected goal model that determines the probability of each shot becoming a goal. More about this process can be found here. (Note: When I use shot here and for the remainder of this article, I am referring to all unblocked shots, including those that miss the net.) I then used a prior-informed ridge regression to obtain a point estimate of the impact that each skater has on the rate at which their team generates and allows shots and expected goals at even strength, generates shots and expected goals on the power play, and allows shots and expected goals on the penalty kill. To obtain a point estimate of the impact that each skater has on the probability of their own shots becoming goals, and that each goaltender has on the probability of the shots they face becoming goals, I followed a very similar process but used a non-prior-informed (vanilla) ridge regression instead. …
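The difference between a prior-informed ridge regression and a vanilla one can be shown with a small closed-form sketch. Vanilla ridge shrinks every coefficient toward zero; a prior-informed version instead shrinks each coefficient toward a prior estimate b0 (for example, an estimate from an earlier season), minimizing ||y - Xb||^2 + lambda * ||b - b0||^2, whose solution is b = (X^T X + lambda I)^-1 (X^T y + lambda b0). Setting b0 to all zeros recovers vanilla ridge. The helper names, the pure-Python solver, and the unweighted objective are assumptions for illustration; real models of this kind typically use sparse matrices and weight rows by shift length.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def prior_ridge(X, y, lam, prior):
    """Coefficients minimizing ||y - Xb||^2 + lam * ||b - prior||^2."""
    p = len(prior)
    # Left side: X^T X + lam * I
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    # Right side: X^T y + lam * prior (the prior pulls estimates toward itself, not toward zero)
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) + lam * prior[i] for i in range(p)]
    return solve(xtx, xty)


# Usage: same data, vanilla ridge (zero prior) vs. a prior-informed fit
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 1.0, 2.5]
print(prior_ridge(X, y, lam=1.0, prior=[0.0, 0.0]))
print(prior_ridge(X, y, lam=1.0, prior=[1.5, 0.5]))
```

With little data for a player, the prior dominates; with a lot of data, the observed results take over.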

Moving away from the black box…

As I discussed in my Wins Above Replacement (WAR) write-up, I’ve used regression to obtain point estimates of an NHL player’s individual impact on the following six components:

  • Even Strength Offense
  • Even Strength Defense
  • Power Play Offense
  • Penalty Kill Defense
  • On-Ice Penalty Differential
  • Individual Shooting

The regression isolates a player’s impact by accounting for various external factors that surround them. These factors differ depending on the component that I am evaluating. For even strength offense and defense, I account for the following factors:

  • All teammates and opponents.
  • Whether a shift started on-the-fly as the result of an expired power play. This is by far the most important piece of external context that can shape a player’s result for a given shift. Ignoring this context is extremely unfair to penalty killers like Esa Lindell who start a large percentage of their shifts as the result of expiring enemy power plays, where “power play influence” is still present. …
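One common way to turn factors like these into regression inputs is to encode each even-strength shift as a row with one offense column and one defense column per skater, plus an indicator for context such as a shift that began on the fly after a power play expired. The sketch below assumes that layout and an expected-goals-per-60 target; the function name and column order are hypothetical, and the source does not spell out its exact design matrix.

```python
def shift_row(offense, defense, players, pp_expired, shift_seconds, xgf):
    """Encode one even-strength shift as (features, target).

    Hypothetical column layout:
      [0, n)   -> 1.0 if the skater is on the attacking team
      [n, 2n)  -> 1.0 if the skater is on the defending team
      2n       -> 1.0 if the shift started on the fly after a power play expired
    """
    index = {name: i for i, name in enumerate(players)}
    n = len(players)
    row = [0.0] * (2 * n + 1)
    for name in offense:  # skaters generating the shots
        row[index[name]] = 1.0
    for name in defense:  # skaters allowing the shots
        row[n + index[name]] = 1.0
    row[2 * n] = 1.0 if pp_expired else 0.0
    # Target: expected goals generated during the shift, scaled to a per-60-minute rate
    target = xgf * 3600 / shift_seconds
    return row, target


# Usage with a toy four-player league (a real 5v5 shift has five skaters per side)
row, target = shift_row(["A", "B"], ["C"], ["A", "B", "C", "D"],
                        pp_expired=True, shift_seconds=45, xgf=0.05)
print(row, target)
```

In this kind of setup each shift usually produces two rows, one from each team's perspective, so every skater's offensive and defensive columns are both observed.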

What you need to know about the newest WAR model from TopDownHockey

I’ve built a Wins Above Replacement (WAR) model that provides a point estimate of the value that a player has added in a given season. I strictly use this terminology whenever possible because this is only a point estimate gathered from the best of my imperfect ability, and because the amount of value that a player has added is not always the same as how good they are. (In some cases, the two may be vastly different.)

This article is meant to serve as a high-level overview of the meaning of the metrics, the process, and the results.

The model is built on six…

It Doesn’t Have to Be; It Just Is.

Alan Ryder broke ground in 2004 when he published hockey’s first expected goal model, titled “Shot Quality,” but it wasn’t until Dawson Sprigings and Asmae Toumi published their expected goal model in October 2015 that expected goals became popular in hockey. Today, expected goals have usurped Corsi (shot attempts) as the go-to underlying metric for analyzing teams and skaters, and NHL arenas have even shown expected goal data on the jumbotron during intermissions. …

A False Dichotomy

If you’ve ever watched a hockey game, you’ve got an expected goal model built into your brain. For every shot you see, you calculate an expected goal value. Unlike mathematically computed models, your model’s output isn’t a number between zero and one; it’s varying degrees of excitement when your team shoots and terror when the opponent shoots. You don’t calculate a precise goal probability each time a shot is taken, but you could give a solid estimate if you had to.

Your brain’s model is simultaneously tested and trained every time you watch a game. When a puck is shot, you “test” your model by assigning a goal probability to the shot in question. …

The Results: How big a deal is scorekeeper bias?

After writing over 7,000 total words for parts one and two of this article, I’m excited to finally share the results. If you’ve been reading along so far, I’m sure you are too, and I’d like to thank you for taking the time to read my work. I hope you’ve enjoyed it.

If you haven’t been reading along, that’s okay. I do highly recommend that you read part one and part two before you read this; they aren’t short, but they will give you a good idea of why this is all important to me and why you should care about it too. But if you’re not interested in doing all of that reading, and you just want to see the results, I don’t blame you and I won’t stop you from reading this. …

The Process: How did I build all of this?

Photo by Clément H on Unsplash

In Part One of this article, I laid out the groundwork for a few key concepts:

  1. Evolving Hockey’s highly effective goals above replacement model concluded that the Minnesota Wild had the NHL’s best skaters and worst goaltenders in the 2019–2020 regular season.
  2. The scorekeepers at Xcel Energy Center — the arena where the Minnesota Wild play their home games — exhibit a pattern of erroneously reporting that shots are taken further from the net than they actually are, which I refer to as “scorekeeper bias.”
  3. If shots (defined as shots on goal and missed shots) are reported further from the net than they are taken, this will lead the aforementioned goals above replacement model to overrate the defensive performance of Minnesota’s skaters at the cost of their goaltenders, with no penalty to the offensive performance of their skaters. …
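Because expected goal models lean heavily on shot distance, a scorekeeper who records shots too far from the net drags down the expected goal value of shots in that arena. The sketch below shows one simple way to look for that pattern, not the method used in this series: compute each shot's distance from the net and compare one arena's mean with the mean everywhere else. It assumes rink coordinates normalized so the attacked net sits at (89, 0), matching the 89-foot distance from center ice to each goal line on an NHL rink; the arena labels and function names are hypothetical, and a raw comparison like this is only a first check, since a fairer test would compare the same teams' home and road games.

```python
import math
from statistics import mean

NET_X = 89.0  # goal lines sit 89 ft from center ice on a 200-ft NHL rink


def shot_distance(x, y):
    """Distance in feet from (x, y) to the net at (NET_X, 0).

    Assumes coordinates are already normalized so the shooter attacks the +x net.
    """
    return math.hypot(NET_X - x, y)


def arena_distance_gap(shots, arena):
    """Mean reported shot distance in `arena` minus the mean in all other arenas.

    shots: list of (arena, x, y) tuples. A persistently positive gap is
    consistent with scorekeepers recording shots too far from the net.
    """
    in_arena = [shot_distance(x, y) for a, x, y in shots if a == arena]
    elsewhere = [shot_distance(x, y) for a, x, y in shots if a != arena]
    return mean(in_arena) - mean(elsewhere)


# Usage with made-up shot locations
shots = [("XCEL", 79.0, 0.0), ("XCEL", 69.0, 0.0), ("OTHER", 84.0, 0.0), ("OTHER", 74.0, 0.0)]
print(arena_distance_gap(shots, "XCEL"))
```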

The Origin: What motivated me to build all of this?

According to Evolving Hockey, the Minnesota Wild’s skaters led the NHL in goals above replacement (GAR) in the 2019–2020 regular season. In other words, their calculations, which isolate team performance by adjusting for contextual factors such as strength of schedule and back-to-backs, and which separate the performance of a team’s goaltenders from that of its skaters, concluded that if you replaced every single skater on the Minnesota Wild with a replacement-level player (an easily replaced player who falls below the 13th forward, 7th defenseman, or 2nd goaltender on their team’s depth chart), the Wild’s goal differential would drop by more than it would if you did the same thing to any other team in the NHL. …
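The replacement-level definition above can be expressed as a simple depth-chart cutoff. This sketch approximates each team's depth chart by ranking its players at each position by time on ice, which is an assumption (the text does not say how the chart is built), and flags anyone ranked below the 13th forward, 7th defenseman, or 2nd goaltender. The tuple layout and function name are hypothetical.

```python
from collections import defaultdict

# Depth-chart cutoffs from the definition above; anyone ranked beyond these is replacement level
CUTOFFS = {"F": 13, "D": 7, "G": 2}


def replacement_level(players):
    """Return the names of replacement-level players.

    players: list of (name, team, position, toi) tuples, position in {"F", "D", "G"}.
    Assumption: a team's depth chart is approximated by ranking time on ice.
    """
    groups = defaultdict(list)
    for name, team, pos, toi in players:
        groups[(team, pos)].append((toi, name))
    flagged = set()
    for (team, pos), members in groups.items():
        members.sort(reverse=True)  # most ice time first
        for rank, (_, name) in enumerate(members, start=1):
            if rank > CUTOFFS[pos]:
                flagged.add(name)
    return flagged


# Usage: the third goaltender by ice time falls below the 2nd-goaltender cutoff
players = [("G1", "MIN", "G", 3000), ("G2", "MIN", "G", 1500),
           ("G3", "MIN", "G", 200), ("F1", "MIN", "F", 900)]
print(replacement_level(players))
```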

Use Knowi to connect to an Elasticsearch datasource, query your data with search-based analytics, and create visualizations.

Photo by Andres Siimon on Unsplash

Elasticsearch is a scalable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch shines brightest when it is used in the background as the core engine powering applications with complex search features and many requirements.

As it stands, Kibana’s position in Elastic’s popular ELK stack makes it the most common tool for visualizing and analyzing data from Elasticsearch. …
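Elasticsearch's HTTP interface means a search is just a JSON body sent to an index's `_search` endpoint. The sketch below builds, but does not send, a basic full-text `match` query using only the Python standard library; the host (Elasticsearch's default port 9200 on localhost), the index name `games`, and the field name `team` are placeholders.

```python
import json
import urllib.request


def build_search_request(host, index, field, text, size=10):
    """Build (but do not send) a full-text search request for Elasticsearch's _search endpoint."""
    body = {
        "query": {"match": {field: text}},  # match query: analyzed full-text search on one field
        "size": size,                       # maximum number of hits to return
    }
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage: inspect the request; urllib.request.urlopen(req) would actually send it
req = build_search_request("http://localhost:9200", "games", "team", "Minnesota Wild")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```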


Patrick Bacon

Data Scientist