Data Input

Select date from which to forecast:


Select Corps

Select 2 Corps

Select Criteria

About the model


Welcome to Evan Murray's DCI Forecast for the 2019 season! It calculates each corps' pace of improvement and current rank, and then uses that to simulate DCI's Finals Week shows. You can see the results here.

The Current Forecast tab is the model itself. You can run it for any day in the season, and the model uses all scores up to and including that day.

The Corps Summary tab allows you to see all the scores for a corps in more detail. If they have enough data, you can also see the exponential fit to their data - aka their pace of improvement. The plot includes the approximate 95% confidence interval for each caption, which gives you a sense of the model's confidence. To see how much the uncertainty can vary, compare Open Class error bars with World Class'.

The Corps Comparison tab compares two corps head-to-head. You can see how their scores compare now and during Finals Week (specifically Prelims, because all corps perform that day). The comparison will tell you which corps wins the head-to-head and compare them caption by caption.

The Model History tab compares up to 4 corps on the odds that they succeed in something - from making Semifinals to winning it all. This is another good way to compare corps - for example, looking at the odds of winning Gold for the top 3 or 4 corps as the season has progressed is pretty interesting.

Lastly, the How It Works tab contains more mathematical detail on how the model does its thing. You can also check out the code on GitHub and visit my website for other content related to the model.

If there's anything else you'd like to see, let me know. The forecast will be updated every few days for the rest of the season, so check back in from time to time!

Enjoy,

- Evan

FAQ


Why aren't all corps included in the model?

For a corps to be included in the model, they need to have performed at least 6 times and the model needs to be able to fit an exponential to their scores for each caption. Corps will be added to the forecast as soon as they meet these two conditions, but in the meantime, the model proceeds as though the corps don't exist.

Some scores are different from DCI's official scores. Why?

Some early- and mid-season shows have incomplete judging panels. When that happens, sometimes one of the Visual or Music captions is worth 40 points and the other 20, as opposed to DCI's standard 40-30-30. In those cases, the model adjusts the scores to the standard 40-30-30 to keep things consistent. However, it's entirely possible there was an error in the data collection. If you think you've found an incorrect score, send me an email at evan.habsfan@gmail.com.
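One simple way to make that adjustment is a proportional rescale of each caption to its standard maximum. Here's a minimal Python sketch of that idea, using a made-up panel that judged Visual out of 40 and Music out of 20; the scores are hypothetical and the model's actual adjustment may differ.

```python
# Minimal sketch: rescale caption scores to DCI's standard 40-30-30 split.
STANDARD_MAX = {"GE": 40.0, "Visual": 30.0, "Music": 30.0}

def normalize_captions(scores, panel_max):
    """Rescale each caption from the panel's maximum to the standard maximum."""
    return {cap: scores[cap] * STANDARD_MAX[cap] / panel_max[cap] for cap in scores}

# Hypothetical show where Visual was judged out of 40 and Music out of 20
raw = {"GE": 32.0, "Visual": 30.4, "Music": 15.2}
panel = {"GE": 40.0, "Visual": 40.0, "Music": 20.0}
print(normalize_captions(raw, panel))  # Visual -> 22.8, Music -> 22.8
```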

It seems like the model is overrating some corps. Why?

Open Class scores tend to get inflated once their tour breaks away from World Class in late July, which means the model will likely overrate Open Class corps somewhat in early August and going into Finals Week. It's also possible that, until there have been some regionals, some World Class corps may be overrated because judges on one circuit score higher than judges on another. Early in the season, there's generally a Midwest, a West, and an East circuit. One of them is usually scored higher than the other two, but which one varies from season to season.

What does forecasting from different days do?

The model uses two things to forecast Finals Week - each corps' pace of improvement and their current rank. Choosing the day on the Current Forecast tab sets the day the corps are ranked - the model uses its best guess for scores on that day. If you choose a day in the past, the model ignores all scores that came after that day, so you can see what the model thought a week ago (for example) versus today.


Building the 2019 DCI Forecast

Since 2017, I have been maintaining a model that forecasts the DCI scores and results for Finals Week. Earlier in the 2019 season, I also posted a version of the model looking at historical seasons, stretching back to 1995. While the core of the model has been pretty constant through the years, there are always adjustments on the margins. Therefore, it makes sense to go over how the model works this year.

Overall, the model uses four steps:

  1. Use curve fitting to determine how good each corps is now, on average, and how they've been improving over the course of the season. This is done for each caption.
  2. Use these "skill curves" to predict how good each corps will be during Finals Week, on average.
  3. Figure out how each corps would place based on a combination of their current rank and natural variability in scores during Finals Week.
  4. Combine the results from steps 2 and 3 to produce the final forecast.

Fitting and Forecasting Skill Curves

Each corps has three skill curves, one for each caption. That's General Effect, Music, and Visual. This is based on fitting a curve of the form y = a + x^b to their data, where x is the day of the competitive season and y is the actual score. Typically, a is close to a corps' first score of the season, and b is somewhere between 0.5 and 1. That means the curves tend to start linear but level off as the season goes on. As an example, let's look at early-season Santa Clara Vanguard's skill curves:

Skill Curve Example

This is a screengrab from the "Corps Summary" tab on July 3, 2019. In the plot above, the points are the scores from the shows themselves, and the lines are the skill curves fit to them. The lines don't pass perfectly through each point, because there's always some uncertainty in curve fitting. This is what the dashed lines represent - they are the 95% confidence interval for each caption. The wider they are, the more uncertainty there is in the skill curves. Because this is very early in the season, SCV's curves are pretty uncertain.
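To make the curve fitting concrete, here's a minimal Python sketch of fitting y = a + x^b to one caption with scipy. The days and scores are made up, and this sketch skips the score weighting described in the next paragraph; the covariance returned by the fit is what feeds the confidence bands.

```python
import numpy as np
from scipy.optimize import curve_fit

def skill_curve(x, a, b):
    # y = a + x**b: a is roughly the first score, b controls how quickly scores level off
    return a + x ** b

# Hypothetical GE scores for one corps, indexed by day of the competitive season
days = np.array([1, 4, 8, 12, 16, 21, 26], dtype=float)
scores = np.array([28.5, 29.8, 31.0, 31.9, 32.8, 33.7, 34.6])

# Seed the fit near the expected values: a near the first score, b between 0.5 and 1
params, cov = curve_fit(skill_curve, days, scores, p0=[scores[0], 0.75])
a, b = params
print(f"a = {a:.2f}, b = {b:.2f}")

# The square roots of the covariance diagonal give the coefficient uncertainties
# that drive the 95% confidence bands shown in the plot above
a_err, b_err = np.sqrt(np.diag(cov))
```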

The curve fitting algorithm weights scores differently, for two reasons. The first is that early-season scores don't tend to be as predictive of Finals Week as those later in the season. To capture this, the "base weight" for scores increases through time. All things being equal, the model weights scores from July 29 more heavily than scores from July 1. The second thing the model does is underweight recent shows. This is because we want a model that is somewhat skeptical when a corps suddenly gets an unusually high or low score. Historically, unexpected results like that tend to be outliers. Before the model overreacts, it waits to see a corps sustain their success over several shows. This is especially important in the Open Class predictions because their scores tend to be more volatile. Through time, the shows' weights look somewhat like a pyramid - the model starts with low weights for early-season scores, increases them as the season goes on, and then discounts the most recent shows.
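The exact weighting scheme isn't spelled out here, but a pyramid-shaped weight profile could look something like the sketch below: a base weight that grows with the show date, multiplied by a discount for the most recent shows. The ramp and discount parameters are illustrative guesses, not the model's actual values.

```python
import numpy as np

def pyramid_weights(days, recent_window=10.0, recent_discount=0.5):
    """Illustrative pyramid weights: grow with the day of the season,
    then discount shows within `recent_window` days of the latest show."""
    days = np.asarray(days, dtype=float)
    base = days / days.max()              # base weight increases through the season
    age = days.max() - days               # how long ago each show was
    discount = np.where(age < recent_window,
                        recent_discount + (1 - recent_discount) * age / recent_window,
                        1.0)
    return base * discount

days = np.array([1, 8, 15, 22, 29, 33, 35])
print(pyramid_weights(days).round(2))     # rises through mid-season, dips at the end
```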

There are two conditions which can cause a corps not to have skill curves. The first is just that the curve fitting didn't work - this will be pretty common early in the season when the model doesn't have much data to work with. The second is that corps can have too little data in the first place - the model excludes all corps with fewer than 7 shows. In either case, the model will proceed as though the corps doesn't exist at all. Generally speaking, this isn't a problem for World Class by mid-July, but Open Class can take a bit longer. Open Class predictions won't be live until at least 5 corps are in the model.

The model uses the skill curves to predict how good each corps will be during Finals Week. Because it also tracks the uncertainty in the curve fitting, it can account for measurement error in this forecast. The more uncertain the curve fitting, the more the model hedges its prediction by using a wide score distribution. Using the distribution of a and b coefficients for each corps, the model predicts Finals Week 10,000 times, treating each simulation as independent. The end result is that each corps has 10,000 skill-based score predictions.
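One way to picture this step: draw 10,000 (a, b) pairs from the fitted coefficient distribution and evaluate the curve on the Finals Week day. The sketch below assumes the coefficients are roughly multivariate normal with the covariance from the curve fit; the numbers and the exact sampling scheme are illustrative, not the model's.

```python
import numpy as np

rng = np.random.default_rng(2019)

# Fitted coefficients and covariance for one caption (e.g., from the curve fit above)
params = np.array([27.5, 0.60])          # [a, b]
cov = np.array([[0.40, -0.010],          # illustrative coefficient uncertainty
                [-0.010, 0.0004]])

N_SIMS = 10_000
finals_day = 52.0                        # hypothetical day index for Finals Week

# Draw (a, b) pairs and evaluate y = a + x**b for each draw
draws = rng.multivariate_normal(params, cov, size=N_SIMS)
skill_scores = draws[:, 0] + finals_day ** draws[:, 1]

print(skill_scores.mean().round(2), skill_scores.std().round(2))
```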

Determining Corps Rank and Natural Variability

The "random" part of the model is based on corps rank and the natural variability in DCI scores. To understand why the model needs to rank corps, it's important to understand how it defines "natural variability".

Natural variability is the variation in how judges assign scores from show to show, as the show score rarely matches the score predicted by the skill curves. This variability does not mean judges are biased or political. In fact, the model doesn't consider individual judges in the forecast at all, and there is no evidence in the historical data of judges being collectively biased against individual corps. Rather, this variability is just based on the fact that sometimes judges score a little low or a little high. They're pretty good on average.

Unlike the uncertainty in the skill-based forecast, natural variability is not independent from corps to corps. Historically, judge error tends to be consistent from corps to corps at any given show. If the judges score Bluecoats higher than expected, there's a good chance they will do the same for Santa Clara Vanguard at the same show. This correlation is pretty strong, and stronger for corps that perform back-to-back than those that perform farther apart.

Because the correlated error depends on performance order, the model needs to guess the Finals Week performance order so that it can create the correct correlation matrices. It ranks the corps as they stand now and assumes they'll slot into the same order during Finals Week. Ranking by the most recent scores would give an unfair advantage to corps who have performed more recently, so the model ranks by the skill curves instead. This also removes individual show variability from the rankings.

In 2017 and 2018, the model did these rankings on a per-caption basis. But performance order is determined by total score, so the 2019 model uses total score to rank the corps. Based on the rankings, the model once again simulates Finals Week 10,000 times, this time using the gaps between corps and their correlated random error. It's important to note that, while the rankings are based on total score, the forecast itself still takes place for individual captions.

Typically, the model assumes the mean error is 0 in these simulations, because the judges aren't biased against any corps. But there is one exception. Later in the season, Open Class splits off onto its own tour for a couple of weeks before joining back up with World Class for Prelims. During this time, Open Class scores tend to get inflated, and then they drop back down on Prelims night. Historically, the average drop from Open Class Finals to Prelims, just two days later, has been 2 points. Therefore, when the model is predicting the Indianapolis shows based on late-season data, it assumes the mean error for Open Class corps is -2, not 0, but keeps everything else the same.
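Putting the last two paragraphs together, the correlated error draw could be sketched like this: a correlation that decays with the gap between performance slots, and a mean of -2 for Open Class corps in the late-season Indianapolis forecasts. The corps names, decay rate, and spread are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2019)

# Corps in their guessed Prelims performance order (names are hypothetical)
corps = ["Open Class A", "World Class B", "World Class C", "World Class D"]
is_open_class = np.array([True, False, False, False])

# Correlation decays with the gap in performance order: back-to-back corps share
# more judge error than corps performing far apart (the 0.8 decay rate is a guess)
slots = np.arange(len(corps))
corr = 0.8 ** np.abs(slots[:, None] - slots[None, :])

sigma = 1.0                               # per-corps error spread, illustrative
cov = sigma ** 2 * corr

# Mean error is 0, except Open Class corps are docked about 2 points late season
mean = np.where(is_open_class, -2.0, 0.0)

N_SIMS = 10_000
natural_error = rng.multivariate_normal(mean, cov, size=N_SIMS)  # shape (10000, 4)
```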

Combining the Forecasts

At this point, each corps has 10,000 skill-based simulations of Finals Week and 10,000 simulations based on natural variability and rank. All the model does is combine them, by taking a weighted average of the scores. The average is weighted based on the historical magnitude of overall variance in Finals Week scores (somewhere between 2 and 2.5 points) and the percent of that variance that comes from skill versus natural variability. Overall, the natural variability is weighted about twice as much as the skill-based simulations.
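In code, that combination is just a weighted average across the two sets of simulations. The sketch below uses placeholder arrays and a 2:1 weight in favor of the natural variability simulations; the model's actual weights come from the historical variance split described above.

```python
import numpy as np

rng = np.random.default_rng(2019)

# 10,000 simulated Finals Week scores per corps from each half of the model
# (rows are simulations, columns are corps); the numbers are placeholders
skill_sims = rng.normal(loc=[97.0, 96.2, 95.8], scale=0.8, size=(10_000, 3))
variability_sims = rng.normal(loc=[96.8, 96.5, 95.5], scale=1.2, size=(10_000, 3))

# Natural variability counts roughly twice as much as the skill-based forecast
W_SKILL, W_VAR = 1.0, 2.0
combined = (W_SKILL * skill_sims + W_VAR * variability_sims) / (W_SKILL + W_VAR)
```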

In this forecasting and averaging process, the interpretability of the raw score predictions breaks down. The winning corps' score can vary from less than 90 to 100 points as the season progresses, which we know is unrealistic. What the model maintains, though, is the gaps between corps. This is why the score predictions the model makes are never greater than 0 - it assigns 0 to the winner and calculates the gap behind the winner for everyone else.

To convert the 10,000 combined simulations into percentages, the model just counts. For example, if Santa Clara Vanguard wins in 6,000 of the 10,000 simulations, the model gives them a 60% chance of winning. The model tracks the odds of getting each medal, making Finals, and making Semifinals.
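The last two steps can be sketched in a few lines: the winner of each simulation gets a gap of 0, and the win odds come from counting how often each corps finishes on top. The `combined` array here is a placeholder standing in for the weighted-average simulations from the previous step.

```python
import numpy as np

rng = np.random.default_rng(2019)

# Placeholder for the combined simulations: shape (n_sims, n_corps)
combined = rng.normal(loc=[97.0, 96.4, 95.6], scale=1.0, size=(10_000, 3))

# Gaps: the winner of each simulation is set to 0, everyone else gets a negative gap
gaps = combined - combined.max(axis=1, keepdims=True)

# Win odds: count how often each corps has the top score across the simulations
winners = combined.argmax(axis=1)
win_pct = np.bincount(winners, minlength=combined.shape[1]) / combined.shape[0]
print(win_pct)  # e.g., [0.60, 0.25, 0.15] would mean a 60% chance for the first corps
```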


That's really about it. Do you have any other questions? Does it seem like something should be in this article that isn't? Do you have a problem with my methodology? Reach out! You can track me down on Reddit as u/Overthink_DCI_Scores, on GitHub as EMurray16, and via email at evan.habsfan@gmail.com.

- Evan