Category Archives: Uncertainty Quantification

Lies, Damn Lies, and Next Gen Stats

This commercial showing an exciting play where Christian McCaffrey scores a touchdown has been out for some time now. In it, McCaffrey discusses how improbable it was (14.2% of the time, or about 1 in 7) that a touchdown would be scored with four defenders at the given distances away (3.0, 3.3, 7.5, and 8.3 yards). Thanks to the magic of “Next Gen Stats”, the play becomes a statistical anomaly. Let’s analyze this claim and see whether it really tells us the outcome of this play is remarkable from a statistical point of view.

We can stipulate that for a play starting at about the 5 yard line, when there are four unblocked defenders within 8.3 yards of the runner, it is unlikely that a touchdown will be scored. I would conjecture, however, that a large majority of the time those defenders would be in front of the runner. Indeed, it might be that none of the four closest defenders are in front of the runner only 15% of the time. In that case, if all of the touchdowns occurred on plays where no defenders were in front of the runner, the touchdown probability for the way this play initially unfolded would be 94.7% (that is, 14.2/15 = 0.9466…).
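To make that arithmetic concrete, here is the conditional-probability calculation as a few lines of Python. The 14.2% comes from the commercial; the 15% figure and the assumption that every touchdown happens on a “clear path” play are, as noted above, my conjecture.

```python
# Back-of-the-envelope conditional probability for the McCaffrey play.
p_td = 0.142     # P(touchdown) with four defenders this close (Next Gen Stats)
p_clear = 0.15   # conjectured P(no defender between runner and end zone)

# If all touchdowns occur on "clear path" plays, then
# P(TD | clear path) = P(TD and clear) / P(clear) = P(TD) / P(clear).
p_td_given_clear = p_td / p_clear
print(f"P(TD | no defender in front) = {p_td_given_clear:.1%}")  # -> 94.7%
```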

An experienced football fan looking at the stills above would bet it is more likely than not that the runner scores (even though there is a defensive back right at the goal line), because there is still a lot of room to the sideline and the nearest defender is taking a bad angle. It seems like the worst possible outcome is a tackle at the 2 or 3 yard line.

The fact that there are four players at the given distances does not tell us very much about the play. Indeed, it seems like two of those defenders aren’t even really running at this point. What is important, and not quantified here (possibly because it is much harder to quantify), is that none of the defenders are in the path of the runner and that the nearest defender is on the goal line.

This is an example of using the data at hand to make a claim that doesn’t really offer any insight. Looking at the available data, we can infer that, yes, if there are that many unblocked players that close to a runner, it is unlikely he will score. Of course, most of those plays will be cases where the runner is swarmed in the backfield by defenders who surround him from the start of the play. However, once we add a bit more information, like the fact that none of those defenders stand between the runner and the end zone, a touchdown becomes much more likely. The eye test would tell you that the probability is much better than the roughly 1 in 7 chance this play has to score a touchdown.

I just don’t know man (Epistemic Uncertainty)

Yes, epistemic uncertainty (that is, uncertainty due to a lack of knowledge) is likely the biggest thorn in the side of a modeler/computational physicist/computational engineer (I would make a bolder statement, but, at the risk of being redundant, I just don’t know man). One way of dealing with this is the concept of Probability Boxes. The basic idea is to treat aleatory uncertainties with probability distributions and epistemic uncertainties as intervals. On the one hand, this is conservative because it makes no assumptions about the underlying distribution; on the other hand, the actual distribution is not likely to be completely flat.
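As a minimal sketch of the idea (a toy example of mine, not the full p-box machinery): suppose a quantity is normally distributed (the aleatory part), but we only know its mean lies in an interval (the epistemic part). Sweeping the interval and taking the pointwise envelope of the CDFs gives the box.

```python
import numpy as np
from scipy.stats import norm

# Probability-box sketch: X ~ Normal(mu, 1), but mu is epistemically
# uncertain -- all we know is that mu lies in [1, 3] (assumed for illustration).
mu_lo, mu_hi = 1.0, 3.0
x = np.linspace(-3.0, 7.0, 200)

# Sweep the epistemic interval and take the pointwise envelope of the CDFs.
mus = np.linspace(mu_lo, mu_hi, 21)
cdfs = np.array([norm.cdf(x, loc=m, scale=1.0) for m in mus])
cdf_upper = cdfs.max(axis=0)   # upper bound of the p-box
cdf_lower = cdfs.min(axis=0)   # lower bound of the p-box

# Any statement about P(X <= x0) is now an interval, not a single number.
x0 = 2.0
i = np.searchsorted(x, x0)
print(f"P(X <= {x0}) is in [{cdf_lower[i]:.3f}, {cdf_upper[i]:.3f}]")
```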

Prediction and Calibration

In this post we’ll look at some actual experimental data (crazy, I know) and use simulation data from the code Hyades2D to try to produce calibrated results. The data are the very same used in Stripling, McClarren, et al., and we show how Gaussian process models can be used to make sense of simulation and experimental data.
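As a taste of the kind of emulator used there, here is a sketch of fitting a Gaussian process to toy simulation data with scikit-learn. The data and kernel choices are illustrative stand-ins, not the Hyades2D dataset or the setup from the post.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy stand-in for simulation data: scalar inputs and a smooth response.
rng = np.random.default_rng(0)
X_sim = rng.uniform(0.0, 10.0, size=(30, 1))
y_sim = np.sin(X_sim).ravel() + 0.05 * rng.standard_normal(30)

# GP emulator: RBF kernel for the smooth trend, WhiteKernel for noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_sim, y_sim)

# Predictions come with uncertainty estimates -- exactly what calibration
# needs, since emulator error can be weighed against experimental error bars.
X_new = np.linspace(0.0, 10.0, 5).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
for xi, m, s in zip(X_new.ravel(), mean, std):
    print(f"x = {xi:4.1f}: prediction = {m:+.3f} +/- {2*s:.3f}")
```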


Markov-Chain Monte Carlo

If you’re in the business of sampling from a distribution that you only know up to a normalization constant, Markov-Chain Monte Carlo (and the Metropolis algorithm) is for you.  The Metropolis algorithm is named after a scientist, and not the adopted hometown of an illegal alien, but it can leap unruly distributions in a (burn-in + n) bound.  In particular, Bayes’ Theorem gives us an unnormalized distribution we would like to sample from.
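Here is a minimal random-walk Metropolis sampler showing why the unknown normalization is no obstacle: the constant cancels in the acceptance ratio. The target density and step size below are arbitrary choices for illustration.

```python
import numpy as np

def unnormalized_p(x):
    """An unruly, unnormalized target: a bimodal density with unknown area."""
    return np.exp(-0.5 * (x - 2.0)**2) + 0.6 * np.exp(-(x + 2.0)**2)

def metropolis(p, x0=0.0, n=50_000, burn_in=5_000, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' ~ Normal(x, step),
    accept with probability min(1, p(x')/p(x))."""
    rng = np.random.default_rng(seed)
    x, px = x0, p(x0)
    samples = []
    for i in range(burn_in + n):   # leaping the distribution in a (burn-in + n) bound
        x_new = x + step * rng.standard_normal()
        px_new = p(x_new)
        if rng.uniform() < px_new / px:   # normalization constant cancels here
            x, px = x_new, px_new
        if i >= burn_in:
            samples.append(x)
    return np.array(samples)

samples = metropolis(unnormalized_p)
print(f"sample mean ~ {samples.mean():.3f}, sample std ~ {samples.std():.3f}")
```

In practice you would also tune the step size and monitor acceptance rates; this sketch skips those diagnostics.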
