Category Archives: Uncertainty Quantification

Lies, Damn Lies, and Next Gen Stats

This commercial showing an exciting play where Christian McCaffrey scores a touchdown has been out for some time now. In it, McCaffrey discusses how improbable it was (14.2% of the time, or about 1 in 7) that a touchdown would be scored with four defenders at the given distances away (3.0, 3.3, 7.5, and 8.3 yards). Thanks to the magic of “Next Gen Stats”, the play becomes a statistical anomaly. Let’s analyze this claim and see whether it really tells us the outcome of this play is remarkable from a statistical point of view.

We can stipulate that for a play starting at about the 5 yard line, when there are four unblocked defenders within 8.3 yards of the runner, it is unlikely that a touchdown will be scored. I would conjecture, however, that a large majority of the time those defenders would be in front of the runner. Indeed, it might be that none of the four closest defenders are in front of the runner only 15% of the time. In that case, if all of the touchdowns occurred on plays where no defenders were in front of the runner, the touchdown probability for the way this play initially unfolded would be 94.7% (that is, 14.2/15 = 0.9466…).
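To make that arithmetic concrete, here is the conditional-probability calculation as a few lines of Python. The 14.2% comes from the commercial; the 15% figure and the assumption that every touchdown happens on a “clear path” play are, as noted above, my conjecture.

```python
# Back-of-the-envelope conditional probability for the McCaffrey play.
p_td = 0.142     # P(touchdown) with four defenders this close (Next Gen Stats)
p_clear = 0.15   # conjectured P(no defender between runner and end zone)

# If all touchdowns occur on "clear path" plays, then
# P(TD | clear path) = P(TD and clear) / P(clear) = P(TD) / P(clear).
p_td_given_clear = p_td / p_clear
print(f"P(TD | no defender in front) = {p_td_given_clear:.1%}")  # -> 94.7%
```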

An experienced football fan looking at the stills above would bet it is more likely than not that the runner scores (even though there is a defensive back right at the goal line), because there is still a lot of room to the sideline and the nearest defender is taking a bad angle. It seems like the worst possible outcome is a tackle at the 2 or 3 yard line.

The fact that there are four players at the given distances does not tell us very much about the play. Indeed, it seems like two of those defenders aren’t even really running at this point. What is important, and not quantified here (possibly because it is much harder to quantify), is that none of the defenders are in the path of the runner and that the nearest defender is on the goal line.

This is an example of using the data at hand to make a claim that doesn’t really offer any insight. Looking at the available data, we can infer that, yes, if there are that many unblocked players that close to a runner, it is unlikely he will score. Of course, most of those plays will be cases where the runner is swarmed in the backfield by defenders who surround him from the start of the play. However, once we add a bit more information, like the fact that none of those defenders stand between the runner and the end zone, a touchdown becomes much more likely. The eye test would tell you that the probability is much better than the roughly 1 in 7 chance this play has to score a touchdown.

I just don’t know man (Epistemic Uncertainty)

Yes, epistemic uncertainty (that is, uncertainty due to a lack of knowledge) is likely the biggest thorn in the side of a modeler/computational physicist/computational engineer (I would make a bolder statement, but, at the risk of being redundant, I just don’t know man). One way of dealing with this is the concept of Probability Boxes. The basic idea is to treat aleatory uncertainties with probability distributions and epistemic uncertainties as intervals. On the one hand, this is conservative because it makes no assumptions about the underlying distribution; on the other hand, the actual distribution is not likely to be completely flat.
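As a minimal sketch of the idea (a toy example of mine, not the full p-box machinery): suppose a quantity is normally distributed (the aleatory part), but we only know its mean lies in an interval (the epistemic part). Sweeping the interval and taking the pointwise envelope of the CDFs gives the box.

```python
import numpy as np
from scipy.stats import norm

# Probability-box sketch: X ~ Normal(mu, 1), but mu is epistemically
# uncertain -- all we know is that mu lies in [1, 3] (assumed for illustration).
mu_lo, mu_hi = 1.0, 3.0
x = np.linspace(-3.0, 7.0, 200)

# Sweep the epistemic interval and take the pointwise envelope of the CDFs.
mus = np.linspace(mu_lo, mu_hi, 21)
cdfs = np.array([norm.cdf(x, loc=m, scale=1.0) for m in mus])
cdf_upper = cdfs.max(axis=0)   # upper bound of the p-box
cdf_lower = cdfs.min(axis=0)   # lower bound of the p-box

# Any statement about P(X <= x0) is now an interval, not a single number.
x0 = 2.0
i = np.searchsorted(x, x0)
print(f"P(X <= {x0}) is in [{cdf_lower[i]:.3f}, {cdf_upper[i]:.3f}]")
```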

Prediction and Calibration

In this post we’ll look at some actual experimental data (crazy, I know) and use simulation data from the code Hyades2D to try to produce calibrated results. The data are the very same used in Stripling, McClarren, et al., and we show how Gaussian process models can be used to make sense of simulation and experimental data.
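As a taste of the kind of emulator used there, here is a sketch of fitting a Gaussian process to toy simulation data with scikit-learn. The data and kernel choices are illustrative stand-ins, not the Hyades2D dataset or the setup from the post.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy stand-in for simulation data: scalar inputs and a smooth response.
rng = np.random.default_rng(0)
X_sim = rng.uniform(0.0, 10.0, size=(30, 1))
y_sim = np.sin(X_sim).ravel() + 0.05 * rng.standard_normal(30)

# GP emulator: RBF kernel for the smooth trend, WhiteKernel for noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_sim, y_sim)

# Predictions come with uncertainty estimates -- exactly what calibration
# needs, since emulator error can be weighed against experimental error bars.
X_new = np.linspace(0.0, 10.0, 5).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
for xi, m, s in zip(X_new.ravel(), mean, std):
    print(f"x = {xi:4.1f}: prediction = {m:+.3f} +/- {2*s:.3f}")
```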


Markov-Chain Monte Carlo

If you’re in the business of sampling from a distribution that you only know up to a normalization constant, Markov-Chain Monte Carlo (and the Metropolis algorithm) is for you.  The Metropolis algorithm is named after a scientist, and not the adopted hometown of an illegal alien, but it can leap unruly distributions in a (burn-in + n) bound.  In particular, Bayes’ Theorem gives us an unnormalized distribution we would like to sample from.
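Here is a minimal random-walk Metropolis sampler showing why the unknown normalization is no obstacle: the constant cancels in the acceptance ratio. The target density and step size below are arbitrary choices for illustration.

```python
import numpy as np

def unnormalized_p(x):
    """An unruly, unnormalized target: a bimodal density with unknown area."""
    return np.exp(-0.5 * (x - 2.0)**2) + 0.6 * np.exp(-(x + 2.0)**2)

def metropolis(p, x0=0.0, n=50_000, burn_in=5_000, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' ~ Normal(x, step),
    accept with probability min(1, p(x')/p(x))."""
    rng = np.random.default_rng(seed)
    x, px = x0, p(x0)
    samples = []
    for i in range(burn_in + n):   # leaping the distribution in a (burn-in + n) bound
        x_new = x + step * rng.standard_normal()
        px_new = p(x_new)
        if rng.uniform() < px_new / px:   # normalization constant cancels here
            x, px = x_new, px_new
        if i >= burn_in:
            samples.append(x)
    return np.array(samples)

samples = metropolis(unnormalized_p)
print(f"sample mean ~ {samples.mean():.3f}, sample std ~ {samples.std():.3f}")
```

In practice you would also tune the step size and monitor acceptance rates; this sketch skips those diagnostics.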
