attribution science, and some thoughts on computer modeling

This Slate article explains how attribution science works. It depends on modeling. Basically, scientists model an event (like a storm, flood, fire, whatever) under a hypothetical condition where human-caused climate change did not occur, and compare that to the data from our actual universe, where it did occur.
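To make that concrete, here is a toy sketch (in Python, with made-up numbers that aren't from the article or any real study) of the basic arithmetic attribution studies report: how likely an event at least as extreme as the observed one is in a modeled actual world versus a modeled counterfactual world, and the ratio between those two probabilities.

```python
# Toy sketch of the attribution comparison described above (not any
# particular study's method). Two sets of simulated outcomes: one for the
# world as it is ("factual") and one for a hypothetical world without
# human-caused warming ("counterfactual"). All numbers are invented.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical simulated peak rainfall (mm) from each modeled world.
factual = rng.normal(loc=110, scale=20, size=10_000)         # with warming
counterfactual = rng.normal(loc=100, scale=20, size=10_000)  # without warming

observed_event = 150  # the real-world event we want to attribute (mm)

# Probability of an event at least this extreme in each world.
p_factual = np.mean(factual >= observed_event)
p_counterfactual = np.mean(counterfactual >= observed_event)

print(f"P(event | actual world)         = {p_factual:.4f}")
print(f"P(event | counterfactual world) = {p_counterfactual:.4f}")
print(f"Probability ratio               = {p_factual / p_counterfactual:.1f}x")
```

Real studies use large ensembles of physics-based climate simulations rather than random draws, but the comparison at the end comes down to something like this ratio (or a related quantity such as the fraction of attributable risk).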

I do a fair amount of modeling in my job, and there are always skeptics (some more informed than others). Why would anyone trust a computer model? Isn’t empirical measurement always better? Well, we model things we can’t measure: things that could or would have occurred if conditions were different, or things that might happen in the future.

To trust a model, first, somewhat obviously, you need to state clearly what the model is for. Second, you need to be confident that it adequately represents the real-world processes underlying the system you are interested in. Judging this requires expertise, and the expert needs to really understand the system. If an expert who knows what they are doing is confident on this point, the model has some usefulness even if there is no data. (Purely empirical models like regression equations don’t represent processes, and therefore have limited predictive value if conditions change significantly.) But we always want data.

Third, the modeler will compare what the model predicts to some real data. The modeler needs to be aware that there is always uncertainty in how well measurements represent the real condition of the actual physical universe, and that this uncertainty propagates through the model (the uninformed often think of this as “model error”). If the prediction is reasonably accurate without tweaking, you may have a pretty good model. Often the modeler will do a little tweaking to improve the fit, but the more you tweak, the more you move toward an empirical model with less predictive value. In a somewhat old-fashioned (according to me) but common approach in the engineering field, the modeler sets a portion of the data aside while doing the tweaking, then compares the tweaked model to the portion they set aside. I don’t usually do this, because there is never enough data; I tend to use it all, then check the model again when more data becomes available in the future.
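For what it's worth, here is a bare-bones sketch of that set-aside-and-check approach, with invented measurements and a single made-up tuning parameter; the names and numbers are hypothetical, just to show the mechanics.

```python
# Bare-bones illustration of "set some data aside, tweak on the rest, then
# check against what you set aside." Everything here is invented; a real
# process model would be far richer than one coefficient.
import numpy as np

rng = np.random.default_rng(0)

def process_model(rainfall_mm, runoff_coeff):
    """Toy process model: predicted runoff as a fraction of rainfall."""
    return runoff_coeff * rainfall_mm

# Hypothetical field measurements (rainfall and observed runoff), with noise.
rainfall = rng.uniform(10, 100, size=30)
observed_runoff = 0.35 * rainfall + rng.normal(0, 3, size=30)

# Set a portion of the data aside before any tweaking.
calibrate = slice(0, 20)   # used to tune the model
holdout = slice(20, 30)    # untouched until the end

# "Tweaking": pick the runoff coefficient that best fits the calibration data.
candidates = np.linspace(0.1, 0.6, 51)
errors = [np.mean((process_model(rainfall[calibrate], c)
                   - observed_runoff[calibrate]) ** 2)
          for c in candidates]
best_coeff = candidates[int(np.argmin(errors))]

# Now compare the tweaked model against the data that was set aside.
holdout_rmse = np.sqrt(np.mean(
    (process_model(rainfall[holdout], best_coeff)
     - observed_runoff[holdout]) ** 2))
print(f"Calibrated runoff coefficient: {best_coeff:.2f}")
print(f"RMSE on held-out data: {holdout_rmse:.1f} (same units as runoff)")
```

The point of the held-out comparison is that the number it produces was not available to you while you were tweaking, so it is a more honest estimate of how the model will behave on conditions it hasn't seen.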

Finally, we have a model that we are confident represents underlying processes, matches real-world measurements reasonably well, and is suitable for its stated purpose. We can use the model for that purpose, be clear about the known unknowns and unknown unknowns, and draw some conclusions that might be useful in the real world. We have some information that can inform decisions better than guesses alone could have, and that we couldn’t have learned from data alone.
