Data is only available about the past

“Data is only available about the past.”

- Clayton Christensen

This limitation is obvious but fundamental, and we should not lose sight of it. Although we typically want to predict the future, all of the hard data comes from the past.

One way to deal with this is to assume that what was true in the past will still be true in the future. For example: “If the weather is hot today, it will likely be hot tomorrow.” Of course, the farther into the future we go, the more likely it is that something will change. But the continuity assumption allows us to pretend that the past data also reflects the future. And in many cases, this turns out to be accurate.
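As a rough illustration of the continuity assumption, here is a minimal Python sketch of what forecasters sometimes call a persistence forecast: tomorrow is predicted to look exactly like the most recent observation. The function name `persistence_forecast` and the temperature values are made up for this example and do not come from any real dataset.

```python
from typing import List, Sequence


def persistence_forecast(history: Sequence[float], horizon: int = 1) -> List[float]:
    """Predict each of the next `horizon` steps as the latest observed value."""
    if not history:
        raise ValueError("need at least one past observation")
    return [history[-1]] * horizon


# Hypothetical daily high temperatures in degrees Celsius (illustrative only).
past_highs = [29.0, 31.5, 33.0]

# Under the continuity assumption, the next two days repeat the last reading.
forecast = persistence_forecast(past_highs, horizon=2)
print(forecast)
```

The sketch never looks at why the values are what they are, which is exactly why it tends to get less reliable the farther ahead we ask it to predict.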

The other way to deal with this fundamental limitation is to use the data to form theories of correlation and causality. This is what the scientific method is all about. It allows us to generalize from the specific data and say “any time this happens, this other thing will happen.” For example: “this configuration of high pressure systems will cause the temperature tomorrow to fall.”
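One simple way to sketch this second approach is to fit a relationship to past observations and then apply it to a case we have not seen yet. The example below uses an ordinary least-squares line. The helper `fit_line`, the variable names, and the numbers are all hypothetical, and a fitted line only captures correlation; arguing for causality takes more than this.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up past observations: change in pressure (hPa) on one day and
# change in temperature (degrees Celsius) on the following day.
pressure_change = [-4.0, -2.0, 0.0, 2.0, 4.0]
temp_change = [2.0, 1.0, 0.0, -1.0, -2.0]

slope, intercept = fit_line(pressure_change, temp_change)

# Generalize from the specific data to a pressure change we never observed.
predicted = slope * 6.0 + intercept
print(slope, intercept, predicted)
```

Unlike the continuity assumption, the fitted rule can predict a change, but only as far as the relationship found in the past data continues to hold.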

In the former approach, the data analyst is very interested in specifics, such as outliers and numeric values.

  • “What is the temperature?”
  • “Are there any problems we need to fix?”
  • “Where are our best successes?”

In the latter approach, the analyst is more interested in correlations, patterns, differences and trends.

  • “What time of year is it usually hot?”
  • “What causes our successes?”
  • “What leads to problems?”
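The two kinds of questions can lead to different answers even on the same data. In the small sketch below, the readings are invented for illustration: the specific question looks for the single hottest reading, while the pattern question asks which month is hottest on average.

```python
from collections import defaultdict
from statistics import mean

# Made-up (month, daily high in degrees Celsius) readings, illustrative only.
readings = [
    ("Jan", 5.0), ("Jan", 7.0),
    ("Jul", 30.0), ("Jul", 34.0),
    ("Oct", 14.0), ("Oct", 16.0), ("Oct", 36.0),  # one unusually hot October day
]

# Specific question: which single reading was the hottest?
hottest_reading = max(readings, key=lambda r: r[1])

# Pattern question: which month is usually the hottest?
by_month = defaultdict(list)
for month, temp in readings:
    by_month[month].append(temp)
monthly_mean = {month: mean(temps) for month, temps in by_month.items()}
hottest_month = max(monthly_mean, key=monthly_mean.get)

print(hottest_reading, hottest_month)
```

Here the outlier wins the first question and disappears into the average in the second, so a tool that highlights individual values and a tool that summarizes groups would each surface something the other hides.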

It seems likely that different data analysis tools are optimal for these different types of questions.
