About Stacked Turtles

What is the "stacked turtles" blog about?

This blog is about using data to build models of the world, using those models to make predictions, and then finding effective ways of communicating those results.

Who runs this blog, and why?

My name is Damien Martin. I graduated from UC Davis in 2011 with a Ph.D. in physics, and for 6 years after that taught physics, math, and computer science classes at undergraduate institutions. I noticed that one of the most transferable skills students pick up from physics classes is not knowledge of the physical laws of nature, but the habit of thinking critically about data. I have made the transition myself, and now work as a data scientist.

Why do we need models? Why not just look at the raw data?

One of the hardest concepts in dealing with data is realizing that not all patterns we find in data are meaningful. With enough variables to compare, some will line up purely by chance. A humorous set of examples can be found in Tyler Vigen's collection of spurious correlations.
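To see how easily chance patterns appear, here is a minimal sketch in Python. It assumes nothing beyond the standard library, and the function names (`pearson`, `strongest_chance_correlation`) and the sizes used (40 series of 10 points each) are illustrative choices of mine, not taken from any particular dataset. It generates series of pure noise that have nothing to do with one another, then searches all pairs for the strongest correlation.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_series=40, n_points=10, seed=0):
    """Generate unrelated noise series and return the most correlated pair.

    Every series is independent Gaussian noise, so any correlation
    found here is a coincidence by construction.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)
    ]
    # Search every pair of series for the largest absolute correlation.
    best = max(
        itertools.combinations(range(n_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


pair, r = strongest_chance_correlation()
print(f"series {pair[0]} and series {pair[1]}: r = {r:.2f}")
```

With 40 series there are 780 pairs to choose from, so the winning pair will usually look strongly related even though no relationship exists. Searching many comparisons and reporting only the best one is exactly how meaningless patterns get mistaken for discoveries.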

A model of the world gives us context, and allows us to make predictions based on our data. The model may be wrong, or incomplete, but it can be tested. We can replace it. It will help us interpret and use the data around us. The most succinct way I know of putting it is:

Models are the lens through which we view data
— Unknown
In particular, we have to be careful of people using our ignorance of statistics to deceive us, as well as our tendency to mislead ourselves about things we want to be true. The former is discussed in detail in Lies, Damned Lies, and Statistics, while the latter is discussed in the excellent Statistics Done Wrong.

What do turtles have to do with models, data, and visualization?

In A Brief History of Time, Stephen Hawking tells the story of an astronomer who, after giving a lecture on how gravity causes the Earth to orbit the sun, was told by someone in the audience that the scientific theory was "rubbish": everyone knew that the Earth was a flat plate supported by a giant turtle. The astronomer humored the audience member by asking what the turtle stood on. "Why, another turtle, of course!" was the response. When asked what that turtle stood on, the audience member replied, "It's turtles all the way down!"

I find this story connects to models and data in three important ways:

  • Models need some explanatory or predictive power to be useful
    A model that only manages to be self-consistent, without making other predictions, is neither useful nor intellectually satisfying.
  • We need to find ways of testing and validating our model
    We should not come up with contrived ways to convince others that our models are "right". It is far more useful to test our models against competing models, and adopt whichever one has more predictive power.
  • But it's models all the way down
    This story is told from the point of view of a reductionist. Such a person might believe that the stuff around us is made of atoms, which are made of subatomic particles, which are made of ... (?), but that eventually this chain of reasoning ends. Even if it ends in the physical world (and it is not clear that it does), models of realistic problems can still be refined or replaced by a "better model" that captures more nuance in the data. So, in defense of the argument made by the audience member, it seems that it really is models all the way down.
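
The second point, testing competing models and keeping the one that predicts better, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recipe: the data are synthetic (a straight line plus noise that I made up for the example), the two candidate models and all function names are hypothetical, and the held-out mean squared error is just one reasonable measure of predictive power.

```python
import random


def fit_constant(xs, ys):
    """Model A: always predict the average of the training outputs."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Model B: ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_squared_error(model, xs, ys):
    """Average squared gap between the model's predictions and the data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for this sketch): y = 2x + 1 plus noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# Fit on one part of the data, judge on a part the models never saw.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

candidates = {"constant": fit_constant, "line": fit_line}
scores = {
    name: mean_squared_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(scores, "-> adopt:", best)
```

The important design choice is that the models are scored on data they were not fitted to. A model judged only on the data used to build it is being tested for self-consistency, not for predictive power.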