According to Philip Dawid, a major figure in statistics, causal inference is “one of the most important, most subtle, and most neglected of all the problems of statistics.” So let’s shed some light on it and see why it matters.
Thanks to recent technological and engineering advances, science can now answer many questions that once went unanswered, such as how the Big Bang might have unfolded or what changes in the atmosphere made Earth a habitable planet. Yet we often still cannot pin down the causes behind why things happen the way they do. Take the Big Bang as an example: why did it occur? That is where causal inference comes in handy. It helps us answer those “why” questions.
Understanding causality can be very complicated. It may feel routine in daily life (if a baby is crying, the baby might be hungry), but establishing the link between cause and effect is a lot harder when we model real-world scenarios. The reason is the uncertainty in the world. Chaos theory offers one way to describe that uncertainty in an ever-changing world. Wikipedia describes it as follows: “Chaos theory is an interdisciplinary theory stating that, within the apparent randomness of chaotic random system, there are underlying patterns, constant feedback loops, repetition, self similarity, fractals, and self organization. The Butterfly Effect, an underlying principle of chaos theory, describes how a small change in one state of a deterministic non-linear system can result in large differences in a later state.” In layman’s terms: “A butterfly flapping its wings in Brazil can result in a tornado in Florida.” This should give you an idea of the complexity involved in modeling causal relationships.
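To make the butterfly effect concrete, here is a minimal sketch using the logistic map, a standard textbook example of a deterministic non-linear system. The function name, the parameter r = 4.0, the starting value 0.2 and the size of the nudge are my own choices for illustration, not something taken from the sources above.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x), a simple deterministic non-linear rule."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two runs whose starting points differ by one part in ten billion.
run_a = logistic_map(0.2)
run_b = logistic_map(0.2 + 1e-10)

# Distance between the two runs at every step.
gaps = [abs(a - b) for a, b in zip(run_a, run_b)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The rule contains no randomness at all, yet the tiny initial difference grows until the two runs have nothing to do with each other. That is the kind of sensitivity that makes cause and effect so hard to trace in real systems.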
Observation, intervention and counterfactuals are the pillars of human cognitive ability. Judea Pearl describes them beautifully in “The Book of Why” as steps on the “Ladder of Causation”. Observation means being able to detect simple patterns. Intervention means being able to experiment, changing something to see whether the outcome differs from what it was before. Counterfactuals mean being able to imagine, creating a mental simulation of how things might have happened (the simulation can be wrong, but that doesn’t really matter). Each step builds on the one before it, hence the “Ladder” of causation.
In general, you cannot establish a causal effect without performing an experiment, i.e. an intervention. When you work with observational data alone, the most you can usually talk about is correlation, not causation. And mind you, they are not the same.
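The difference between observing and intervening can be sketched with a small simulation. Everything in it is an assumption made for illustration: a hidden common cause z drives both x and y, while x has no effect on y at all. The variable names, the sample size and the Gaussian noise are hypothetical choices.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hidden common cause (confounder) that we never get to see.
z = [rng.gauss(0, 1) for _ in range(n)]

# Observation: x and y both depend on z, but x does NOT cause y.
x_observed = [zi + rng.gauss(0, 1) for zi in z]
y_observed = [zi + rng.gauss(0, 1) for zi in z]

# Intervention: we set x ourselves at random, ignoring z.
# y is generated by the same mechanism as before (it only listens to z).
x_intervened = [rng.gauss(0, 1) for _ in range(n)]
y_after = [zi + rng.gauss(0, 1) for zi in z]

corr_observed = pearson(x_observed, y_observed)
corr_intervened = pearson(x_intervened, y_after)

print(f"correlation when we only observe: {corr_observed:.2f}")
print(f"correlation when we intervene:    {corr_intervened:.2f}")
```

In the observed data x and y are clearly correlated, yet once we set x ourselves the correlation all but disappears, because x never influenced y in the first place. Only the experiment reveals that.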
Studying causation helps data scientists model the real world more accurately and keep track of its uncertainties, even as they change. Causal reasoning lowers the chance that your analysis is completely wrong. Simpson’s paradox shows how easily you can arrive at a conclusion that is the exact opposite of reality and still feel sure of it, because you believe you reached it statistically. That misplaced confidence can lead to disastrous results.
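Simpson’s paradox is easier to believe once you see the arithmetic. The sketch below uses illustrative counts that do not come from this article; they follow the pattern of the often-cited kidney stone treatment example. Each entry is (successes, patients), and the names are my own.

```python
# (successes, patients) per treatment and stone size; illustrative counts only.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def group_rate(treatment, group):
    """Success rate of one treatment within one subgroup."""
    successes, patients = counts[treatment][group]
    return successes / patients

def overall_rate(treatment):
    """Success rate of one treatment with the subgroups pooled together."""
    successes = sum(s for s, _ in counts[treatment].values())
    patients = sum(p for _, p in counts[treatment].values())
    return successes / patients

for group in ("small", "large"):
    print(f"{group:5s} stones: A = {group_rate('A', group):.3f}, "
          f"B = {group_rate('B', group):.3f}")
print(f"pooled:       A = {overall_rate('A'):.3f}, B = {overall_rate('B'):.3f}")
```

Treatment A does better in both subgroups, yet B looks better once the groups are pooled, because A was given mostly to the harder (large stone) cases. Which number you should trust depends on the causal story behind the data, not on the statistics alone.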
“The Book of Why” by Judea Pearl and Dana Mackenzie
“The Signal and the Noise” by Nate Silver