Oftentimes when scientific research is presented in a non-scientific context, such as in the news, someone will add the disclaimer that “correlation does not equal causation”. It’s a statement that has become better recognized by the general public in recent years, which is a welcome trend. However, I still find it problematic, because it is sometimes used to dismiss correlational research as somehow invalid or subpar. The truth is that while the statement is correct, it is simplistic. So I’d like to offer a more nuanced understanding of how to evaluate research, at a level that an educated but non-scientific audience can understand and appreciate.
Sharks in the Water
It is true that correlation (or association between two variables) is not a sufficient reason for concluding that there is a cause-and-effect relationship between the correlated variables. However, it is one necessary condition. In other words, you can have correlated variables that are not causally related, but you can’t have causally related variables that are not correlated! For example, it is probably the case that the number of people wearing bathing suits is correlated with the number of shark attacks in a given year. However, this does not mean that wearing bathing suits causes shark attacks, or that shark attacks cause people to wear bathing suits. We can easily come up with a plausible reason: Summer weather causes both a greater number of people wearing bathing suits and a greater number of people in the water to be attacked by sharks. However, if summer weather does indeed cause more people to wear bathing suits, those two variables should be correlated with each other. Part of having a cause-and-effect relationship means that a change in the cause is associated with a change in the effect.1
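The bathing-suit example can be sketched numerically. Below is a minimal simulation (with entirely made-up numbers, and assuming `numpy` is available) in which temperature is the common cause of both variables: bathing suits and shark encounters end up strongly correlated even though neither causes the other, and the correlation disappears once we statistically control for temperature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily data: summer temperature (the common cause) drives both
# the number of bathing-suit wearers and the number of swimmers at risk.
temp = rng.normal(25, 8, size=1000)                   # degrees Celsius
suits = 50 + 4 * temp + rng.normal(0, 10, size=1000)  # bathing suits worn
attacks = 0.1 * temp + rng.normal(0, 1, size=1000)    # shark encounters

# Suits and attacks are correlated, despite no causal link between them...
r = np.corrcoef(suits, attacks)[0, 1]

# ...but the correlation vanishes once we "control for" temperature by
# correlating the residuals left over after regressing out temperature.
suits_resid = suits - np.polyval(np.polyfit(temp, suits, 1), temp)
attacks_resid = attacks - np.polyval(np.polyfit(temp, attacks, 1), temp)
r_partial = np.corrcoef(suits_resid, attacks_resid)[0, 1]

print(f"raw correlation:     {r:.2f}")
print(f"partial correlation: {r_partial:.2f}")
```

The raw correlation is substantial, while the partial correlation hovers near zero, which is exactly the signature of a third variable producing a non-causal association.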
Scientists and philosophers have thus come up with some general criteria for establishing that a cause-and-effect relationship exists:2
- Covariation (or correlation) between the presumed cause and effect
- Temporal precedence of the cause (i.e., the cause must precede the effect)
- Lack of plausible alternative explanations for the covariation
I think the first two criteria are fairly straightforward, but I will discuss the third in more detail a bit later. The point here is that correlation is an essential part of establishing the existence of cause and effect. It is just not sufficient by itself to establish this.
So if correlational studies aren’t able to prove cause and effect, why do scientists use them? Well, scientists have a broad range of research designs available to them to answer questions about the world. In broad strokes, the major distinction is between correlational and experimental designs. A correlational design measures two variables and determines whether they are related. An experimental design requires two things: manipulating a variable (the presumed cause), and randomly assigning observations to either a treatment or control group. In the treatment group, the variable is changed in some way, whereas in the control group it is left alone.3 For example, if I am running a psychological study on whether ice cream makes people happy, I might randomly assign participants to either be given ice cream (experimental group) or nothing at all (control group), and then ask them how happy they are on a scale from 1 to 10. This type of design has a number of advantages over a correlational design, but the one that is most relevant is the random assignment. By randomly assigning participants to the two groups, I can make sure that any differences between the two groups that are not related to the ice cream variable are due to chance. This offers scientists a great deal of control, to ensure that the presumed cause is the only thing being changed between the two groups. If I were to simply ask people how much ice cream they ate in the past day and then measure their happiness, I don’t have that level of control. Perhaps people who eat more ice cream also eat more brownies, and it’s actually brownies that lead to greater happiness!
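To see why random assignment gives this control, here is a small sketch (using only the standard library, with hypothetical numbers) of the ice cream study above. Each simulated participant carries an unmeasured confound, brownies eaten, and a simple coin flip into treatment or control balances that confound across the two groups on average, without the researcher ever measuring it.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: each person has an unmeasured "brownies eaten"
# value, a potential confound we cannot control directly.
people = [{"brownies": random.gauss(2, 1)} for _ in range(10_000)]

# Random assignment: a fair coin flip sends each participant to either
# the ice cream (treatment) group or the no-ice-cream (control) group.
treatment, control = [], []
for person in people:
    (treatment if random.random() < 0.5 else control).append(person)

# On average, randomization equalizes the confound between the groups,
# so any later difference in happiness can be pinned on the ice cream.
mean_t = statistics.mean(p["brownies"] for p in treatment)
mean_c = statistics.mean(p["brownies"] for p in control)
print(f"treatment mean brownies: {mean_t:.2f}")
print(f"control mean brownies:   {mean_c:.2f}")
```

The two group means come out nearly identical, and the same balancing happens for every other participant characteristic, measured or not, which is what makes random assignment so powerful.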
In theory, these two types of research designs are easy to distinguish. However, in practice, scientists often end up with a mixture of the two. This could be for practical, financial, or ethical reasons. A variable that is presumed to be the cause might be impossible or impractical to manipulate directly (it is hard to make a star go supernova to study its effects, for example), or unethical (when studying the effects of the death of a spouse on people’s well-being, an ethics review board would probably frown on a study design that involved randomly assigning spouses to be killed). It is also often the case that a scientist has one primary variable they wish to manipulate, but they also want to take a look at the influence of other variables that they then measure. For instance, I can manipulate whether people receive positive or negative feedback, but I can’t manipulate (for practical and ethical reasons) their self-esteem. If I want to study how people with low self-esteem react to negative feedback, then, I need to use a mixed design. This all makes the scientific process very complicated, since there is a mix of manipulated and measured variables that each offer more or less control.
What enables scientists to talk about causal explanations, then, is to place less emphasis on the design of the study (though that is, of course, important) to show whether causal explanations are possible, and instead talk about alternative explanations. If you remember the three criteria above, the third criterion is a lack of plausible alternative explanations. And this is the true engine of scientific advancement and debate. In order to critique a particular scientific study, it is not enough to say, “Well, it was a correlational study, so it doesn’t really say anything about causal relationships.” Instead, scientists need to come up with an alternative explanation that is more plausible than the one offered by the original researchers, and then test it (or suggest it for someone else to test).
Let’s take one step backward. Before a particular study even gets to the point of being critiqued by others, the scientists conducting it (ideally) take great care to create a study that rules out as many alternative explanations as possible. This is why experimental designs are preferable: they are often much better at ruling out alternative explanations for the results. If I have truly changed only one thing between two otherwise identical groups, and I then find that those two groups differ on some important variable, I have extremely good evidence of cause and effect. Correlational designs, while generally not as good at ruling out alternatives, can still offer relatively strong evidence if done properly. More advanced correlational techniques like cross-lagged designs can help to establish temporal precedence (criterion #2) by showing that the link between variable A at time 1 and variable B at time 2 is stronger than the link between variable B at time 1 and variable A at time 2. And of course, depending on the variables one is studying, it can be relatively easy to rule out some alternative explanations. For example, one study has shown that violent crime rates increase as temperature rises (leading to more crime in summer months).4 While there could perhaps be other variables that affect both temperature and crime rates, one explanation that does not seem plausible is that crime itself actually increases the outside temperature. The nature of the variables being studied can make some explanations wildly implausible, leading to a stronger case for a certain interpretation.
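The cross-lagged logic can also be sketched with simulated data (invented numbers again, assuming `numpy`). Here variable A at time 1 causally feeds into variable B at time 2, and the telltale asymmetry appears: the A1→B2 correlation is clearly stronger than the B1→A2 correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical two-wave longitudinal data in which A causally drives B.
a1 = rng.normal(size=n)                               # variable A, time 1
b1 = rng.normal(size=n)                               # variable B, time 1
a2 = 0.6 * a1 + 0.8 * rng.normal(size=n)              # A is stable over time
b2 = 0.5 * b1 + 0.4 * a1 + 0.7 * rng.normal(size=n)   # B at time 2 also depends on earlier A

# The two cross-lagged correlations: A1 with B2, and B1 with A2.
r_a1_b2 = np.corrcoef(a1, b2)[0, 1]  # path consistent with A causing B
r_b1_a2 = np.corrcoef(b1, a2)[0, 1]  # path consistent with B causing A

print(f"r(A1, B2) = {r_a1_b2:.2f}")
print(f"r(B1, A2) = {r_b1_a2:.2f}")
```

Because A genuinely precedes and influences B in this simulation, only the first cross-lagged correlation is sizable, which is the pattern researchers look for when arguing for temporal precedence from correlational data.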
What this all means, then, is that in the end it is not so much the type of study design one uses (correlational, experimental, or a mix of both) that determines the strength of a study, but how well it rules out alternative explanations. A well-designed correlational study that measures the right variables in the right way can be a very convincing study indeed. And a poorly designed experimental study can still leave plenty of alternative explanations (known in the business as “confounds”) that severely undermine the case the researchers are trying to make. Understanding the role of plausible alternative explanations is the key to understanding how science works.
So the next time someone says to you, “Correlation does not equal causation,” you can now say, “Well, actually…” and provide them with a deeper understanding of scientific research methods. Correlation may not tell you everything about cause-and-effect relationships, but it can be a good indicator that something might be there. It’s a valuable tool of science, and when used properly it can tell us much about what variables may cause other variables to change. By ruling out alternative explanations, we can find our way toward a better understanding of how the world works.
- Having said this, not being able to detect a correlation between two variables does not necessarily mean that the two are not causally related. Sometimes, there can be other variables that suppress the relationship between a cause and effect, making it appear that they are unrelated. For example, gravity is causally acting on airplanes all the time, yet they are able to fly through the air because other forces counteract the force of gravity. It’s only when you remove those other forces (e.g., break off the wings, turn off the engine) that you see the causal relationship! Thus, it is most accurate to say that detecting a correlation is a necessary condition of inferring causality in the absence of countervailing forces. Thanks to Daniel Nadolny for this correction. [↩]
- I am greatly simplifying the debate that exists regarding the nature of causality, especially among philosophers of science. The rules I am listing here come from Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis for field settings. Chicago, IL: Rand McNally. For those interested in learning more about the debate, the book has a good overview of the major philosophical traditions and their historical development. You can also check out Bradford Hill’s criteria for causation, which list more criteria but, I think, agree reasonably well with Cook and Campbell. [↩]
- It should be noted that not all experimental designs use strict “control groups”. For instance, some research might compare one treatment to another, where the purpose is just to find out whether treatment A works better than existing treatment B. However, it’s also important to understand that using a control group doesn’t necessarily mean “doing nothing”. In medical research, for instance, the usual control group involves giving participants a placebo—something is still being “done” to participants in this group, but good experimental designs try to make the treatment and control groups as absolutely similar as possible. In a placebo design, the only difference (ideally) is the active ingredient of the drug being tested. Using the placebo group holds constant other factors, like the act of taking a pill and the psychological influence involved in taking medication. [↩]
- Anderson, C. A. (1989). Temperature and aggression: Ubiquitous effects of heat on occurrence of human violence. Psychological Bulletin, 106(1), 74-96. [↩]