I’ve been watching the recent debate about replication with interest, concern, and not just a little amusement. It seems everyone has their opinion on the matter (leave it to a field of scientists to have twice as many opinions as there are scientists in the field!), and at times the discussion has been quite heated. But as a grad student, it’s been difficult to know whether I should throw my own hat in the ring. With psychology heavyweights like Kahneman and Gilbert voicing their opinions, what room is there for a third-year grad student? But fortunately (or unfortunately), I’ve never been one to know when to keep my opinions to myself, so I want to present my own thoughts on the matter. My perspective is that, even if the issue gets heated at times, this discussion can be fruitful as we learn to navigate a changing discipline.
I won’t bother rehashing everyone else’s opinions, but for those just tuning in, you can look on Twitter for #repligate, and Brian Nosek has tweeted links to about a billion articles so far discussing the role of replications. So you can go there and catch up if you’d like. Suffice it to say, there have been people at every point on the spectrum: from those suggesting that replications are critical; to those advocating replication, just not necessarily direct replication; to those saying that replication isn’t the important issue to talk about; to those saying that replication in and of itself is entirely useless. Finding a path forward through this in the next few years will certainly be tricky. But during this discussion, I have been reminded of a paper regarding the “future of social psychology” written in 1978 by William McGuire, which was only recently published by John Jost. Though Jost found it appropriate to publish in light of other recent discussions regarding research ethics and disclosure, I find it just as apt (or perhaps more so) when discussing the role of replication.
“All Hypotheses Are True”
One of McGuire’s central points is that “all hypotheses are true.” At first, the notion might seem ridiculous, but his point is that any researcher who has spent considerable time and thought advancing a particular hypothesis has likely struck upon some fact about the world, and the same holds even for those who advance a contrary hypothesis. He argues that the point of empirical research is not to test the truth of a hypothesis, but rather to develop it, “to make its hidden assumptions, limitations, and interacting factors clearer to ourselves and others.” For those who advance competing hypotheses, the goal is to uncover where and when each hypothesis holds true—in other words, to find the moderating variables or assumptions that explain when each is apparent. When discussing replications, I think the same philosophy can be applied to effects rather than hypotheses: all effects are real, but we must uncover the boundary conditions and moderators that define that reality.
Indeed, much of the argument for direct replication involves “figuring out whether effects are real.” But we in social psychology are all well aware of the multiply determined and incredibly nuanced nature of any phenomenon we study. For any one variable we try to measure, there are dozens, perhaps hundreds, of other variables that influence it, and even more that mediate and moderate those relations. As James Beck, who taught my class on work motivation, put it, “The world is correlated at r = .3.” In a field filled with multiply determined phenomena, it is especially difficult to distinguish between effects that are “not real” and effects that have boundary conditions. What is the difference between an effect that is difficult to replicate because it doesn’t really portray some underlying reality, and one that is difficult to replicate because it has numerous boundary conditions? Conceptually we may be able to distinguish these, but empirically, replication does not really help answer this question. If Researcher A gets an effect but Researcher B does not, the data give us no real way to know whether Researcher A’s finding was a fluke, or whether there is some unmeasured moderator that explains the difference between the results.1 But in either case, when applying McGuire’s perspective, the question is no longer “is it real or not?” but “how and when is it real?” This approach means that we don’t have to get bogged down trying to parse out the distinction between “unreal” and “bounded”. Instead, we can still move forward in science, and add nuance to our understanding of the phenomena we care about.2
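The indistinguishability point is easy to see in a quick simulation. Here is a minimal sketch (plain Python, invented numbers; the function name and effect sizes are my own for illustration): in one scenario a real effect exists but depends on an unmeasured moderator that only Lab A’s sample happens to carry; in the other there is no effect anywhere and Lab A’s result was a sampling fluke. In both scenarios the observable record is the same kind of thing: one lab with a difference, one lab without, and nothing in the data labeling which scenario produced it.

```python
import random
import statistics

random.seed(1)

def run_study(n, effect):
    """Simulate a two-group study; return the treatment-minus-control
    difference in group means (both groups drawn from unit-variance normals)."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treatment = [random.gauss(effect, 1) for _ in range(n)]
    return statistics.mean(treatment) - statistics.mean(control)

# Scenario 1: a genuine effect gated by an unmeasured moderator.
# Lab A's participants happen to carry the moderator; Lab B's do not.
lab_a_moderated = run_study(50, effect=0.5)  # moderator present -> real effect
lab_b_moderated = run_study(50, effect=0.0)  # moderator absent  -> nothing

# Scenario 2: no effect anywhere; Lab A's earlier "finding" was chance.
lab_a_fluke = run_study(50, effect=0.0)
lab_b_fluke = run_study(50, effect=0.0)

# Each scenario hands us two mean differences with no label attached;
# the data alone cannot tell "moderated" apart from "fluke".
print(lab_a_moderated, lab_b_moderated)
print(lab_a_fluke, lab_b_fluke)
```

The point of the sketch is only that the two scenarios generate the same *form* of evidence; distinguishing them requires actually measuring the candidate moderator, not re-running the same design.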
Pitfalls and Perspectives
I think McGuire’s perspective can help to clarify the role of replication in science. If all hypotheses are true and all effects are real, then a null replication does not cast a negative light on the truth of the original research. What it does is open up new questions to be answered: If Researcher A found this effect, but Researcher B did not, what accounts for the difference? Posing the question this way avoids two pitfalls. First, it helps us avoid the nasty threat of replications “proving my research wrong”. Of course researchers care about their research and their reputation and want to defend both. But if the question is not whether “my” effect is “real”, but rather what moderating variable explains why I got one thing and you got something else, this unlocks another puzzle to solve. We can hopefully get past some of the defensiveness that arises when one perceives one’s reputation to be under attack (whether or not that is actually the case).
The second pitfall McGuire’s perspective addresses is the concern that null findings do not “prove” anything. This is a valid concern, as there are of course many more ways to get null results than there are to get significant ones. Even the best direct replication can still harbor some unknown critical variable that destroys the effect. One response to this is, “Well, if your effect is so flimsy that minute changes in methodology can erase it, then it’s not even worth studying!” The other (and better) response is, “Does the variable that moderates the effect provide any better understanding of the phenomenon we are studying?”
As an example, think of cross-cultural research. In a sense, much of the early research in this area can be seen as direct replication. Researchers took existing, supposedly “universal”, effects like the fundamental attribution error (so universal that it’s fundamental!), and then ran the same paradigms in other cultures and noted any differences. One response to this research is, “Well, if the effect does not hold across all cultures, then it’s not even worth studying!” But again, the better response is, “Does the knowledge of cross-cultural differences in the fundamental attribution error tell us anything new about attribution theory?” Turns out, it tells us a great deal. Even a null finding can help us reach important conclusions, if wielded correctly.
Roles for Replication
As I mentioned above, given the large number of possible boundary conditions and moderators, replication cannot be much help in distinguishing between “unreal” and “bounded” effects. The way to determine whether data provide evidence for a “real” effect is to use p values based on the hypothetical distribution of infinitely many identically run experiments—or even better, Bayesian statistics to determine the evidence in favour of competing hypotheses, given the data.3 Thinking about it this way, it’s true that a direct replication could provide a larger sample size to draw from for these analyses—but it would have been better for the original researcher to run a higher-powered study in the first place. Once we leave the original study, we are making assumptions about which variables are and are not relevant—or rather, we add to the assumptions that already exist in the original study. For instance, the original study likely assumes that time isn’t a significant factor (since the researchers probably didn’t run all their participants at the exact same time), that the shirt the experimenter wears isn’t a significant factor, and so on. To use a replication as added sample size, then, is to assume that different labs don’t make a difference, different experimenters don’t make a difference, running a study years later doesn’t make a difference, and so on. To minimize such assumptions, a single higher-powered study is preferable to replication as a way of accumulating sample size.
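To give a sense of what “higher-powered in the first place” means in concrete numbers, here is a quick a priori sample-size sketch using the standard normal approximation for a two-sample comparison of means (the function name and defaults are mine; an exact t-based power analysis would give slightly larger answers):

```python
import math
from statistics import NormalDist

def required_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of
    means, where effect_size is Cohen's d (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for the test
    z_power = NormalDist().inv_cdf(power)          # z for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A "medium" effect (d = 0.5) needs roughly 63 participants per group at
# 80% power; a small effect (d = 0.2) needs roughly 393 per group.
print(required_n_per_group(0.5))  # 63
print(required_n_per_group(0.2))  # 393
```

Collecting those participants within the original study keeps everything else constant (same lab, same experimenter, same time), which is exactly the set of assumptions that pooling a later replication’s sample would have to take on faith.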
Boundaries and Mechanisms
However, I think a McGuirian perspective still provides an important role for replication: determining the boundaries, moderators, and mechanisms of effects. Many studies in psychology already do this, explicating the moderating factors associated with effects found in other papers. Such “replication-and-extension” research uses the moderators to provide a better understanding of a phenomenon. If a moderating variable is important to understand and explain the effect in question, then studying it is relevant and useful. I think this is relatively uncontroversial.
Direct replications (i.e., “replication without extension”) could similarly be conceptualized as a way of determining (or ruling out) possible moderating variables, where such variables are not relevant to understanding the phenomenon in question. If an effect is due to questionable research practices, p-hacking, pure chance, or some artifact of the study design, then these moderators do not really help us to understand the phenomenon; they only serve as a check on other researchers’ methods and practices. In other words, that an effect of cognitive dissonance only appears when you p-hack doesn’t really tell us anything about cognitive dissonance, only about proper research practices. In such cases, a paper displaying a “cognitive dissonance × p-hacking interaction” is not really a novel contribution to the literature on cognitive dissonance (though it may be a useful primer on why p-hacking matters). In essence, direct replications can be useful when the presumed moderators do not help us better understand the phenomenon in question.
However, even in such cases, my sense is that replications may leave too many questions open. A direct replication may fail to replicate the original findings, but it still leaves us in the dark about why not. Was it a problem with my study? Was it a problem with the original study? Was it a regional effect? A temporal effect? Was it p-hacking by the original researcher? The replication offers very little to go on, except perhaps speculation in the discussion section. Some of these possible moderators may help us understand the phenomenon, while others may not. Generally, direct replications implicitly designate something about the researcher (e.g., their methods, their removal of outliers, their analysis techniques) or the lab (the location, the temperature, the mannerisms of the experimenter) as the moderating variable(s) accounting for the divergent results. But “something about the researcher or the lab” is a vague target that could benefit from more specific approaches.
This is why I have reservations about the role of direct replications that do not attempt to explicate (or rule out) any moderating variables. I don’t think replications are really able to determine whether an effect is “real” or not (though if we take McGuire’s approach, that question is not relevant anyway), and they certainly have no chance of doing so unless they make a reasonable attempt to rule out possible moderators that might explain a null replication.4 Otherwise, in a field that always studies multiply determined phenomena, it is just as (or more) likely that a null replication simply failed to account for a relevant moderator that explains the divergence in results than that it says anything about the “reality” of the original effect.
That being said, I do support several practices wholeheartedly: a) running high-powered studies in the first place; b) structuring scientific research and dissemination in a way that incentivizes people to do high-quality research that avoids questionable research practices; c) pre-registering hypotheses and methods, and increasing transparency regarding methods and data; and d) replicating results that people care about in a way that presumes those effects to be “real”, but that also contributes to our understanding of when, why, and how an effect occurs. All of these practices serve important roles in increasing the confidence we can have in future published research. Direct replications in particular serve their best function as a check on the existing literature that includes many under-powered, questionable-research-practice-laced studies; they serve their purpose as a stop-gap measure while we implement better solutions for the future.
Guidelines for Researchers and Replicators
With all this said, given the current role of replication, direct or otherwise, what sort of guidelines should we have for researchers and replicators? Here are a few I can think of, based on my ramblings above.
1. Replicators should follow good etiquette.
Recently, Daniel Kahneman wrote an article about good replication etiquette. I think what he suggests is useful, though perhaps a little restrictive. I would suggest at the very least that replicators attempt to open a dialogue with the original authors, and where possible collaborate with them on the replication. Such “adversarial collaborations” can be an excellent way to ensure that the end product is something with which both parties are satisfied. But if an author is not amenable to such an arrangement, replicators should at the very least conduct the replication in good faith: run a high-powered study (if you are going to test someone else’s ideas, at least do it well!), don’t accuse others of p-hacking or questionable practices unless the evidence is clear,5 and remember that the more important question is not, “Is the effect real?” but rather, “Given that those researchers got an effect, can I get the same thing? And if not, why not?” The more you can take the original effect as a given and shift from assessing “reality” to assessing generalizability, the better.
2. Try to test alternative explanations.
I understand that this might not always be possible. Some published replications are no doubt the result of someone’s repeated frustrations at trying to reproduce an effect (for one’s own research) and not being able to get it. Finally, they publish a replication to say, “Hey, is this thing even real??” But if you care so much about understanding the phenomenon in question, the better approach might be to ask what underlying variables might be causing Researcher A to be getting something different from Researcher B, and then test them. Doing so not only provides a new question, but an answer to that question as well. Of course, if you make a reasonable attempt at finding moderators and are able to rule them all out, then publishing such results is still beneficial. But if you have made no attempt to test possible alternative explanations for why you find null results, then your replication offers limited usefulness in understanding what exactly is going on. Your data is the written equivalent of a noncommittal shrug.
Another benefit of searching for moderators is that it can be a good avenue for collaborating with the original authors. You don’t have to email them saying, “Hi, I can’t replicate your results, so give me your data and methods to pick through and air your dirty laundry.” Instead, you can approach them by saying, “Hey, I’ve tried to replicate this effect, but I can’t seem to do it, and I’m interested in seeing whether there are any critical factors in the methodology that may be influencing it. Would you be willing to work with me to find out what is different between your lab and mine?” No doubt numerous differences exist, and such differences may help shed more light on the phenomenon itself.
3. Do good research in the first place.
Like I said above, direct replications are most useful as a check on the existing literature, which may be permeated with questionable research practices. Sometimes these practices stem from honest ignorance: I’m sure many researchers were taught that “this is the way it’s done” by their advisors, and carried on the practice none the wiser. Others are surely due to the pressure to produce “perfect” studies in order to publish, given the extreme publication bottleneck. Whatever the case, doing what you can to ensure your own research is beyond reproach will do a lot to avoid subsequent reproach. If we are to take the perspective that “all effects are real,” then we all need to be able to trust that the original researcher has done their due diligence.
So be clear and thorough in your methods. Even if journal space limits are too restrictive to let you be thorough, there’s nothing stopping you from publishing more complete details online, on something like the Open Science Framework. Be clear about the decisions you make: your stopping rule, how you assessed outliers, and how you analyzed the data. Run high-powered studies, and replicate your own effects as well, so you can ensure that the results you publish are reliable. You can’t foresee all the possible variations present in other labs that might diminish the effects you find, but you can at least make sure you can find them consistently! Remember that being clear and open about your research is not just about communicating its quality to others; it’s also about assuring yourself that your research is the best it can be.
4. Stop treating research like your baby.
Quite frankly, as a grad student watching the replication debate go down, I have found that some researchers’ reactions have been less than mature. I understand that people spend enormous amounts of time and effort crafting their theories, their study designs, and their analyses. I understand that for academics, their research is their career, and their reputation is involved. And especially when the media gets a whiff of controversy and you have to deal with masses of people questioning your hard work, it’s easy to get defensive and lash out with a negative reaction. But I think it’s important to remember that publishing is not the end of the line for research; it’s just the beginning, where it gets tested and critiqued and peer reviewed and discussed.
When you are crafting your research, it is like your baby. You coddle it, you pay close attention to it. Some people probably even sing lullabies to their research, I don’t know. But like a living being, research needs to grow up. It can’t be protected and coddled forever. At some point, when it’s ready, you have to let it go and let it stand on its own. You, as its loving parent, are of course free to come to its defense. You can stand by its upbringing and note its positive qualities. But you cannot protect it. It’s grown up, and it has to be out on its own, ready to brave the harsh world of science. So for goodness’ sake, stop being a helicopter parent and let it go. If you’ve followed the previous point and done good research, you’ve given it the best chance you can of surviving in that cruel world. And if someone tries to replicate it and can’t, you have an opportunity to see that research give birth to its own follow-up research—that is, if you don’t get defensive and try to paint every critique of your research as an attack on your own personal character. Instead, remember that your study was indispensable for paving the way to find out when and how that phenomenon occurs. That is still true, even if a replication finds null results. All it means is that there are questions left to answer.
To end off, I’m reminded of a response that Etienne LeBel and Lorne Campbell received on their attempt to replicate a finding by Matthew Vess. When the editor of Psych Science offered Vess a chance to respond to the null replication, Vess responded:
Thank you for the opportunity to submit a rejoinder to LeBel and Campbell’s commentary. I have, however, decided not to submit one. While I am certainly dismayed to see the failed attempts to reproduce a published study of mine, I am in agreement with the journal’s decision to publish the replication studies in a commentary and believe that such decisions will facilitate the advancement of psychological science and the collaborative pursuit of accurate knowledge. LeBel and Campbell provide a fair and reasonable interpretation of what their findings mean for using this paradigm to study attachment and temperature associations, and I appreciated their willingness to consult me in the development of their replication efforts. Once again, thank you for the opportunity.
Such a response shows a) the efforts of LeBel and Campbell to respect Vess’s work, and b) Vess’s exemplary character, understanding that the “advancement of psychological science” is more important than protecting his own research from critique. I can think of no better attitude for a scientist (original author or replicator) to have.
1. Certainly there are factors which would influence this one way or another. As the number of possible moderators increases (though this is virtually unquantifiable), the likelihood of an unmeasured moderator increases, while the likelihood of a result being a fluke stays fixed at the alpha level (e.g., .05), regardless of power. But the major takeaway is that all our findings in psychology have a large number of possible moderators, and even direct replications cannot rule out all of them.
2. Another advantage of McGuire’s position is that it is not committed to scientific realism, the idea that scientific theories provide us with objective facts about the world rather than merely existing as models that better predict observable phenomena. In their more reflective moments, I’m sure many scientists would acknowledge a non-realist, model-building philosophy of science, but in practice the realist language tends to emerge. The very notion of talking about whether an effect is “real” or not makes me cringe at times. But in any event, McGuire’s claim can exist under either philosophy of science.
3. As a committed Bayesian, I’m pretty sure I am obligated under penalty of death to mention Bayesian statistics at least once in everything I write.
4. Note that “ruling out possible moderators” involves more than just following the methods of the original study as exactly as possible. It also involves critically questioning whether differences in time, culture, laboratory, and researcher could play a role. I don’t mean to say that replicators never consider such factors, but my sense is that in the zeal to determine whether an effect is “real” or not, some of these factors are glossed over or left unmeasured.
5. And even if the evidence is clear, be charitable. Remember that the discussion over questionable research practices is still a new one.