It’s called science, stupid.
The Internet loves sharing psychology studies that affirm the lived experiences, and even the tiniest tics, of everyday people. But somewhere in the mix of all those articles and listicles about introverts, extroverts, or habits that “make people successful,” a debate still lingers: Is psychology a “real science?” It’s a question that doesn’t seem to be going away anytime soon. Last week, the Reproducibility Project, an effort by psychology researchers to redo older studies to see if their findings hold up, reported that only 36 of the 100 studies it tested reproduced the same results.
Of course, many outlets exaggerated these findings, referring to the re-tested studies (or to psychology in general) as “failed” or “proven wrong.” However, as Benedict Carey explains in the New York Times, the project “found no evidence of fraud or that any original study was definitively false. Rather, it concluded that the evidence for most published findings was not nearly as strong as originally claimed.”
But “many psychology studies are not as strong as originally claimed” isn’t as interesting a headline. So, what’s really going on with psychology research? Should we be worried? Is psychology a “hopeless case?” It’s true that there’s a problem, but the problem isn’t that psychology is nonscientific or that researchers are designing studies poorly (though some of them probably are). The problem is a combination of two things: statistical methods that aren’t as strong as we thought, and a lack of interest in negative findings.
A negative finding happens when a researcher carries out a study and does not find the effect they expected or hoped to find. For instance, suppose you want to find out whether or not drinking coffee every morning affects people’s overall life satisfaction. You predict that it does. You take a group of participants and randomly assign half of them to drink coffee every morning for a month, and the other half to abstain from coffee for a month. At the start and at the end of that month, you give them a questionnaire that assesses how satisfied each participant is with their life.
If you find that drinking coffee every day makes no difference when it comes to one’s life satisfaction, you have a negative result. Your hypothesis was not confirmed.
This result isn’t very interesting, as research goes. It’s much less likely to be published than a study with positive results—one that shows that drinking coffee does impact life satisfaction. Most likely, these results will end up gathering figurative dust on the researcher’s computer, and nobody outside of the lab will ever hear about them. Psychologists call this the file-drawer effect.
That’s a problem in and of itself. Negative results might not generate catchy headlines, but they are important because they can help debunk popular (and potentially harmful) ideas about what causes what. Imagine a negative study about whether or not video games cause violent behavior in youth, for instance, and how those findings might be used if they didn’t exactly align with a researcher’s prediction.
But the file-drawer effect is even more harmful when you consider the statistical analyses used in psychology research. Usually studies will report what’s called a p-value: roughly, the probability of getting results at least this striking by chance alone, if there’s no real effect there.
So, to go back to the previous example, suppose my study finds that drinking coffee every day actually tends to increase life satisfaction. I might report that these results have a p-value of 0.04, or 4 percent. That means that if drinking coffee actually doesn’t affect life satisfaction, there’s only a 4 percent chance I would have gotten results at least this strong anyway. In other words, results like these would be a rare fluke in a world where coffee makes no difference.
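To see where a number like that comes from, you can simulate the study in a world where the effect doesn’t exist. The sketch below (with made-up satisfaction scores and a hypothetical observed difference, not data from any real study) estimates a one-sided empirical p-value: how often chance alone produces a difference between groups at least as large as the one we “observed.”

```python
import random

random.seed(0)

def simulate_null_study(n=30):
    """One hypothetical study in a world where coffee has no effect:
    both groups' satisfaction scores come from the same distribution."""
    coffee = [random.gauss(50, 10) for _ in range(n)]
    control = [random.gauss(50, 10) for _ in range(n)]
    return sum(coffee) / n - sum(control) / n

observed_diff = 5.0  # hypothetical difference found in our study

# Empirical p-value: in how many of 10,000 simulated no-effect
# studies does chance alone produce a difference at least this big?
trials = 10_000
extreme = sum(simulate_null_study() >= observed_diff for _ in range(trials))
print(extreme / trials)  # typically lands in the low single-digit percents
```

The smaller that fraction, the harder it is to write off the observed difference as luck, which is exactly the intuition a p-value is meant to capture.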
For years, psychologists have used p-values of 0.05 or 0.01 as thresholds: 5 percent and 1 percent, respectively. That seems pretty low: If there’s only a 5 percent chance of a thunderstorm today, you’ll probably go ahead and have a barbecue or take a bike ride.
But think about it this way: A threshold of 0.05 also means that if 20 studies just like this one are carried out on a nonexistent effect, odds are at least one of them will show an effect anyway. If drinking coffee doesn’t actually affect life satisfaction, and 20 different research teams carry out studies to test whether it does, statistically, one of those studies will probably show that it does. Those 19 negative studies will sit in someone’s file drawers, the one positive study will get published, and news outlets all over the Internet will proclaim that drinking coffee makes you happier.
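The arithmetic behind that claim is easy to check: if each study of a nonexistent effect has a 5 percent chance of a false positive, the chance that at least one of 20 independent studies crosses the threshold is 1 − 0.95²⁰, or about 64 percent. A quick sketch (the numbers are illustrative, not from any real dataset):

```python
# Chance that at least one of 20 independent studies of a
# nonexistent effect comes up "significant" at p < 0.05.
threshold = 0.05
n_studies = 20

p_no_false_positives = (1 - threshold) ** n_studies
p_at_least_one = 1 - p_no_false_positives

print(round(p_at_least_one, 2))  # → 0.64
```

So even with a seemingly strict 5 percent cutoff, a literature that only publishes positive results will reliably manufacture “effects” out of nothing.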
So, yes, there’s a problem. Luckily, researchers are already working on it. The Reproducibility Project itself is one example. In addition, some research journals are making a special effort to publish and highlight negative results, which would help reduce the file-drawer effect. Some journals have even acknowledged the weaknesses of p-values, and are moving away from them.
How about the rest of us? As consumers of online news and other media, we can learn a lot from the sciences—of which psychology is absolutely one—and their emphasis on gradual, careful self-correction. The fact that new research has found that some older results don’t quite hold up isn’t a bug; it’s a sign of science working as it should. Just as researchers are constantly designing better studies and revising their previously held opinions, so should we always seek out more information, new perspectives, and more nuanced analyses.
It’s tempting to just throw up our hands and claim that we can’t trust psychology research anymore, but that’s the wrong reaction. We need psychology. Many of today’s most pressing debates are actually questions about psychology, questions that can be studied empirically using the scientific method. Do video games cause violence? How do we change the minds of people who refuse to vaccinate their children? How can we reduce racism and other prejudices? How can we prevent sexual assault?
These are questions that can be answered with good research. Good research doesn’t mean perfect research, since that’s not possible. Rather than waiting around for all claims about psychology to be “proven right” or “proven wrong,” we should proceed based on the best evidence we currently have, knowing that future research might prod us in a new direction.
Miri Mogilevsky is a social work graduate student who loves feminism, politics, New York City and asking people about their feelings. She has a B.A. in psychology but will not let that stop her from getting a job someday. She writes a blog called Brute Reason, tweets @sondosia, and rants on Tumblr.
Illustration by Jason Reed