This tool exposes how easy it is to manipulate scientific data

Cynthia McKelvey


Published Aug 19, 2015   Updated May 28, 2021, 3:20 am CDT

In the world of research science and academia, you’ll often hear this cynical mantra: “publish or perish.”


The idea is that researchers who frequently publish their work in journals often do better in their careers. Those who publish less are more prone to stagnation, or even to losing their jobs altogether.

Such pressures to publish may be leading some researchers to manipulate their data, intentionally or not. Though it may seem difficult to finesse data by accident, it’s actually stunningly easy. In a FiveThirtyEight article titled “Science Isn’t Broken,” science journalist Christie Aschwanden lets readers see for themselves how easy it is. The article, a fascinating read in itself, contains an interactive tool that lets you work with real-world data to look for a connection between the overall economic health of the nation and whether Republicans or Democrats hold power. You can choose to include or exclude certain variables, then watch as your results, and their publishability, change with your choices.

Your results are publishable if, after your manipulations, you achieve a “p-value” of less than 0.05.

A p-value, if you never took statistics or need a refresher, is a measure of probability. In a typical statistical test, you weigh two hypotheses: the proposed (or alternative) hypothesis (e.g., “coffee causes cancer”) and the null hypothesis (there is no relationship between coffee and cancer).

The null hypothesis is a very important concept in statistics and is the basis for calculating the p-value. Basically, the p-value tells you the probability of getting data at least as extreme as your observed data set (such as a large number of people who consumed a lot of coffee and later got cancer) if the null hypothesis were true. That means the smaller the p-value, the harder it is to explain your data as a fluke in a world with no effect, and the stronger the evidence against the null hypothesis. Note that a p-value is not the probability that your results are real; it only measures how surprising your data would be if there were no effect. (For the record, there is no known link between coffee and cancer; it’s just an example.)
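To make the definition concrete, here is a minimal Python sketch that swaps the coffee example for a hypothetical coin. The null hypothesis is that the coin is fair, and the p-value is the probability, assuming fairness, of a heads count no more likely than the one observed. The helper name `binomial_p_value` and the 100-flip experiment are illustrative assumptions, not part of Aschwanden’s tool; the two-sided rule used here (sum every outcome no more likely than the observed one) is one common convention among several.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Two-sided exact binomial p-value (hypothetical helper for illustration).

    Returns the probability, assuming the null hypothesis is true, of seeing an
    outcome no more likely than the observed one.
    """
    # Probability of every possible success count if the null hypothesis holds
    probs = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Add up all outcomes at least as extreme (a tiny tolerance absorbs float rounding)
    return sum(p for p in probs if p <= observed * (1 + 1e-9))


if __name__ == "__main__":
    # Hypothetical experiment: 100 flips of a coin we suspect is biased
    for heads in (60, 61):
        p = binomial_p_value(heads, 100)
        verdict = "publishable" if p < 0.05 else "not publishable"
        print(f"{heads} heads: p = {p:.4f} ({verdict})")
```

Under these assumptions, 60 heads gives a p-value of roughly 0.057 and 61 heads roughly 0.035, so a single extra head is enough to move the result across the 0.05 line even though the evidence has barely changed.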

The problem, as Aschwanden demonstrates with her tool, is that it’s astoundingly easy to get a low p-value if you try enough ways of analyzing the same data.

Using the tool, you can create 1,800 different combinations of variables. Of those, 1,078 yield a “publishable” result (a p-value of less than 0.05, an arbitrary standard for “statistical significance” accepted by many scientific journals).
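Why can so many combinations come out “significant”? The sketch below is a hypothetical simulation, not a reconstruction of Aschwanden’s tool. Each simulated study has one explanatory variable and up to 20 outcome variables that are pure random noise; it tests each outcome for a correlation and stops at the first p-value under 0.05. Because every null hypothesis is true by construction, any “hit” is a false positive. The p-value uses Fisher’s z approximation for a Pearson correlation, and all function names, sample sizes, and the seed are assumptions chosen for illustration.

```python
import math
import random


def correlation_p_value(xs: list[float], ys: list[float]) -> float:
    """Approximate two-sided p-value for a Pearson correlation (Fisher z transform)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    # Under the null of zero correlation, atanh(r) * sqrt(n - 3) is roughly standard normal
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(specs_per_study: int, n_studies: int = 1000,
                        n_obs: int = 50, alpha: float = 0.05, seed: int = 1) -> float:
    """Share of noise-only studies that find at least one p < alpha (hypothetical setup)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        for _ in range(specs_per_study):
            # A fresh outcome that is pure noise, so there is no real effect to find
            y = [rng.gauss(0, 1) for _ in range(n_obs)]
            if correlation_p_value(x, y) < alpha:
                hits += 1
                break  # stop at the first "publishable" result
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        simulated = false_positive_rate(k)
        # Chance of at least one hit among k independent tests at alpha = 0.05
        expected = 1 - (1 - 0.05) ** k
        print(f"{k:>2} tries: simulated {simulated:.2f}, expected {expected:.2f}")
```

With 20 independent tries, the chance of at least one false positive is 1 − 0.95^20, or about 64 percent, and the simulated share should land near that. The tool’s 1,800 combinations reuse the same underlying data, so they are not independent tests, and this formula should not be expected to reproduce the 1,078 figure; the sketch only illustrates the general mechanism.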

None of this is news to the scientific community: p-values are notoriously easy to hack.

The problem, Aschwanden says, is that although the bar is laughably low, many journals will readily accept a p-value below 0.05 as reason enough to publish an article. Aschwanden discusses several more issues at length, but one biggie she didn’t quite touch on is journals’ tendency to publish only positive results.