Many researchers don't share their raw data like they're supposed to

Science has a reproducibility problem. In order to confirm hypotheses, researchers need to test and retest them. But recently, some researchers discovered that most social science studies couldn’t be replicated—even the really big, foundational ones.

Now, researchers are discovering that scientists rarely even make their raw data freely available to their colleagues or the public.

In an open-access study published in the journal PLOS ONE on Dec. 4, data librarian Ryan Womack looked at original research articles (articles that publish findings from original data) that appeared in 2014 in the top 10 journals in biology, chemistry, physics, and math. Womack found that, overall, only 13 percent of the articles in those four core fields offered their raw data for free.

Theoretically, researchers are supposed to make their data available either automatically or upon request in the spirit of reproducibility, letting scientists check each other’s work and build upon it. A fresh set of eyes might see the data in a new way or find something the original researchers didn’t notice. But the reality, apparently, is that data isn’t as freely available as it should be.

Some journals require scientists to provide their raw data up front in order to publish, but they usually don’t enforce this requirement, Womack wrote in his article. Several online data repositories exist, but researchers don’t seem to make use of them.

Womack found some differences between disciplines. Biological sciences, for instance, had the highest rate of data sharing, but it was still only 42.9 percent. In contrast, only one of the physics articles made its data freely available. (Four made it available by other means.)

“I didn’t find the results surprising,” Ivan Oransky, founder of academic publishing watchdog Retraction Watch, told the Daily Dot in an email. “We frequently hear researchers complain that not only do many authors fail to automatically make their raw data available, many even fail to make it available upon request.”

The reason, Oransky said, is that researchers, like journalists, are competitive and don’t want to get scooped. “That’s understandable, really, given that the system rewards new big papers instead of collaboration.”

Womack’s study analyzed only top-tier journals, as judged by their impact factor, a journal-level citation metric that researchers and institutions often use as a proxy for a study’s importance. Articles in journals with high impact factors are often perceived as being better and more important and containing the most exciting results.

But Randy Schekman, founder of the online journal eLife, previously told the Daily Dot that the impact factor is a faulty measure of clout. Librarians originally developed it to help them decide which journal subscriptions to buy. But journals learned to hack the impact factor, Schekman said—publishing more review articles and cherry-picking original research articles that would get more citations.

Publishing in high-impact journals also benefits researchers directly. The more high-impact papers on their curriculum vitae, the better their chances of scoring a tenure-track research position.

Womack, the author of the new study, said that he saw a push to reverse the trend and make more data openly available.

“Policies and practices are gradually tilting the balance towards openness, so I believe data sharing will increase,” Womack told the Daily Dot. “Hopefully other studies will demonstrate that over time in different disciplines.”

Illustration by Jason Reed

Cynthia McKelvey