Christopher Bowns/Flickr

Researchers struggle with the ethical dilemma of using hacked data

Is it wrong to use stolen information if it serves the public good?


Dell Cameron


Published Apr 21, 2016   Updated May 26, 2021, 10:00 pm CDT

When, if ever, is it justified to use stolen data in the pursuit of academic research?

That’s the question put forth in a recent paper authored by two social scientists who found themselves in an ethical quandary over a hacked database. While the information it contained would undoubtedly enhance the researchers’ findings—the study examined challenges facing artists who rely on crowdfunding platforms, such as Kickstarter and Indiegogo—their team remained divided over the moral implications of utilizing the data.

In August, two scientists—Dr. Roei Davidson and Dr. Nathaniel Poor of the Council for Big Data, Ethics, and Society—began investigating methods for acquiring data on users of the website Patreon. Launched roughly three years ago, the site offers to transform fans into “patrons” of the arts, funding musicians, painters, and online content creators.

“We study crowdfunding as part of our research into cultural production and the changes and challenges faced by producers in the internet age,” write Davidson and Poor. They discovered, among other findings, the importance of personal networks for repeat crowdfunding, as producers are “hesitant to ask friends and family for financial support more than once.”

In examining Patreon, the researchers initially relied on a method for extracting data known as scraping. This method typically relies on software that “crawls” through the architecture of a website, harvesting information in a mechanized fashion. Though not comparable in scale, this approach is not unlike the way Google’s crawler combs the Web for the backlink data that its PageRank system uses to gauge the importance of a given website during a keyword search.

But as luck would have it, Patreon was hacked less than a month later, and the entire site was dumped online, source code and all. The scholars deemed much of the data useful, but a great deal of it was obviously never intended for public consumption.

Indeed, had this fortuitous crime not occurred, the researchers would have been limited to information scrounged from the uncooperative company, a “convenience sample,” as they put it.

“This was such a gift!” they wrote. Yet the team could not agree on whether the data was appropriate to use. Some felt that Patreon’s data was now public, and since the information they sought did not include personal information about its users, the project was OK to proceed. But others were more hesitant: After all, ethical criteria must be met in the pursuit of scholarly research.

Facing this dilemma, the researchers examined other instances involving unauthorized data leaks and hacks. In the case of Edward Snowden, for example, they determined a crime had certainly been committed, but arguably Snowden’s actions had been in service of the greater good. However, there was a clear difference between the aims of the Patreon research and, say, the articles published by journalist Glenn Greenwald, who exposed illegality in the U.S. government’s surveillance apparatus. 

Worse yet, the scientists drew parallels between the Patreon hack and the Rupert Murdoch phone-hacking scandal in the mid-to-late 2000s—unethical journalism that helped kill off the 168-year-old News of the World tabloid. 

The Daily Dot, along with almost every other news organization, routinely uses illegally obtained data for the purpose of news gathering. In 2015, following the dump of 400 gigabytes of Hacking Team data, for example, reporters unearthed the Milan-based cybersecurity firm’s financial ties to a dictatorial regime in Sudan. When Ashley Madison was pilfered by hackers last fall, research revealed a payoff to silence allegations of sexual harassment hidden among the CEO’s emails; other records proved the company secretly operated a website offering escort services. In 2014, an exclusive leak of more than 5 gigabytes of sealed court records led the Dot to report on a federal informant who had instigated cyberattacks in as many as 30 foreign countries.

Most recently, a “mega-leak” of data exfiltrated from a law firm in Panama implicated many of the world’s most influential leaders in an offshore holdings scandal. This enormous collaboration, in which hundreds of journalists invested their time, deposed the prime minister of Iceland less than 48 hours after publication.

But there are significant differences between scholarly and journalistic research. “[T]here is widespread acceptance that journalists have some responsibility to the public good which gives them latitude for professional judgment,” the researchers noted. “Without that history, establishing a peer-group consensus and public goodwill about the right action in data science research is a challenge.”

When the researchers sought feedback from a mailing list of fellow Internet researchers, many respondents claimed that employing Patreon’s data could be construed as legitimizing criminal behavior. It was further noted that the researchers would likely come upon private information themselves while attempting to clean the data for study; because they had never handled this data, they might have trouble distinguishing harmless but useful information from that which users consider private.

In the end, the researchers decided not to use the hacked Patreon data. “Considering other cases and academic guidelines, we felt it would not be appropriate,” they wrote.

“Some cases of using data (or not) will be clear, other cases will not be,” they added. “In the spirit of making lemonade out of lemons, we hope our case highlights some of the difficulties and considerations academics may encounter when contemplating the use of data.”

*First Published: Apr 21, 2016, 8:56 pm CDT