It's not uncommon for researchers to take huge blocks of data and share their findings, but a group that released a data set from OkCupid committed a cardinal sin by failing to ask permission first.
Between November 2014 and March 2015, Danish researchers collected personally identifiable information from users of the dating site, including usernames, ages, gender, religion, personality traits, and answers to personal questions asked by the site to refine its matches. All of that data was gathered without contacting OkCupid or its users.
The researchers scraped the information from OkCupid and uploaded it to the Open Science Framework, an online forum for social scientists to share raw data for the purposes of collaboration and transparency.
The value of the data is clear; the researchers indicated that "because users often answer hundreds if not thousands of questions," there is a great amount of information that can be gleaned from it. But the risk to users is equally great.
According to Scott B. Weingart, a digital humanities specialist at Carnegie Mellon University, the Danish researchers failed to remove usernames from the data they shared. "If your OkC username is one you've used anywhere else, I now know your sexual preferences & kinks, your answers to thousands of questions," he wrote on Twitter.
OkCupid doesn't require users to provide a real name, so users are spared from being directly linked. But with the wealth of information gathered about each user, identifying them is just a matter of moderate effort. Weingart suggested that he could likely uncover the real names of around 10,000 users with the data set.
The American Psychological Association's ethics code for social science research states that study participants have the right to informed consent. Researchers are expected to tell participants what data will be used and that they have the right to withdraw from the research. That didn't occur at any point in the study in question.
In their paper, the researchers acknowledge that "some may object to the ethics of gathering and releasing this data." However, they argue that "all the data found in the dataset are or were already publicly available," and that releasing it simply puts it in a more usable format. In another paper detailing the scraping process, they even admit that the only reason they didn't take user images is that it "would have taken up a lot of hard drive space."
OkCupid doesn't appear at all thrilled by the actions of the researchers. A spokesperson for the service told the Daily Dot, "This is a clear violation of our terms of service – and the Computer Fraud and Abuse Act – and we’re exploring legal options."
As OkCupid explores its options, other people are exploring the data; it has been downloaded nearly 1,000 times, and people have already started to analyze it.