How much of your history does Wikipedia track?
Wikipedia’s data retention guidelines come with a loophole that allows them to keep data about you indefinitely.
Remember that Wikipedia wormhole you fell down the other night? Maybe it started with you looking at a list of every single episode of Glee and somehow ended with reading the biographical entry on John Wayne Gacy. You might not remember, and you might not want anyone else to know about it. But your history might still be on Wikipedia’s servers.
The free online encyclopedia is tracking the viewing patterns of some, but not all, of its users. And although the records are to be kept no longer than 90 days, the site may retain an altered version of that data indefinitely.
Earlier this year, the Wikimedia Foundation—the non-profit organization that backs the user-edited, online encyclopedia—issued new data retention guidelines to let users know what kind of information they are tracking. Like many Web entities, the WMF has turned to transparency in order to respond to the growing public backlash against big data.
“The Foundation’s overall aim is to retain the minimum amount of information necessary in order to support the needs of the Foundation and the wider Wikimedia movement,” said Jay Walsh, a WMF spokesman.
In supporting “the needs of the Foundation,” Wikimedia automatically collects some personal information from visitors—even if they aren’t logged into a Wikipedia account. Such personal information includes visitors’ IP addresses and other data that “could be used to personally identify you.”
However, the policy states that all personal information will be kept for a maximum of 90 days, before it is either “deleted, aggregated, or anonymized.” In aggregating data, user information is combined with other data to illustrate broader trends, while anonymizing data removes the parts of the information that can identify particular users. In both cases, this allows WMF to keep data for more than 90 days. But the foundation admits that neither of these two processes can “completely eliminate the risk of re-identification.”
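To make the distinction concrete, here is a minimal sketch of the two approaches. It is an illustration only: the log layout, the sample rows, and the last-octet masking scheme are assumptions made for this example, not a description of how the WMF actually processes its data.

```python
from collections import Counter

# Hypothetical raw log rows: (ip_address, page_title).
# The addresses come from ranges reserved for documentation.
raw_log = [
    ("203.0.113.7", "Glee"),
    ("203.0.113.7", "John Wayne Gacy"),
    ("198.51.100.24", "Glee"),
]

def aggregate(rows):
    """Combine rows into per-page view counts, dropping IP addresses entirely."""
    return Counter(page for _ip, page in rows)

def anonymize(rows):
    """Keep one row per view, but mask the last octet of each IPv4 address."""
    masked = []
    for ip, page in rows:
        prefix = ip.rsplit(".", 1)[0]  # e.g. "203.0.113"
        masked.append((prefix + ".x", page))
    return masked

page_views = aggregate(raw_log)
masked_log = anonymize(raw_log)
```

In this sketch the aggregated counts say nothing about any one reader, while the masked rows still show that the Glee and Gacy views came from the same small block of addresses. That leftover linkage is the kind of re-identification risk the policy acknowledges.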
And this is the part that concerns some users, like Wikipedian Wnt.
“[A]ccording to the policy, not only do they retain it 90 days, but they then can retain it indefinitely by ‘anonymizing’ the IP addresses by ‘encrypting’ the ‘most specific’ part of the IP address, a process which they admit may not actually protect identity,” Wnt wrote during a recent exchange with Wikipedia cofounder Jimmy Wales on his talk page.
Wnt is concerned that the anonymizing/aggregating policy leaves a loophole for Wikipedia to permanently retain data that is vulnerable to hackers or possible subpoena by law enforcement. He argues that even after data has been anonymized, it would be possible for technically well-equipped individuals to reconstruct IP addresses and identify users.
“With these records acknowledged and their existence legitimized, there is no reason why they can’t start filing papers, cracking codes, and lining up access dates for whatever reasons they may have.”
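Wnt's worry can be sketched in a few lines. The example below is hypothetical: it stands in for "encrypting the most specific part" with a keyed SHA-256 hash of the last octet of an IPv4 address, which is an assumption made for illustration and not the WMF's documented method. The function names and the key are invented for this sketch.

```python
import hashlib

def pseudonymize(ip, secret):
    """Replace the last octet of an IPv4 address with a keyed hash (illustrative scheme)."""
    prefix, last = ip.rsplit(".", 1)
    digest = hashlib.sha256((secret + last).encode()).hexdigest()[:16]
    return prefix + "." + digest

def recover(masked, secret):
    """Try all 256 possible last octets until one hashes to the stored value."""
    prefix, digest = masked.rsplit(".", 1)
    for candidate in range(256):
        guess = hashlib.sha256((secret + str(candidate)).encode()).hexdigest()[:16]
        if guess == digest:
            return f"{prefix}.{candidate}"
    return None

masked = pseudonymize("203.0.113.7", "hypothetical-key")
original = recover(masked, "hypothetical-key")
```

Under these assumptions, anyone who obtains the key, whether through a breach or a legal demand, needs at most 256 guesses per record. Because only the last octet is hidden, the search space is tiny no matter how strong the hash is.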
In responding to Wnt’s concerns, Jimmy Wales noted that only a fraction of user actions, about 1 out of every 1,000, are randomly selected for tracking. He also pointed out that 90 days is merely a maximum, and that most personal data is processed in a couple of days.
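A rough sketch of what 1-in-1,000 random sampling looks like in practice follows. The rate comes from Wales' comment. Everything else, including the function name and the simulated traffic volume, is assumed for illustration and says nothing about how Wikimedia's servers actually choose which actions to log.

```python
import random

SAMPLE_RATE = 1 / 1000  # the rate Wales cited; the mechanism below is assumed

def should_track(rng):
    """Decide independently for each action whether it is logged."""
    return rng.random() < SAMPLE_RATE

rng = random.Random(42)  # fixed seed so the simulation is repeatable
total_actions = 1_000_000
tracked = sum(should_track(rng) for _ in range(total_actions))
```

In a simulation like this, roughly 1,000 of the million actions end up tracked. Sampling lowers the odds that any single page view is logged, but a frequent reader generates many actions and is still likely to appear in the sample sooner or later.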
The Wikimedia Foundation would not reveal to the Daily Dot, however, what percentage of this data is actually deleted and what percentage is anonymized or aggregated and kept past 90 days.
Wikimedia was also tight-lipped about how often they are compelled to turn over information to law enforcement, with Walsh saying that “the Foundation complies with legitimate and lawful requests from enforcement agencies when it is necessary to do so.”
In April, Wikimedia published guidelines pertaining to requests for user information. In them, WMF states that information requests are “relatively rare” and that each one is handled with discretion. The Foundation also has a policy of notifying users of a request for their information before it’s given out, but it sometimes forgoes that notification at the behest of law enforcement.
When it comes to the question of whether Wikimedia should be tracking user information at all, Walsh defended the practice, saying it was important for the overall growth of the site.
“The data the Foundation collects is critical in the development of new products that help them reach a wider international audience of readers, and to make it easier and more rewarding for people to begin contributing to the projects,” he said.
Photo via Cary Bass/Flickr (CC BY-SA 2.0)
Tim Sampson is a reporter who focused on the technology, business, and politics beats. He's also an established comedy writer, with work on Comedy Central and in The Onion and ClickHole.