How to wipe out the Internet plagiarism epidemic

For the past few months a pair of anonymous sleuths have waged a full-frontal assault on Internet journalism.

On their blog, Our Bad Media, two writers going by the handles Blippoblappo and Crushingbort have embarked on a mission to expose the web’s worst serial plagiarists.

Their first target was BuzzFeed “viral politics” writer Benny Johnson. After Johnson called out another publication for plagiarizing a story he wrote about presidential sock choices (really), the bloggers looked into the author’s own long record of claiming other people’s work as his own.

Slamming Johnson’s attacks as hypocritical, Blippoblappo and Crushingbort documented myriad instances of Johnson stealing the work of other writers wholesale. BuzzFeed, an online behemoth that’s consistently had to fend off accusations that its primary business model is monetizing the work other people did for free and then put on the Internet, initially defended Johnson, but soon relented. Johnson was summarily fired and BuzzFeed Editor-in-Chief Ben Smith made a public apology and provided links to all 41 stories Johnson wrote that an internal investigation showed to have been plagiarized:

This plagiarism is a breach of our fundamental responsibility to be honest with you—in this case, about who wrote the words on our site. Plagiarism, much less copying unchecked facts from Wikipedia or other sources, is an act of disrespect to the reader. We are deeply embarrassed and sorry to have misled you.

After doing the mature thing and owning up to its editorial oversight, BuzzFeed then attempted to secretly remove over 4,000 posts from its website.

While BuzzFeed founder Jonah Peretti claimed the deleted articles were full of jokes that “didn’t age well,” Gawker (which was the first to notice BuzzFeed’s vanishing act) charged that a significant number of the disappeared articles contained flagrant examples of plagiarism.

The implication, of course, is that plagiarism at BuzzFeed was rampant in its early days and now the site, which recently received a $50 million cash infusion from Silicon Valley über-venture capitalists Andreessen Horowitz, is attempting to clean up its image by not only starting to adhere to traditional journalistic standards, but also pretending those early, Wild-West days never happened.

Thanks to the sleuths at Gawker and Our Bad Media, those efforts were unearthed just as effectively as the plagiarism that, many claim, was once endemic across the site.

Resting at the core of Our Bad Media’s quest to rid the world of lazy, thieving hacks is a fundamental paradox about the way journalism operates in the Internet age. Stealing the work of another author and claiming it as your own, intentionally or not, is something that’s never been easier to do; yet, at the same time, it’s never been easier to get caught red-handed. Online plagiarism may be something baked right into the Internet’s very DNA, but the Internet also presents a pair of solutions to the problem. One option will cost online news sites a good bit of time and money, while the other may cost them the only thing that really matters in an era when the barriers to creating a new media empire are virtually nil—the trust of their readers.

Our Bad Media didn’t stop with Johnson and BuzzFeed. After prompting BuzzFeed to try to dump its embarrassing early days down the memory hole, the bloggers went after an infinitely more establishment media figure: bestselling author and CNN host Fareed Zakaria.

Zakaria is one of the most prominent public intellectuals on U.S. foreign policy. He was a columnist for Newsweek and an editor-at-large for Time. During the 2008 presidential campaign, then-candidate Barack Obama was spotted reading Zakaria’s book, The Post-American World.

An Our Bad Media blog post detailed 12 instances of work produced by Zakaria that looked eerily similar to, and often matched word for word, articles originally appearing in Businessweek, Bloomberg, and—yes—Wikipedia. Zakaria and his editors have pushed back against the allegations, but the evidence is right there for anyone to see—and it’s not pretty.

If media observers believed widespread plagiarism was a problem only for new-media, digital-native publications like BuzzFeed, whose millennial writers and editors (as the stereotype goes) were never schooled in the august tradition of journalistic ethics, the effectiveness of Our Bad Media’s crusade against Zakaria proves the problem runs much deeper than some people at BuzzFeed not having gone to journalism school.

Neither Blippoblappo nor Crushingbort, both of whom have remained steadfastly anonymous, responded to a request for comment, but it seems like their actions point to something really important about the way journalism is conducted in the Internet age.

Modern journalists are often expected to turn out a lot of content or else risk being sucked into the deadly blades of the aggregation turbine. Before the Internet made a virtually infinite amount of human knowledge available at the touch of a button, churning out that much content would have been impossible. Going down to the library to search through old news articles or hoping to catch a specific academic expert at their desk for a phone call is far more time-consuming than finding the article with a quick Google search or looking up the academic’s most recent work on Google Scholar.

As a result, journalists are required to become vacuums for similarly formatted information, sucking up vast quantities of data and then regurgitating only the most relevant bits. In the process, the sourcing for that information can get lost. Sometimes it’s unintentional, like finding an interesting quote and simply forgetting to insert a link; on the other hand, sometimes it isn’t.

There’s an episode in the second season of Louie where comedian Dane Cook confronts fellow comedian Louis C.K. about widespread allegations that Cook had stolen some of C.K.’s jokes. Cook wanted C.K., who came to him seeking an unrelated favor, to release a YouTube video coming to Cook’s defense, attesting that he never stole the jokes. Louie, being Louie, refused.

He explained:

You wanna know what I think? I don’t think that you saw me do those jokes and said I’m going to tell those jokes, too. I don’t think there’s a world where you’re that stupid. Or that bad a guy. I do think, though, that you’re like —you’re like a machine of success. You’re like a rocket. You’re rocketing to the stars, and your engines are sucking stuff up. Stuff is getting sucked up in your engines, like birds and bugs and some of my jokes. I think you saw me do them. I know you saw me do them, and I think they just went in your brain, and I don’t think you meant to do it, but I don’t think you stopped yourself, either.

This is basically the risk of doing journalism primarily using the Internet for research. Creative types attempting to pass off others’ material as their own isn’t new; it’s something that goes back to the day after a caveman drew the first halfway-decent cave painting. However, writers now put themselves at risk of stealing other people’s material in a way that feels infinitely more natural and organic than what was possible just a generation ago.

There are two ways to deal with this. The first is to effectively crowdsource plagiarism checks, which is essentially what happens now.

Most online publications don’t have specific processes in place for editors to check that the work of their journalists isn’t plagiarized. Instead, they rely on a system of waiting for an outside agent to make an accusation of plagiarism, which they then seriously investigate and deal with as needed—people get fired, apologies are made, rival publications write think pieces, life moves on. Or, what likely occurs in most cases, the original author or publication contacts the plagiarizing author and asks for a link or citation, which is immediately granted along with a hearty apology and an insistence that it was an honest mistake that will never happen again.

Our Bad Media aren’t the only people out there looking for this stuff. For example, there’s an entire anonymous network of plagiarism hunters in Germany that has taken down people like the country’s defense minister and other top politicians. All the materials necessary to catch a plagiarist are just a Google search away; it just takes someone with enough time on their hands to plug the right passage into a search box and hit enter.

For publications, this system has the benefit of being both free and relatively easy—keeping everything out of sight and out of mind until there’s a fire that can be put out with a now-predictable script. Even so, it’s a system that Chris Harrick, vice president of marketing at plagiarism detection software manufacturer TurnItIn, calls deeply counterproductive.

“[News organizations] look at plagiarism as a very infrequent occurrence that’s dealt with swiftly and judiciously when it’s caught,” Harrick explained. “They fire the person who did it and issue an apology. They see it just as part of the cost of doing business. I think that view is a little antiquated.”

The problem, he argued, is that ten years ago, BuzzFeed didn’t exist. The Internet has let new sites come onto the market and compete instantly on an international scale with established players like Newsweek or the Washington Post. In an environment where the number of ink-stained dead trees a piece of writing is printed on has almost nothing to do with how many people have the ability to read it, all a journalistic outlet really has to differentiate itself is the credibility of its content. The Benny Johnson scrape may not end up hurting BuzzFeed’s web traffic in the long term, but a flood of Benny Johnsons certainly might.

The second option—of which Harrick is, unsurprisingly, a fan—is for media outlets to build plagiarism detection directly into their workflow. TurnItIn has a piece of software called iThenticate that lets someone enter a document or chunk of text and, within seconds, spits out a report detailing which sections closely match work already out there, either on the web or in iThenticate’s database of previously published materials.
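To give a rough sense of what this kind of matching involves, here is a minimal sketch in Python. It is not iThenticate's actual method, which the article does not describe; it assumes a simple, commonly used approach of comparing overlapping five-word sequences ("shingles") between a draft and a set of known sources. The function names `shingles` and `overlap_report` and the choice of five words are illustrative assumptions.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word sequences in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_report(draft, sources, n=5):
    """Map each source name to the share of the draft's n-word sequences found in it.

    `sources` is a hypothetical dict of {name: full text}; a real service would
    search a web index or a database of published work instead.
    """
    draft_shingles = shingles(draft, n)
    report = {}
    for name, text in sources.items():
        shared = draft_shingles & shingles(text, n)
        # Guard against drafts shorter than n words.
        report[name] = len(shared) / len(draft_shingles) if draft_shingles else 0.0
    return report
```

An editor could call `overlap_report(draft_text, {"Wikipedia": wiki_text})` and take a closer look at any source with a high share. A sketch like this only catches near-verbatim copying; paraphrased passages would slip past it.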

While the software is already in wide use, checking somewhere between 4,000 and 5,000 pieces a day, the most common application is in academia. Professors, teaching assistants, and academic journal editors plug content into iThenticate on a daily basis to guard against letting someone slip by with duplicative material.

Harrick noted that the company has looked into journalistic applications and thinks it could be a useful tool in editors’ arsenals; however, it has yet to catch on. “The general feeling we get from journalists is that, considering all the financial pressure they’re under, it’s difficult for them to afford another piece of software,” he said. “Not only is it expensive, but it also adds another step into editors’ workflow.”

A ballpark estimate of the annual cost of iThenticate for an outlet putting out about 100 stories a day, most of which are under 2,000 words, is about $18,000 to $20,000 per year. There are about a dozen news and content organizations currently subscribed to the service, but adoption hasn’t been wider because even that level of cost could be prohibitive.
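For scale, that estimate can be broken down per story. The short calculation below is a back-of-the-envelope sketch in Python; it assumes the outlet publishes 100 stories every day of the year, which the estimate above does not specify, and the function name `cost_per_story` is purely illustrative.

```python
def cost_per_story(annual_cost, stories_per_day, days_per_year=365):
    """Annual subscription cost divided by the number of stories checked in a year."""
    return annual_cost / (stories_per_day * days_per_year)


# Assumption: 100 stories a day, 365 days a year.
low = cost_per_story(18_000, 100)
high = cost_per_story(20_000, 100)
print(f"${low:.2f} to ${high:.2f} per story")  # $0.49 to $0.55 per story
```

Under that assumption, the price works out to roughly 50 cents per story checked.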

$20,000 is significantly more than half the salary of an entry-level writer, who could be churning out content and earning revenue. Spending that money instead to guard against a rare, theoretical reputational black eye that may never occur may seem like a waste.

However, catching plagiarism early, before it’s put out there on the web, may be the best defense against having to sneakily take it down later, hoping against hope that nobody notices. The trouble is, this being the Internet, someone probably will.

Illustration by Jason Reed

The problem needs to be stopped before it continues to spread.

Aaron Sankin