CAPTCHA, RECAPTCHA: Identifying Chokepoints and Turning them to Opportunities
We all get stuck. Plateaus occur everywhere in nature: in our biology, our physics, our chemistry, our companies, our marriages. Modern life, however, has distorted this naturally occurring, healthy process, and now tortures us with frustration as we try to lose weight or grow a small business. Nowhere is this phenomenon more obvious than on the Internet and in the wider world of technology. Think about all the classic “one hit wonders,” from Webvan to the electric car startup, Coda. They all run into plateau-bound limits (such as data idolatry, accidental reinforcement and step functions) and do so much faster when running on tech time. Here, we describe a famous Web choke point, and a rather infamous (but very creative) way to get around it. - Bob Sullivan and Hugh Thompson
Not everyone knows them by name, but if you use the Internet, you’ve certainly seen them. CAPTCHAs are the annoying squiggly words you need to type into a box online before you can buy a ticket to a baseball game or sign up for a new email account. It’s a test: is the person filling this thing out a human or a computer?
The modern CAPTCHA was the brainchild of Luis von Ahn, a computer scientist at Carnegie Mellon University. CAPTCHAs have been integrated into almost every major website, including Google, Yahoo, and Ticketmaster. You may be wondering what purpose these incredibly annoying, sometimes indecipherable words really serve. They might be frustrating for a human to read, but good CAPTCHAs are darn near impossible for computers to figure out automatically.
Before CAPTCHAs, cyber criminals would write automated tools to try a thousand different passwords for your online account in seconds. After CAPTCHAs, the process got derailed. The tool would get to the part where it had to guess what the squiggly word was and then give up.
CAPTCHA was a game-changer for websites that were being inundated with spam or password guessers. But again, if we may, let’s think about it from the cyber crook’s perspective. It became the ultimate choke point.
Organized cybercrime groups had invested significant time and energy in writing tools to advertise their wares and break into accounts; they weren’t about to see all that effort wasted and revenue lost, especially to a series of letters that look like they were written by an intoxicated roller coaster rider. It is here where we see what a motivated group can do to creatively work around choke points.
The first attempts to circumvent CAPTCHAs were technological. Bad guys wrote a series of special-purpose Optical Character Recognition (OCR) software tools, specifically designed to help computers read CAPTCHA characters. The tools succeeded in some cases, which led CAPTCHA writers to make the letters even more unreadable.
The next approach to bring down CAPTCHAs was to pay people in low-income countries to solve these things by hand. For a hundredth of a cent, you could get someone to figure out if those three letters on the screen spelled “Cat” or “Car” or “Let” or whatever. It worked, but this pay-per-solve approach actually started to get pretty expensive. Imagine trying to guess someone’s password. You might have to try 100,000 different combinations to get to the right one. At a hundredth of a cent per solve, that’s $10. You’d be better off buying 10 stolen credit card numbers!
And then, finally, a breakthrough. Crooks leveraged the most powerful force on the Internet. It’s the force that pushed Web browsers to start showing images. The juggernaut that made the VHS format win out over Betamax. It’s the $100 billion industry that has spawned some of the most disruptive advancements in technology and video production. Pornography.
Cybercrime gangs set up porn sites online. The service was free. You didn’t need to sign up, no membership required. All you needed to do to see the next image or video was fill out a few CAPTCHAs. It was a sordidly brilliant scheme. The CAPTCHAs were taken from the websites that their tools were actively trying to attack. Whenever one of their tools hit a CAPTCHA choke point, bam, they would take a picture of the squiggly word and cue it up to appear on the screen of the next porn site visitor. Some of these porn sites even used quality control procedures: they’d take the same CAPTCHA, present it to five different porn hunters, and only trust that the typed word was correct if all five people gave the same answer. At the peak of the porn/CAPTCHA frenzy, it is estimated that unsuspecting (and somewhat rushed) porn site users were solving several thousand CAPTCHAs per minute. It was a choke point remover that had staying power: no matter how complicated you made the CAPTCHA, there would always be a motivated human who was willing to solve it for free.
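For the technically curious, that five-way quality control boils down to a unanimity check. The Python sketch below is only an illustration of the idea: the function name `unanimous_answer`, the `required` parameter, and the choice to ignore capitalization and stray spaces are our own assumptions, not details of any real gang's software.

```python
from typing import Optional, Sequence


def unanimous_answer(answers: Sequence[str], required: int = 5) -> Optional[str]:
    """Return the agreed answer if enough solvers all typed the same thing, else None."""
    # Assumption: "Cat" and " cat " should count as the same answer.
    cleaned = [a.strip().lower() for a in answers]
    if len(cleaned) < required:
        return None  # not enough visitors have seen this CAPTCHA yet
    if len(set(cleaned)) == 1 and cleaned[0]:
        return cleaned[0]  # everyone agreed on a non-empty word
    return None  # at least one visitor disagreed, so don't trust it
```

Calling `unanimous_answer(["Cat", "cat", " CAT ", "cat", "cat"])` yields `"cat"`, while a single dissenting "car" in the list makes the function return `None`.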
In this battle of good versus evil, of choke point insertion and removal, CAPTCHAs continue to evolve and push the bounds of human computation. In an ironic twist, the man responsible for creating one of the biggest choke points in cybercrime, Luis von Ahn, has (perhaps out of guilt?) refocused on removing choke points in a completely different area: the digitization of old books.
After CAPTCHAs started appearing everywhere, he realized that people were doing all of this wasted work figuring out these little puzzles. Could he put their efforts to better use? More than 200 million CAPTCHAs are filled out every day; that’s over 150,000 hours of labor per day wasted on those squiggles. Were there little problems that humans could solve, problems that could be broken up into chunks and then pieced together to serve a bigger purpose? It was out of this desire that the ReCAPTCHA project was born.
Before we get to ReCAPTCHA, a bit of background on the challenges of computers reading the written or typed word. During the past several decades, computer systems have become pretty good at recognizing typed text from pristine documents. But what about old documents and books, the kind that are riddled with coffee stains, stray marks, bad typesetting? In those cases, the software starts to get less reliable and the digital content starts to look like gibberish. Ultimately, a human must intercede, eyeball the page, and make the call, and here’s where Luis von Ahn and ReCAPTCHA come in.
Instead of completely fabricated puzzles, ReCAPTCHA presents users with two squiggly words to type in: one that the website knows the right answer to and another that’s a picture of a word from an old book. Users don’t know which word is the bogus one and which one is correct, so they are incentivized to give their best guess on both. ReCAPTCHA then gathers and cross-checks each user’s interpretation of the word that was scanned in from the old book, and if enough people agree, voila, you’ve just helped to preserve history by turning an old masterpiece into a digital document that will live on forever.
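The two-word scheme can be sketched in a few lines of Python. This is a rough illustration under our own assumptions, not ReCAPTCHA's actual code: the class name `UnknownWord`, the `votes_needed` threshold of three, and the case-insensitive matching are all hypothetical.

```python
from collections import Counter
from typing import Optional


class UnknownWord:
    """Collects guesses for one scanned word that no one has transcribed yet."""

    def __init__(self, votes_needed: int = 3) -> None:
        self.votes_needed = votes_needed  # hypothetical agreement threshold
        self.votes: Counter = Counter()

    def submit(self, control_truth: str, control_typed: str, unknown_typed: str) -> bool:
        """Record one attempt; return True if the user passes as human."""
        if control_typed.strip().lower() != control_truth.strip().lower():
            return False  # missed the known word, so the other guess is discarded
        self.votes[unknown_typed.strip().lower()] += 1
        return True

    def transcription(self) -> Optional[str]:
        """Return the accepted reading once enough users agree, else None."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

The known word does double duty here: it decides whether the visitor gets through, and it decides whether that visitor's reading of the old-book word is worth counting.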
ReCAPTCHA went a step further. Instead of just looking at a contorted word, the software gives you the option to listen to some slightly garbled audio and then type the words you hear. This makes the CAPTCHA concept workable for the visually impaired, but through the lens of ReCAPTCHA, it also created an opportunity to turn old recordings into transcribed documents.
ReCAPTCHA was disruptive. It removed the choke point in digitizing less-than-pristine documents and recordings. ReCAPTCHA now serves up more than 30 million puzzles a day and has millions of digitized old books and documents under its belt. It quickly captured the attention of Google, a key player in the book digitization game, and in 2009 the Internet juggernaut acquired ReCAPTCHA.
A choke point is the part of the system that breaks first and slows everything else down. Failing to identify a choke point can reduce a gushing flow to an unexpected trickle. When you hit a choke point, the whole system can slow or even stop. It’s the equivalent of tripping a circuit breaker by plugging in one too many strands of Christmas tree lights. You’re experiencing the Plateau Effect in full force.
Therefore, choke point removal is transformative. The key is to find out where the choke point (and plateau) is and creatively route your way around or over it. This will not only get you past your current plateaus, but also help you prevent future ones.
Excerpt from THE PLATEAU EFFECT © 2013 by Bob Sullivan and Hugh Thompson. Published by Dutton, A Member of Penguin Group (USA) Inc. Excerpted with permission from the publisher. All Rights Reserved.
Bob Sullivan and Herbert Thompson are the authors of The Plateau Effect: Getting from Stuck to Success. With more than 40 years of experience between them researching, writing, and analyzing systems and human nature, their new book helps you bust through the plateaus in your own life.
Stories from the New Frontier: The Web is bursting with new ways of reading and writing. Publishing is changing, from what people want to read to how they want to read it. The rise of the e-book, new media tools, and new communities of readers and writers are transforming the very way we tell stories. This series features excerpts from the work of digitally self-published authors, e-book authors, or from new books that look at Internet culture in order to give a taste of the new frontier of literature in the digital age.
Photo by Wonderlane/Flickr