Human After All
23/01/2008
You will almost certainly have solved one of these in the last month, if not dozens:

The uptake of CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) has been swift and widespread, as they offer a straightforward and directly effective method of determining whether the user is a human or a computer. Their success rests on discovering something humans are extremely good at, and computers are extremely bad at - interpreting the content of images. Until computers get better at doing this, spammers will keep struggling to get past the CAPTCHA barrier without somehow involving organic brain power.
It takes, on average, 10 seconds for a person to solve one before they carry on submitting whatever they are submitting. That is just quick enough for it not to become too disruptive or inconvenient, so long as you don’t have to do too many. But what may only be 30 seconds a week for you is actually 1,050,000 hours a week for mankind. The inventors, Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford, were worried about how much time their creation had begun wasting. Just think what could be done if a year of CAPTCHA-solving time was put to some other use, they recently thought…
Then they had a brilliant idea.
The benefit of computing limitations for spam-stopping is also a frustration for the good honest people who want to digitize the heaps and heaps of written text we cannot yet search, download or cut and paste from. OCR (Optical Character Recognition) is used to automatically render printed text as machine-editable text, and has been put to work scanning the millions and millions of pages we want to make digital. But, as Stuart will tell you from his recent experience, it’s a highly imprecise method, returning a mixture of well-rendered passages and absolute nonsense.

Enter the reCAPTCHA Project, turning 150,000 hours a day of wasted human labour into something dazzlingly efficient and useful. The words that computers can’t figure out when digitising texts become the CAPTCHAs, and we do what we are good at - interpret images. But wait! How will it work as a spam-filter if the makers don’t yet know what the words are?! You transcribe two words - one a designed CAPTCHA whose answer is already known, the other a word OCR mangled somewhere. If you get the CAPTCHA right, your answer to the mangled word is taken to be correct too. The same word is then given to a few other users to improve the accuracy of the result, allowing everyone to help digitise the world’s written history, something otherwise predicted to take 400 years.
Well done, aren’t you clever?
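For the curious, the two-word trick described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the class name `TwoWordChallenge`, the `scan-17` word ID and the three-vote agreement threshold are all made up for the example.

```python
from collections import Counter, defaultdict

# Assumption for this sketch: a mangled word counts as "solved" once this
# many humans have typed the same thing. The real threshold is not given
# in the post.
AGREEMENT_THRESHOLD = 3


class TwoWordChallenge:
    """Hypothetical model of the control-word + unknown-word check."""

    def __init__(self):
        self.votes = defaultdict(Counter)  # unknown word ID -> guess counts
        self.resolved = {}                 # unknown word ID -> agreed text

    def submit(self, control_answer, control_truth, unknown_id, unknown_guess):
        """Return True if the user passes as human.

        Only the control word decides pass/fail. The guess for the
        unknown word is recorded as a vote, and only if the control
        word was typed correctly.
        """
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False  # failed the known word: ignore the other guess

        guess = unknown_guess.strip().lower()
        self.votes[unknown_id][guess] += 1

        # Accept the transcription once enough users agree on it.
        if (unknown_id not in self.resolved
                and self.votes[unknown_id][guess] >= AGREEMENT_THRESHOLD):
            self.resolved[unknown_id] = guess
        return True


pool = TwoWordChallenge()
for _ in range(3):
    pool.submit("morning", "morning", "scan-17", "upon")
pool.submit("xxxxx", "morning", "scan-17", "spam")  # a bot: vote discarded
print(pool.resolved)
```

The point of the design is that the spam filter never needs to know the mangled word: the known word does the gatekeeping, and agreement between several humans who passed it does the digitising.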


