CAPTCHA

From Free net encyclopedia

A CAPTCHA (a backronym for "completely automated public Turing test to tell computers and humans apart", trademarked by Carnegie Mellon University) is a type of challenge-response test used in computing to determine whether the user is human. The term was coined in 2000 by Luis von Ahn, Manuel Blum, and Nicholas J. Hopper of Carnegie Mellon University, and John Langford of IBM. A common type of CAPTCHA asks the user to type the letters shown in a distorted image, sometimes accompanied by an obscured sequence of letters or digits that appears on the screen. Because the test is administered by a computer, whereas the standard Turing test is administered by a human, a CAPTCHA is sometimes described as a reverse Turing test. This term is misleading, however, because it could also describe a Turing test in which both participants are trying to prove that they are the computer.



Origin

Since the early days of the Internet, users have wanted to make text illegible to computers. Among the first were hackers posting about sensitive topics to online forums they believed were being automatically monitored for keywords. To circumvent such filters, they replaced words with look-alike characters: HELLO could become |-|3|_|_(), )-(3££0, or so many other variants that a filter could not possibly detect them all. This later became known as 13375p34k (leetspeak).
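To see why such substitutions worked, consider a keyword filter that matches only exact words. The following is a minimal sketch under stated assumptions: the function `naive_filter` and its one-word banned list are hypothetical illustrations, not taken from any real monitoring system. It flags HELLO but lets the look-alike spellings through.

```python
# Hypothetical banned-word list, for illustration only.
BANNED = {"hello"}


def naive_filter(text: str) -> bool:
    """Return True if any whitespace-separated word matches a banned keyword (case-insensitive)."""
    return any(word in BANNED for word in text.lower().split())


for sample in ["HELLO world", "|-|3|_|_() world", ")-(3££0 world"]:
    print(sample, "->", "flagged" if naive_filter(sample) else "passed")
```

Adding each variant to the list does not scale, because every character can be swapped independently of the others.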

Primitive CAPTCHAs appear to have been first developed in 1997 at AltaVista by Andrei Broder and his colleagues to prevent bots from adding URLs to the company's search engine. To make their images resistant to OCR attacks, the team consulted the manual for their Brother scanner, which recommended conditions for improving OCR results (similar typefaces, plain backgrounds, and so on). The team then created puzzles that simulated the conditions the manual said would degrade OCR accuracy. In 2000, von Ahn and Blum developed and publicized the notion of a CAPTCHA, which included any program that can distinguish humans from computers. They invented multiple examples of CAPTCHAs, including the first CAPTCHAs to be widely used (at Yahoo!).

Applications

CAPTCHAs are used to prevent bots from using various types of computing services. Applications include preventing bots from taking part in online polls, registering for free email accounts (which may then be used to send spam), and, more recently, preventing bot-generated spam by requiring that the (unrecognized) sender pass a CAPTCHA test before the email message is delivered.
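The email use case can be sketched as a hold-and-release queue. This is an illustration under assumptions, not a description of any particular product: the class `ChallengeResponseFilter` and its method names are hypothetical, and sending the challenge and running the CAPTCHA itself are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseFilter:
    """Holds mail from unrecognized senders until they pass a CAPTCHA (hypothetical sketch)."""

    known_senders: set[str] = field(default_factory=set)
    held: dict[str, list[str]] = field(default_factory=dict)
    inbox: list[str] = field(default_factory=list)

    def receive(self, sender: str, message: str) -> str:
        if sender in self.known_senders:
            self.inbox.append(message)
            return "delivered"
        # Unrecognized sender: queue the message; a real system would now send a challenge.
        self.held.setdefault(sender, []).append(message)
        return "held"

    def sender_passed_captcha(self, sender: str) -> None:
        """Whitelist the sender and release everything queued for them."""
        self.known_senders.add(sender)
        self.inbox.extend(self.held.pop(sender, []))


f = ChallengeResponseFilter()
print(f.receive("stranger@example.com", "hi"))
f.sender_passed_captcha("stranger@example.com")
print(f.inbox)
```

Once a sender has passed the test, later messages from that address skip the queue, so a human correspondent is challenged only once.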

Characteristics

CAPTCHAs are by definition fully automated, requiring little human maintenance or intervention to administer the test, which lowers cost and improves reliability.
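The automated part can be illustrated with a minimal sketch, assuming an in-memory store and leaving image rendering out: the server invents an answer, keeps it server-side under a random token, and grades the response with no human involved. The names `issue_challenge` and `grade`, the six-character length, and the alphabet (which omits look-alike characters such as O/0 and I/1) are illustrative assumptions.

```python
import hmac
import secrets

# Assumed alphabet: drops easily confused characters (I, O, 0, 1).
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
_pending: dict[str, str] = {}  # token -> expected answer, kept on the server


def issue_challenge(length: int = 6) -> tuple[str, str]:
    """Create a challenge and return (token, answer).

    The answer is meant for a server-side image renderer, never for the client.
    """
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = secrets.token_urlsafe(16)
    _pending[token] = answer
    return token, answer


def grade(token: str, response: str) -> bool:
    """Check a response; each challenge may be graded only once."""
    expected = _pending.pop(token, None)
    if expected is None:
        return False
    # Case-insensitive, constant-time comparison.
    return hmac.compare_digest(expected.encode(), response.strip().upper().encode())
```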

The algorithm used to create the CAPTCHA is often made public, though it may be covered by a patent. This is done to demonstrate that breaking it requires the solution of a hard problem in the field of artificial intelligence (AI) rather than just the discovery of the (secret) algorithm, which could be obtained through reverse engineering or other means.
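One consequence of publishing the algorithm is that individual challenges must not be predictable from public information; otherwise, knowing the code would be enough to break the test, with no AI problem left to solve. The sketch below contrasts a hypothetical generator seeded from a guessable value (here a timestamp) with one that draws from Python's `secrets` module. Both function names are illustrative.

```python
import random
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def weak_answer(timestamp: int, length: int = 6) -> str:
    # FLAW: a PRNG seeded with a guessable value. Anyone who knows the public
    # algorithm and can guess the seed reproduces the answer without the image.
    rng = random.Random(timestamp)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def strong_answer(length: int = 6) -> str:
    # Draws from the operating system's CSPRNG, so the public code reveals
    # nothing about any particular challenge.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


issued = weak_answer(1_700_000_000)     # what the server drew into the image
predicted = weak_answer(1_700_000_000)  # what an attacker computes from the code
print(issued == predicted)  # True
```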

Accessibility

CAPTCHAs based on reading text, or on other visual-perception tasks, prevent visually impaired users from accessing the protected resource. However, CAPTCHAs do not have to be visual: any hard artificial intelligence problem, such as speech recognition, can serve as the basis of a CAPTCHA. Some implementations let users opt for an audio CAPTCHA instead.

The development of audio CAPTCHAs appears to have lagged behind that of visual CAPTCHAs, however, and they may not yet be as effective. Other kinds of challenges, such as those that require understanding the meaning of some text (e.g., a logic puzzle, a trivia question, or instructions on how to create a password), can also be used as a CAPTCHA. As with audio CAPTCHAs, there has been little research into how well they resist countermeasures.
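A text-based challenge of this kind can be sketched as arithmetic written out in words, which a screen reader can read aloud. This is a hypothetical format chosen for illustration (`make_text_challenge` and `check_text_answer` are invented names), and a script written for exactly this format could parse it trivially, which is consistent with the caveat above about how little such challenges have been studied.

```python
import secrets

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def make_text_challenge() -> tuple[str, int]:
    """Return (question, expected_answer) for a hypothetical arithmetic-in-words challenge."""
    a = secrets.randbelow(10)
    b = secrets.randbelow(10)
    question = f"What is {NUMBER_WORDS[a]} plus {NUMBER_WORDS[b]}? Answer in digits."
    return question, a + b


def check_text_answer(expected: int, response: str) -> bool:
    """Accept the response only if it parses as the expected integer."""
    try:
        return int(response.strip()) == expected
    except ValueError:
        return False


question, expected = make_text_challenge()
print(question)
```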

Providing an audio CAPTCHA lets blind users complete the test, but it still excludes people who are both visually and hearing impaired. According to sense.org.uk, about 4% of people over 60 in the UK have both vision and hearing impairments, and about 23,000 people in the UK have serious impairments of both. According to The National Technical Assistance Consortium for Children and Young Adults Who Are Deaf-Blind (NTAC), there were 9,516 deafblind children in the USA in 2004, and Gallaudet University quotes a 1993 estimate of 35,000 fully deafblind adults in the USA. Such estimates depend heavily on the degree of impairment used in the definition.

The use of CAPTCHAs thus excludes many people from significant parts of common Web-based services such as PayPal, Gmail, Orkut, Yahoo!, and many forum and weblog systems.

For blind and visually impaired users (and for color-blind users facing a test that relies on color), visual CAPTCHAs present serious problems. Because CAPTCHAs are designed to be unreadable by machines, common assistive technology tools such as screen readers cannot interpret them. Since sites may require a CAPTCHA during initial registration, or even at every login, this can block access completely. In certain jurisdictions, site owners could become the target of litigation if their CAPTCHAs discriminate against people with certain disabilities. In other cases, users with sight difficulties can choose to identify a word that is read aloud to them.

Even for perfectly sighted individuals, new generations of CAPTCHAs, designed to overcome sophisticated recognition software, can be very hard or impossible to read. Even some of the demo CAPTCHAs at the software sites listed below are indecipherable to many if not all humans.

The W3C paper Inaccessibility of Visually-Oriented Anti-Robot Tests outlined some of the accessibility problems with CAPTCHAs.

Circumvention

It may be possible to subvert CAPTCHAs by relaying them to a sweatshop of human operators employed to solve them. The W3C paper mentioned above states that such an operator "could easily verify hundreds of them each hour". Others have nonetheless suggested that this would not be economically viable (see, e.g., [1]).
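Whether relaying is viable comes down to simple arithmetic: the labor cost of one solved CAPTCHA is the operator's hourly wage divided by the number solved per hour, and the attack pays only if each solve is worth more than that to the attacker. The sketch below encodes this under stated assumptions; the function names are hypothetical, overheads are ignored, and the example figures are invented for illustration, not data from the W3C paper or any study.

```python
def cost_per_captcha(hourly_wage: float, solves_per_hour: float) -> float:
    """Labor cost of one human-solved CAPTCHA, ignoring relay and other overheads."""
    if solves_per_hour <= 0:
        raise ValueError("solves_per_hour must be positive")
    return hourly_wage / solves_per_hour


def relay_is_profitable(value_per_solve: float, hourly_wage: float,
                        solves_per_hour: float) -> bool:
    """Worth doing only if each solved CAPTCHA is worth more than it costs to solve."""
    return value_per_solve > cost_per_captcha(hourly_wage, solves_per_hour)


# Hypothetical figures, for illustration only.
print(cost_per_captcha(hourly_wage=3.0, solves_per_hour=300))  # 0.01
```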

Mori et al. published a paper in IEEE CVPR'03 detailing a method that defeated EZ-Gimpy, one of the most popular CAPTCHAs, with 92% accuracy in tests. The same method also defeated the more complex and less widely deployed Gimpy program with an accuracy of 33%. However, it is unclear whether implementations of their algorithm are in actual use.
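Even a 33% success rate can matter in practice if a bot may simply request a new challenge after a failure. Assuming attempts are independent and the site does not limit retries (assumptions made here for illustration, not claims from the paper), the chance of at least one success in k attempts with per-attempt accuracy p is 1 - (1 - p)^k. The helper name `success_within` is hypothetical.

```python
def success_within(p: float, attempts: int) -> float:
    """Probability of at least one success in `attempts` independent tries,
    each succeeding with probability p."""
    return 1 - (1 - p) ** attempts


# Gimpy-level accuracy from the text, over a few retries.
for k in (1, 3, 5):
    print(k, round(success_within(0.33, k), 3))
```

Under these assumptions, three attempts at 33% accuracy already succeed roughly 70% of the time.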

Automated attacks on CAPTCHAs are also growing more sophisticated. Projects like PWNtcha have made significant progress in defeating commonly used CAPTCHAs, which has contributed to a general migration towards more sophisticated CAPTCHAs.

A 2005 Microsoft Research paper (http://www.ceas.cc/papers-2005/160.pdf) found that computers had become as good as or better than humans at recognizing the individual distorted characters used in text-based CAPTCHAs.

Neural networks have been used with great success to defeat CAPTCHAs because, given enough training examples, they can tolerate many affine and non-linear distortions. Since they learn from examples rather than from explicitly coded rules, an attacker with appropriate tools may need very little technical knowledge to defeat even fairly complex CAPTCHAs.

Some poorly designed CAPTCHA systems can also be circumvented without OCR, simply by reusing the session ID of a CAPTCHA image whose answer is already known. See the article on puremango.co.uk for detailed information about this type of attack.
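One common shape of this flaw is a server that looks up the expected answer by session ID but never discards it after a check, so a single human-solved challenge can be replayed indefinitely. The sketch below is a hypothetical illustration (the class names are invented, and the puremango.co.uk article may describe the attack in different terms); the fix is to delete the answer on first use.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


class FlawedCaptchaStore:
    """Keys the expected answer by session ID but never clears it after a check."""

    def __init__(self) -> None:
        self.answers: dict[str, str] = {}

    def new_challenge(self, session_id: str) -> str:
        answer = "".join(secrets.choice(ALPHABET) for _ in range(6))
        self.answers[session_id] = answer
        return answer  # in practice this would be rendered into an image

    def check(self, session_id: str, response: str) -> bool:
        return self.answers.get(session_id) == response


class FixedCaptchaStore(FlawedCaptchaStore):
    """Deletes the answer on the first check, so each solution works only once."""

    def check(self, session_id: str, response: str) -> bool:
        expected = self.answers.pop(session_id, None)
        return expected is not None and expected == response


for store in (FlawedCaptchaStore(), FixedCaptchaStore()):
    answer = store.new_challenge("sess-1")  # a human solves this once
    accepted = sum(store.check("sess-1", answer) for _ in range(100))  # a bot replays it
    print(type(store).__name__, accepted)
```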

If part of the software generating the CAPTCHA runs on the client (for example, the server validates the answer but the text the user must identify is rendered on the client side), users can modify the client to display the unrendered text.

External links

CAPTCHA implementations

Classic ASP

C

  • Obfuscated Image [2], a GPL-licensed implementation of a CAPTCHA system.

ColdFusion

Java

.NET

Perl

PHP

  • Image Verification, a PHP + GD implementation.
  • Auditor, yet another PHP + GD implementation.
  • tacs, yet another PHP + GD implementation.
  • QuickCaptcha, one more PHP + GD implementation.
  • freeCap 1.4, PHP + GD implementation with multiple fonts and hammering protection, distributed under the GPL
  • Forms generation and validation class with a plug-in to implement a CAPTCHA validation input
  • PEAR's Text_CAPTCHA, a PHP implementation.
  • tEABAG_3D CAPTCHA, by OCR Research Team. 3D CAPTCHAs using PHP4 + GD (free, but only under certain conditions).
  • GOTCHA!, yet another CAPTCHA using PHP4 + GD.
  • CaptchaPHP, uses data: URLs and a readable riddle in the alt= attribute for visually impaired users; not OCR-safe.
  • Block AutoSubmit, an implementation that works with both newer and older versions of GD, down to 1.63.
  • PhpCaptcha, PHP + GD implementation with multiple TrueType fonts and support for random background images. Also includes an audio CAPTCHA for visually impaired users.
  • CAPTCHA, PHP + GD implementation for Drupal CMS.

Python

Ruby

Smalltalk

  • SW2Captcha, a Smalltalk implementation using Morphic.

CAPTCHA services

  • captchas.net, Free CAPTCHA Service (image and audio), with sample code in PHP, Python, Perl, and ASP.
  • captchaservice.org, Free CAPTCHA Service (Images: words, random letters, user-specified strings; Text: odd words out, number puzzle). captchaservice.org is a REST-style web service that returns XML in response to HTTP GET requests. A request returns an XML document containing a test word string, along with a URL to a distorted image of that string.
  • address-protector.com, A way to avoid email spam: you give it your email address, and it gives you a link that displays a CAPTCHA; once the CAPTCHA is passed, your email address is shown.
  • Anti-spam email address encoder based on a CAPTCHA system
  • www.cerospam.com.ar, Free CAPTCHA Service (image), including forwarding of the form's contents to your e-mail if you do not have a script to process it.

Defeating CAPTCHAs
