15 June, 2012

Gotcha CAPTCHA?

A CAPTCHA is an anti-spam program that generates tests that humans can pass but computer programs can’t. CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart, a term coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University. A captcha is typically used where the application needs to know that it’s a human on the other side and not an automated script or another computer (which can’t think for itself).

Imagine going to the public library and picking up a century-old book. If this book has to be converted to an e-book, the computer must be capable of reading it. Unlike a human, a machine cannot tell the printed text apart from a real bug on the page, dirt collected over the years, missing letters, misspelled words and so on, i.e., computers cannot read distorted text. This is the idea behind captchas.

Types of CAPTCHA

Text

The user is challenged with a text captcha: distorted text, with some amount of background noise, that must be typed in.


Image

An image with or without text is presented to the user and a question is asked based on it, for example, “Which is the bird presented in the picture below?”. This captcha could even take the form of a graphical puzzle which the user must solve.


Q and A

The user is presented with a question like “How many hands does a crow have?”. If this type of captcha has a limited set of questions and answers, it can be broken easily.

Math puzzle

A mathematical puzzle such as 2+3=? is presented to the user, who is expected to solve it.
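A math puzzle captcha like the one above can be sketched in a few lines of Python. This is only an illustration: the function names make_math_captcha and check_math_captcha are made up for this example, and a real site would keep the expected answer on the server rather than send it to the browser.

```python
import random


def make_math_captcha(rng=None):
    """Return a (question, expected_answer) pair such as ("2+3=?", 5)."""
    # SystemRandom draws from the OS entropy source, so questions are
    # hard to predict; a seeded Random can be passed in for testing.
    rng = rng or random.SystemRandom()
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    return f"{a}+{b}=?", a + b


def check_math_captcha(expected_answer, user_input):
    """Return True only if the typed text is the expected number."""
    try:
        return int(user_input.strip()) == expected_answer
    except ValueError:
        # Anything that is not a whole number counts as a wrong answer.
        return False
```

With single-digit operands there are only 81 possible questions, so a puzzle this simple is trivial for a script to solve. It filters out only the least sophisticated bots.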

 

Game captcha


 

Why do we need CAPTCHA?

  • To prevent spammers from creating fake accounts on websites [Registration page]
  • To prevent unauthorized users from accessing features that help hack email accounts and/or spam a user’s inbox [Forgot Password]
  • To prevent automated software from participating in online activities by impersonating a human [Online polls/surveys]
  • To prevent spammers from commenting on websites [Comment forms] 

 

Bypassing CAPTCHAS

Bypassing with humans

Hire humans who can enter values in captcha fields when needed. This means that a script can still be written to register or spam on the web: the script does everything else and hands only the captcha to a person.

Automated scripts

A spammer patiently downloads all the captchas on the website over a period of time and builds a database. He could then write a script that compares each captcha image against the database and keys the matching value into the captcha field. These are situations where Q and A and puzzle-related captchas come in handy to a considerable extent. However, it’s important to note that these captchas could be solved programmatically as well.
Using free OCR (optical character recognition) tools, spammers can decode text-based captchas. Spammers can use one such tool to build a robust captcha database and use it to bypass captchas.
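How patient does such a spammer need to be? A rough estimate, assuming the site shows one captcha per page load, picked uniformly at random from a fixed pool: this is the classic coupon collector problem, and the expected number of page loads needed to see every image at least once is n × (1 + 1/2 + ... + 1/n) for a pool of n images. The function name expected_downloads is made up for this sketch.

```python
from fractions import Fraction


def expected_downloads(pool_size):
    """Expected page loads needed to see every captcha in the pool once.

    Assumes each load shows one captcha chosen uniformly at random from
    a fixed pool (the coupon collector problem).
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # n * H_n, where H_n is the n-th harmonic number, summed exactly.
    harmonic = sum(Fraction(1, k) for k in range(1, pool_size + 1))
    return float(pool_size * harmonic)


print(round(expected_downloads(20)))  # about 72 page loads for 20 images
```

Under these assumptions, a pool of 20 captchas is fully collected after roughly 72 page loads on average, which a script can do in seconds.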

Using session ids

Some captcha implementations do not destroy the session id once a correct captcha has been keyed in. Such a session id can be reused with the help of a few lines of code, so that every later request rides on the one solved captcha and bypasses the check.
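One way to close this hole is to make every captcha single-use on the server. The sketch below is a minimal illustration, not any particular framework’s API: CaptchaStore is a hypothetical in-memory stand-in for real session storage, and issue and verify are names invented for this example.

```python
import secrets


class CaptchaStore:
    """Hypothetical in-memory stand-in for server-side session storage."""

    def __init__(self):
        self._answers = {}

    def issue(self, answer):
        """Remember the expected answer and return a fresh challenge id."""
        challenge_id = secrets.token_urlsafe(16)
        self._answers[challenge_id] = answer
        return challenge_id

    def verify(self, challenge_id, user_input):
        """Check an answer; the challenge is destroyed on every attempt."""
        # pop() removes the entry whether the answer is right or wrong,
        # so a replayed challenge id can never succeed a second time.
        expected = self._answers.pop(challenge_id, None)
        if expected is None:
            return False
        return secrets.compare_digest(
            expected.lower().encode(), user_input.strip().lower().encode()
        )
```

For example, after cid = store.issue("x7Kp"), the first store.verify(cid, "x7kp") returns True and any repeat with the same cid returns False. A wrong guess also consumes the challenge, so the user has to be shown a new captcha.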

Eliminating the CAPTCHA element

If captchas are validated on the client side and not on the server, users can remove the captcha element using an add-on like Firebug, or by deleting the captcha code from the HTML source. Easy bait, isn't it?
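The fix is for the server to treat a missing captcha exactly like a wrong one. A minimal sketch, assuming the submitted form arrives as a dictionary of strings and the expected answer is looked up on the server; handle_registration and the field name captcha_answer are hypothetical names for this example.

```python
def handle_registration(form, expected_answer):
    """Return (accepted, message) for a submitted registration form.

    `form` is the dict of submitted fields. `expected_answer` comes from
    server-side storage, never from the submitted form itself.
    """
    answer = form.get("captcha_answer")
    # If the captcha element was deleted in the browser, the field is
    # simply absent here. That must count as a failure, not a pass.
    if answer is None or answer.strip() == "":
        return False, "captcha missing"
    if answer.strip().lower() != expected_answer.lower():
        return False, "captcha incorrect"
    return True, "ok"
```

With this check in place, a form submitted without the captcha field is rejected with "captcha missing", no matter what the page looked like in the browser.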

CAPTCHAs and Usability

* These days many websites present captchas that are difficult even for humans to decode, forget about machines :-). Background noise in a captcha (anything other than the text that makes the text difficult to read) must be balanced so that the captcha stays easy for humans and difficult for computers.
* If users are unable to recognize a particular captcha, there needs to be a provision to generate a new one. In general, every page refresh displays a new captcha to the user.
* An audio captcha supplements the visual one by letting the user listen to the challenge if they cannot read it.
* Suppose a user is on a registration page and has already entered the captcha correctly, but form submission fails for some other reason. The user is then presented with a new captcha a second time. If the user has already proved that they are human by keying in the captcha, why present it repeatedly?
* The look and feel of captcha elements with respect to the web page’s background color and images is important. Captchas placed at the end of the page, where they are hardly noticeable, become a problem for users who realise that there is a captcha only after validation fails on that page.


There are some applications where the same captcha is presented again even after the user enters wrong data, instead of a new captcha being presented after the page refresh (LOL).


CAPTCHA and Accessibility

* Displaying a captcha with a lot of background noise becomes a problem for differently-abled users. For example, a visually impaired user cannot see what’s on the screen; it needs to be read out loud. This requires audio support, hence the need for an audio captcha.
* Accessibility functions need to be built into the web application so that screen reader tools can read captcha elements and invoke an audio captcha.
* A few applications display captchas that are hard even for humans to decode. This often poses an accessibility problem for all groups of users.
* Dyslexic people could have problems with captchas too [Just wondering]

CAPTCHA and Security

A small database of captchas is easy to collect and crack. If a website displays about 20 captchas in a random fashion, a regular user of that website can figure out all of them, write a little script and crack them. Below is a snippet from one of my blog posts on how the absence of a captcha can impact security.

* No Captcha on Registration forms
This leaves the door wide open. I recently needed 50 valid email accounts created for testing a website. All I did was write a simple automated script using iMacros (a FREE add-on) for account registration and creation. All I then had to do was activate these email accounts manually (note that this step could have been automated too). At the end of the testing effort, these accounts were discarded. Now, if you are a company that allowed a single impostor to create 50 email accounts, you lose an awful lot of revenue. Is this what you want? If you had a captcha in place, my script would have failed, as a captcha expects different data each time, which needs human intervention. Building a captcha into registration forms is a good design idea to keep away not-so-serious users or spammers.

* No Captcha on Forgot Password forms
If there is no captcha on the Forgot Password form, I could write a script to feed any number of valid email addresses to the Forgot Password page. Why would any user do that? He could be a cranky user. He might take pleasure in irritating fellow users. He might be an unethical hacker. He doesn’t know what to do with his life!

* No Captcha on Comments forms on websites and blogs
As a blogger, I get a lot of spam comments for products that I don’t need. I wish spammers took segmentation and targeting seriously and routed their ads to the appropriate audience :-). Without a captcha in place, spammers can write easy little scripts to post these *free ads* in the comments section of any website. Having a captcha would require human intervention, which in turn might block spam to a reasonable extent.


Addendum on 21st June 2012
Added Game captcha. 


Regards,
Parimala Hariprasad (yeah, changed my surname. Do remember ;))



4 comments:

Anonymous said...

It's going to be finish of mine day, but before end I am reading this great paragraph to increase my know-how.
Look at my blog post ; buy proxies

Santhosh Tuppad said...

Parimala, Good to see you writing about captcha. Now, there are different contexts. And I would want to put that as a question rather than I giving away the answer.

You are testing a web application and registration web page doesn't have captcha. When you go to customer, he / she says, we are okay without it. Ultimately, my question is: When are the different contexts in which customers might want to use CAPTCHA? Could you list them down? I hope you got my question.

-- Santhosh Tuppad
http://tuppad.com/blog/

Parimala Hariprasad said...

@Everyone,

Do you see the first comment above? It's spam that got through my comments form due to the absence of a CAPTCHA. Live demonstration, you see ;)

@Santhosh Tuppad
Thank you for the question. My response is on the way.

Regards,
Parimala Hariprasad

Parimala Hariprasad said...

@Santhosh,

My response is posted at http://bit.ly/KJcRwu.

Btw, I loved your Captcha testing blog.

Regards,
Parimala Hariprasad