Hiding email addresses and URLs from crawlers

Email addresses that appear in plain text on web sites (whether as links or not) are often harvested by crawlers and used for spamming. Site owners therefore often obfuscate them: by writing them in a form that a human reader can convert back to an email address (e.g. “user at domain dot com”), by showing them as images, or by showing them only after the user enters a CAPTCHA. These methods are inconvenient for the reader, who can no longer simply click the address.

The same applies to URLs in contexts where those maintaining the web site do not want them to be visible to search engines (for example, to discourage spam in user-submitted content).

A Solution
A simple solution is to embed a client-side script (in the HTML page) that produces what the legitimate user should see (when executed by a web browser), without including the actual value as a single string in the script.

Crawlers generally won’t run scripts, since the results would not usually be useful to them. If you know of any that do, please mention them in a comment.

For example:

Email address:
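<script>
  // Build the link from fragments so the address never appears as one string
  // (this is what protects it against crawlers).
  // String.fromCharCode(58) is ':' and String.fromCharCode(64) is '@'.
  document.write('<a href="mai' + 'lto' + String.fromCharCode(58) + 'user');
  document.write(String.fromCharCode(64) + 'doma' + 'in.com">user' + String.fromCharCode(64));
  document.write('doma' + 'in.com</a>');
</script>
<!-- Fallback shown when scripts are disabled -->
<noscript>user at domain dot com</noscript>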
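
URL:
The same idea can be applied to a URL. The following is only a minimal sketch, not a definitive implementation: the helper name assemble, the placeholder address on example.com, and the element id hidden-link are all assumptions made for illustration. Numbers in the list are treated as character codes, so that ':' (58) and '/' (47) never appear literally next to the rest of the URL.

```javascript
// Hypothetical helper: joins fragments into one string.
// Numeric entries are converted with String.fromCharCode, string entries are used as-is.
function assemble(parts) {
  return parts
    .map(function (p) {
      return typeof p === 'number' ? String.fromCharCode(p) : p;
    })
    .join('');
}

// Placeholder URL, split so that it never appears as a single string in the source.
var urlParts = ['ht', 'tp', 58, 47, 47, 'www.exa', 'mple.com', 47, 'pa', 'ge'];

// Only touch the DOM when running in a browser.
if (typeof document !== 'undefined') {
  var link = document.createElement('a');
  link.href = assemble(urlParts);
  link.textContent = assemble(urlParts);
  // Assumes the page contains <span id="hidden-link"></span> where the link should appear.
  var target = document.getElementById('hidden-link');
  if (target) {
    target.appendChild(link);
  }
}
```

To use it, the script would be placed in a script element after the placeholder span, with a noscript fallback next to it as in the email example. Building the link with createElement instead of document.write is a design choice of this sketch; either way, the full URL exists only after the script has run.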
