Tuesday, July 10, 2007

Protect Against Invaders by SPAM-Proofing Your Website - Blocking Malicious "Good for Nothing" Robots

The robots you will want to block depend on your preferences and on which bots visit your website regularly. Cutting down on bandwidth costs, keeping robots from harvesting your email address, and keeping them from collecting other information from you or your website are all good reasons to block a robot.

The best way to decide which robots to block is to do some quick research on the robots that visit your site most often. If you cannot find reliable information about a robot, or you find that it does something you would not approve of, simply block it with a robots.txt file. If a robot does not obey the robots.txt file, pull out the big guns and use mod_rewrite to stop it dead in its tracks.
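For robots that do honor the file, turning one away by name takes only a User-agent line and a Disallow line. As a minimal sketch, the example below feeds such a robots.txt to Python's standard urllib.robotparser module, which reads the file the way a compliant crawler would. "BadBot" and the example.com URL are placeholders, not real robots or sites.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: keep "BadBot" out of the whole site, allow everyone else.
# An empty "Disallow:" line means nothing is disallowed for that group.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt text the way a well-behaved robot would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "http://www.example.com/contact.html"
    print(rp.can_fetch("BadBot", url))     # False
    print(rp.can_fetch("Googlebot", url))  # True
```

A robot that ignores robots.txt never performs this check at all, which is why the mod_rewrite approach is needed for the bots described below.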

Example Robots

There are several common bots that you may run into frequently. One is "Microsoft URL Control", a robot that ignores the robots.txt file and fetches as many pages as it can before leaving the site. Many different people run this SPAMbot, all under the same name.

The second robot that frequents websites is the NameProtect (NPbot) robot. Its job is to collect information about websites that may be violating its clients' brand names. This robot does not obey the robots.txt file, does not respond to emails sent to the NameProtect company, and, as far as we have determined, serves no good purpose.

To Block the Microsoft URL Control Robot by User Agent:

RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control"
RewriteRule .* - [F,L]

To Block the NameProtect Robot by User Agent:

RewriteCond %{HTTP_USER_AGENT} "NPbot"
RewriteRule .* - [F,L]

Once you have identified a good number of bots that you would like to block with mod_rewrite, you can combine them into one list and add comments as well. Chaining the conditions with the [OR] flag means a match on any one of them triggers the rule, like so:

RewriteEngine On
# Bad bots (comments must sit on their own lines in Apache config)
RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR]
RewriteCond %{HTTP_USER_AGENT} "NPbot"
RewriteRule .* - [F,L]
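Before deploying a longer list, it can help to check which user-agent strings from your access logs the combined conditions would catch. The sketch below is a rough offline approximation, not Apache itself: it treats each RewriteCond pattern as an unanchored, case-sensitive regular expression, which is how mod_rewrite matches by default. The helper names is_blocked and status_for are hypothetical, and the sample user-agent strings are invented for illustration.

```python
import re

# Patterns copied from the RewriteCond lines above. mod_rewrite treats each one
# as an unanchored regular expression, case-sensitive unless [NC] is set.
BLOCKED_PATTERNS = (
    r"Microsoft URL Control",
    r"NPbot",
)


def is_blocked(user_agent, patterns=BLOCKED_PATTERNS, nocase=False):
    """Return True if any pattern matches, like conditions chained with [OR]."""
    flags = re.IGNORECASE if nocase else 0
    return any(re.search(pattern, user_agent, flags) for pattern in patterns)


def status_for(user_agent):
    """403 Forbidden for a blocked agent (what the [F] flag sends), else 200."""
    return 403 if is_blocked(user_agent) else 200


if __name__ == "__main__":
    # Invented user-agent strings, for illustration only.
    samples = [
        "Microsoft URL Control - 6.00",
        "NPbot/1.0",
        "npbot/1.0",
        "Mozilla/5.0 (Windows NT 5.1)",
    ]
    for ua in samples:
        print(status_for(ua), ua)
```

The lowercase "npbot/1.0" slips through because matching is case-sensitive by default; appending the [NC] flag to a RewriteCond makes that condition case-insensitive, which the nocase parameter imitates.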

Before using these examples, make sure mod_rewrite is enabled on your server and that the rules are placed correctly: they belong in your server configuration or in a .htaccess file, after a RewriteEngine On directive, or they will have no effect. Keep in mind, too, that mod_rewrite rules are not a complete solution to SPAM and malicious bot problems, because a bot can simply send a different user-agent string. You can, however, block a large majority of the bots out there and dramatically cut down on the amount of SPAM you receive. If you combine the JavaScript methods with mod_rewrite, not only will your website be a heavily guarded anti-SPAM site, but you may actually enjoy downloading all your email messages and finding them SPAM-free.
