Tuesday, July 10, 2007

The Yahoo SLURP Crawler - Stonewalling

Another way of shutting out SLURP is to use the noindex meta tag. Yahoo SLURP obeys this directive when it appears in the document's head. The code to insert between the head tags of your document is

  <META NAME="robots" CONTENT="noindex">

This snippet ensures that Yahoo SLURP does not index the document in the search engine database. Another useful directive is the nofollow meta tag. The code to insert is

  <META NAME="robots" CONTENT="nofollow">

This snippet ensures that the links on the page are not followed.
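
If you want to double-check what a page is actually telling crawlers, a small script can read the robots meta tag back out of the HTML. The following is only a minimal sketch using Python's standard html.parser module; the class and function names (RobotsMetaParser, robots_directives) are my own and hypothetical, and it says nothing about how SLURP itself parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are handled the same way.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="robots" CONTENT="noindex"></head></html>'
    print(robots_directives(page))
```

A page with no robots meta tag yields an empty set, which crawlers generally treat as permission to index the page and follow its links.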

Dynamic Page Indexing

This is the real charm of SLURP. Most search engine crawlers don’t bother crawling and indexing dynamic pages (.php, .asp, .jsp) since their content is subject to rapid change, which makes the process of indexing useless. Yahoo SLURP, however, does daily crawls in order to refresh the content of its indexed dynamic pages. It also does bi-weekly crawls, which enable the search engine to discover new content and add it to its index incrementally. This allows a complex site's URLs, generated by forms and content management software, to be indexed.

These frequent crawls show up in your server logs as frequent download requests, as the crawler moves, stops, and restarts. Yahoo says that these requests should not be a cause for alarm.
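
To see how often the crawler really visits, you can count its requests per day in your access log. The sketch below assumes the Apache combined log format (timestamp in square brackets, user agent as the last quoted field) and identifies SLURP by the string "Slurp" in the user agent; the function name, the sample log lines, and the IP address are made up for illustration.

```python
import re
from collections import Counter

# Assumes Apache combined log format: the day is the first part of the
# bracketed timestamp and the user agent is the last quoted field.
LOG_PATTERN = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] .*"(?P<agent>[^"]*)"\s*$'
)


def slurp_requests_per_day(log_lines):
    """Count requests per day whose user agent mentions Slurp."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and "slurp" in match.group("agent").lower():
            counts[match.group("day")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; 192.0.2.10 is a documentation-only address.
    slurp_agent = (
        "Mozilla/5.0 (compatible; Yahoo! Slurp; "
        "http://help.yahoo.com/help/us/ysearch/slurp)"
    )
    sample = [
        '192.0.2.10 - - [10/Jul/2007:06:25:14 -0700] '
        '"GET /page.php?id=3 HTTP/1.0" 200 5120 "-" "%s"' % slurp_agent,
        '192.0.2.10 - - [10/Jul/2007:06:25:20 -0700] '
        '"GET /page.php?id=4 HTTP/1.0" 200 4980 "-" "%s"' % slurp_agent,
    ]
    for day, total in slurp_requests_per_day(sample).items():
        print(day, total)
```

If your server writes logs in a different format, the regular expression will need adjusting before the counts mean anything.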

SLURP's ability to index dynamic pages and to constantly refresh its content is a great relief to web designers (like me) who like having dynamic pages to enable fast loading and rapid updating. Websites which were not search engine friendly are suddenly in contention to be ranked number one.

However, the downside is that SLURP may never crawl your dynamic pages on its own unless you trigger the crawler via techniques which Yahoo encourages (to the benefit of their bottom line).

Getting Framed

Yahoo SLURP also supports frames, although it will not follow SRC attribute links to standalone framesets; it only follows HREF links (as all good crawlers do).
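
To get a feel for what this means for a framed site, the following sketch separates the URLs that sit in anchor HREF attributes from those that appear only in frame SRC attributes. It is just an illustration built on Python's standard html.parser module, not a description of SLURP's internals; LinkCollector and collect_links are hypothetical names of my own.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Separates anchor HREF targets from frame SRC targets."""

    def __init__(self):
        super().__init__()
        self.href_links = []
        self.frame_sources = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            # Ordinary hyperlinks: the kind a crawler follows.
            self.href_links.append(attr_map["href"])
        elif tag == "frame" and attr_map.get("src"):
            # Frame sources: reachable only if linked via HREF elsewhere.
            self.frame_sources.append(attr_map["src"])


def collect_links(html):
    """Return (href_links, frame_sources) found in the given HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.href_links, collector.frame_sources


if __name__ == "__main__":
    page = (
        '<frameset cols="20%,80%">'
        '<frame src="menu.html"><frame src="content.html">'
        "</frameset>"
    )
    hrefs, frames = collect_links(page)
    print("HREF links:", hrefs)
    print("Frame sources:", frames)
```

Any URL that shows up only in the frame sources list is a candidate for an ordinary HREF link somewhere on the site, so that the crawler has a path to it.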
