Tuesday, July 10, 2007

What Do Spiders See in a Hyperlink?

I’ll assume that you are all reasonably familiar with HTML. If you have ever looked at the source code for an HTML page, you probably noticed text like <a href="URL">SEO Chat</a> wherever a hyperlink appeared.

When a web browser reads this, it knows that the text “SEO Chat” should be hyperlinked to the web page named in the href attribute. Incidentally, “SEO Chat” in this case is the “anchor text” of the link. When a spider reads this text, it thinks, “Okay, the linked page is relevant to the text on this page, and very relevant to the term ‘SEO Chat.’”
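To make the spider's view concrete, here is a minimal sketch in Python of pulling the link target and anchor text out of a page. The class name LinkExtractor and the example.com URL are placeholders invented for illustration, and real crawlers are far more elaborate than this.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs, roughly as a simple spider might."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, anchor_text) pairs
        self._href = None    # href of the <a> we are currently inside
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears between <a ...> and </a>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page fragment; example.com is a placeholder URL.
page = '<p>Read <a href="http://www.example.com/">SEO Chat</a> for tips.</p>'

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)
```

The pair that comes out is what the spider has to work with: where the link points, and the words the linking page chose to describe it.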

Let’s get a little more complicated by adding two attributes to the same link: title="Great Site for SEO Info" and rel="nofollow".

Now what? The anchor text hasn’t changed, so the link will still look the same when the web browser displays it. But a spider will think, “Okay, not only is this page relevant to the term ‘SEO Chat,’ it is also relevant to the phrase ‘Great Site for SEO Info.’ And hey, there’s a relationship between the page I’m crawling now and this hyperlink! It says that this link doesn’t count as a ‘vote’ for the page being linked to. Okay, so it won’t add to the page rank.”
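The extra signals can be sketched the same way. The snippet below is only an illustration, not how any particular search engine works: the class name LinkSignals, the counts_as_vote key, and the example.com URL are all made up for this example, while the title text and the rel value come from the discussion above.

```python
from html.parser import HTMLParser


class LinkSignals(HTMLParser):
    """Records the attributes of each <a> that a spider might care about."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel holds a space-separated list of tokens, e.g. "external nofollow".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        self.links.append({
            "href": attrs.get("href"),
            "title": attrs.get("title"),
            # Hypothetical flag: a nofollow link is not treated as a vote.
            "counts_as_vote": "nofollow" not in rel_tokens,
        })


# Hypothetical markup; the URL is a placeholder.
html = (
    '<a href="http://www.example.com/" '
    'title="Great Site for SEO Info" rel="nofollow">SEO Chat</a>'
)

parser = LinkSignals()
parser.feed(html)
print(parser.links[0])
```

Splitting rel into tokens matters because the attribute can carry several values at once, so a plain string comparison against "nofollow" would miss a link marked rel="external nofollow".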

That last point, about the link not counting as a vote for the page being linked to, is what the rel="nofollow" attribute does. It was introduced to address the problem of people submitting linked comments to blogs that said things like "Visit my pharmaceuticals site!" That kind of comment is an attempt by the commenter to raise his own website's position in the search engine rankings. It's called comment spam, by the way; most major search engines don't like comment spam because it skews their results, making them less relevant. As you may have guessed, then, the “nofollow” value in the “rel” attribute is specifically for search engines; it really isn't there to be noticed by anyone else. Yahoo!, MSN, and Google recognize it, but AskJeeves does not support nofollow; its crawler simply ignores it.

In some cases, a link may be assigned to an image. The hyperlink would then contain an image tag that names the image file, and that tag might include some alternate text in an “alt” attribute, which can be helpful for voice-based browsers used by the blind. The alt text also helps spiders, because it gives them another clue about what the linked page is about.
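As a rough sketch of that idea, the parser below treats the alt text of an image inside a link as the link's anchor text. The class name ImageLinkText, the file name logo.gif, the alt text, and the example.com URL are all hypothetical, and whether a given search engine weighs alt text this way is an assumption.

```python
from html.parser import HTMLParser


class ImageLinkText(HTMLParser):
    """Uses the alt text of images inside a link as that link's anchor text."""

    def __init__(self):
        super().__init__()
        self.links = []      # (href, text) pairs
        self._href = None    # href of the <a> we are currently inside
        self._parts = []     # visible text and alt text found inside it

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
            self._parts = []
        elif tag == "img" and self._href is not None:
            # An image has no visible text, so fall back to its alt attribute.
            self._parts.append(attrs.get("alt") or "")

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join(p.strip() for p in self._parts if p.strip())
            self.links.append((self._href, text))
            self._href = None


# Hypothetical image link; every name in it is a placeholder.
page = (
    '<a href="http://www.example.com/">'
    '<img src="logo.gif" alt="SEO Chat logo"></a>'
)

reader = ImageLinkText()
reader.feed(page)
print(reader.links)
```

Without the alt attribute, this link would come out with empty text, leaving the spider with nothing but the URL to go on.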

Hyperlinks may take other forms on the web, but by and large those forms do not pass ranking or spidering value. In general, the closer a link is to the classic <a href="URL">text</a>, the easier it is for a spider to follow; the further it strays from that form, the harder it becomes.
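One way to picture this is a deliberately naive collector that only understands the classic form. In the sketch below, HrefCollector and both example.com URLs are invented for illustration, and the scripted link is just one assumed example of a non-classic form.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects only the targets of classic <a href="..."> links."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


# Two hypothetical links: one classic, one that relies on a script handler.
page = (
    '<a href="http://www.example.com/classic">Classic link</a>'
    '<span onclick="window.location=\'http://www.example.com/scripted\'">'
    'Scripted link</span>'
)

collector = HrefCollector()
collector.feed(page)
print(collector.hrefs)
```

Only the classic link is collected. The scripted one works for a visitor with a mouse, but a collector like this never learns where it leads.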
