Friday, September 21, 2007

Provocative Search Engine Friendly URLs in ASP.NET

 

Data displayed by dynamic Web sites is usually stored in some sort of backend database. Typically, a numeric ID is associated with a data row of a database table, and all database operations with the table (such as selecting, inserting, deleting, or updating rows) are done by referencing that ID.

More often than not, the same ID used to identify an item in the database is also used in ASP.NET code to refer to that particular item, such as a product in an e-commerce Web site or an article in a blog. In a dynamic URL, these IDs are passed via the query string to a script that presents different content accordingly.

Figure 1 shows a page from http://www.cristiandarie.ro/BalloonShop/. This is a demo e-commerce site presented in one of Cristian's books, and employs dynamic URLs. As you can see, the page is composed using data from the database, and the ID that identifies the data item is taken from the dynamic URL.

Figure 1

This is probably the most common approach employed by dynamic Web sites at present, as you frequently meet URLs such as the following:

  • http://www.example.com/Catalog.aspx?CatID=1
  • http://www.example.com/Catalog.aspx?CatID=2&ProdID=3&RefID=4

This approach is certainly the easiest and most straightforward when developing a dynamic site, but that is about the only benefit these URLs bring. Dynamic URLs come with three important potential drawbacks:

  • They are frequently sub-optimal from a search engine spider's point of view.
  • They don't provide relevant keywords or a call to action to a human viewing the URL, therefore reducing the click-through rate (CTR).
  • They aren't easy to remember, or communicate to other parties in the offline world.

Some programmers also tend to use extra parameters freely. For example, if the parameter RefID from the previous example is used for some sort of tracking mechanism, and search engine friendliness is a priority, it should be removed. Lastly, any necessary duplicate content should be excluded from search engines' view using a robots.txt file or a robots meta tag.
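As a rough illustration of dropping a tracking parameter, the following C# sketch removes a named parameter from a URL's query string. The class and method names (QueryCleaner, RemoveParameter) are our own inventions for this example, and the code deliberately ignores details such as URL fragments and encoded parameter names.

```csharp
using System;
using System.Collections.Generic;

public static class QueryCleaner
{
    // Returns the URL without the named query string parameter.
    // Hypothetical helper for illustration; fragments (#...) are not handled.
    public static string RemoveParameter(string url, string parameterName)
    {
        int queryStart = url.IndexOf('?');
        if (queryStart < 0)
        {
            return url; // no query string, nothing to remove
        }

        string path = url.Substring(0, queryStart);
        string[] pairs = url.Substring(queryStart + 1).Split('&');
        List<string> kept = new List<string>();

        foreach (string pair in pairs)
        {
            if (pair.Length == 0)
            {
                continue; // skip empty segments such as a trailing '&'
            }

            // The key is everything before the first '='.
            string key = pair.Split('=')[0];
            if (!string.Equals(key, parameterName, StringComparison.OrdinalIgnoreCase))
            {
                kept.Add(pair);
            }
        }

        if (kept.Count == 0)
        {
            return path;
        }
        return path + "?" + string.Join("&", kept.ToArray());
    }
}
```

Calling QueryCleaner.RemoveParameter with the second example URL shown earlier and "RefID" would yield http://www.example.com/Catalog.aspx?CatID=2&ProdID=3.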

TIP

If URLs on your site are for the most part indexed properly, it may not be wise to restructure URLs. However, if you decide that you must, please also read Chapter 4 "Content Relocation and HTTP Status Codes" in the book Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO (Wrox, 2007, ISBN: 978-0-470-13147-3), which teaches you how to make the transition smoother. It shows how to preserve link equity by properly redirecting old URLs to new URLs. Also, not all solutions to URL-based problems require restructuring URLs; as mentioned earlier, duplicate content can be excluded using the robots.txt file or the robots meta tag.

Numeric Rewritten URLs

An improved version of the previous example is a modified URL that removes the dynamic parameters and hides them in a static URL. This static URL is then mapped, using one of the techniques you'll learn later, to a dynamic URL. The RefID parameter previously alluded to is also not present, because those types of tracking parameters usually can and should be avoided.

  • http://www.example.com/Products/1/
  • http://www.example.com/Products/2/1/

The impact of numeric URL rewriting will likely be negligible with the search engines on pages with a single parameter, but it may be significant on pages with two parameters or more. For humans, using numeric URLs can be beneficial when the context makes the URLs hackable, giving those numbers a special meaning. The best example is blogs, which frequently employ numeric URLs to reflect the date of the content; for example, http://blog.example.com/2007/07/17/ will contain the post or posts from July 17th, 2007.

This form of URL is particularly well-suited to the adaptation of existing software. Retrofitting an application for keyword-rich URLs, as covered in the next section, may present additional difficulty in implementation.

Keyword-Rich Rewritten URLs

Finally, here are two ideal keyword-rich URLs:

  • http://www.example.com/High-Powered-Drill-P1.html
  • http://www.example.com/Tools-C2/High-Powered-Drill-P1.html

This is the best approach to creating URLs, but it also presents an increased level of difficulty in implementation, especially if you are modifying preexisting source code. In that case the solution may not have an easy or apparent implementation, and it requires more interaction with the database to extract the copy for the URLs.

NOTE

The decision whether to use the .html suffix in the URL is mostly a non-issue. You could also use a URL such as http://www.example.com/High-Powered-Drill-P1/, if you prefer the look of directories.

This "ideal" URL presents a static URL that indicates both to the search engine and to the user that it is topically related to the search query. Usually the keyword-rich URLs are created using keywords from the name or description of the item presented in the page itself. Characters in the keyword string that are not alphanumeric need to be removed, and spaces should be converted to a delimiting character. Dashes are preferable to underscores as the delimiting character because most search engines treat the dash as a space and the underscore as an actual character, though this particular detail is probably not terribly significant. On a new site, dashes should be chosen as the word delimiter.
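The cleanup just described can be sketched in a few lines of C#. This is only a minimal illustration under our own assumptions: the names KeywordUrl, ToUrlText, and ProductUrl are hypothetical, only ASCII letters and digits are treated as alphanumeric, and the -P suffix simply mirrors the example URLs shown earlier.

```csharp
using System.Text.RegularExpressions;

public static class KeywordUrl
{
    // Turns an item name into URL-safe text: spaces become dashes,
    // every other non-alphanumeric character is removed.
    public static string ToUrlText(string name)
    {
        // Convert each run of whitespace to a single dash.
        string text = Regex.Replace(name.Trim(), @"\s+", "-");

        // Remove anything that is not an ASCII letter, a digit, or a dash.
        text = Regex.Replace(text, "[^a-zA-Z0-9-]", "");

        // Collapse repeated dashes left behind by removed characters.
        text = Regex.Replace(text, "-{2,}", "-");

        return text.Trim('-');
    }

    // Builds a keyword-rich product path in the style of the earlier examples.
    public static string ProductUrl(string name, int productId)
    {
        return "/" + ToUrlText(name) + "-P" + productId + ".html";
    }
}
```

With this sketch, KeywordUrl.ProductUrl("High Powered Drill", 1) produces /High-Powered-Drill-P1.html. Keeping the numeric ID in the URL means the page can still be looked up by ID, so the keyword text never has to be matched back against the database.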

Implementing URL Rewriting

From this moment on, this article discusses URL rewriting. Of particular importance is Scott Guthrie's ASP.NET URL rewriting article. Another interesting article is "URL Rewriting in ASP.NET", by Scott Mitchell.

The hurdle we must overcome to support keyword-rich URLs like those shown earlier is that they don't actually exist anywhere in your Web site. Your site still contains a script (named, say, Product.aspx) that expects to receive parameters through the query string and generates content depending on those parameters. This script would be ready to handle a request such as this:

http://www.example.com/Product.aspx?ProductID=123

but your Web server would normally generate a 404 error if you tried any of the following:

http://www.example.com/Products/123.html

http://www.example.com/my-super-product.html

URL rewriting allows you to transform the URL of such an incoming request (which we'll call the original URL) to a different, existing URL (which we'll call the rewritten URL), according to a defined set of rules. You could use URL rewriting to transform the previous nonexistent URLs to Product.aspx?ProductID=123, which does exist.

If you happen to have some experience with the Apache Web server, you probably know that it ships by default with the mod_rewrite module, which is the standard way to implement URL rewriting in the LAMP (Linux/Apache/MySQL/PHP) world. That is covered in our book Professional Search Engine Optimization with PHP: A Developer's Guide to SEO (Wrox, 2007, ISBN: 978-0-470-10092-9).

Unfortunately, IIS doesn't ship by default with such a module. IIS 7 contains a number of new features that make URL rewriting easier, but it will take a while until all existing IIS 5 and 6 Web servers are upgraded. Third-party URL-rewriting modules for IIS 5 and 6 do exist, as do several URL-rewriting libraries, hacks, and techniques; whether each of them can be used depends on your version and configuration of IIS and on the version of ASP.NET. In this article we try to cover the most relevant scenarios by providing practical solutions.

To understand why an apparently easy problem, that of implementing URL rewriting, can become so problematic, you first need to understand how the process really works. To implement URL rewriting, there are three steps:


  1. Intercept the incoming request. When implementing URL rewriting, it's obvious that you need to intercept the incoming request, which usually points to a resource that doesn't exist on your server physically. This task is not trivial when your Web site is hosted on IIS 6 and older. There are different ways to implement URL rewriting depending on the version of IIS you use (IIS 7 brings some additional features over IIS 5/6), and depending on whether you implement rewriting using an IIS extension, or from within your ASP.NET application (using C# or VB.NET code). In this latter case, usually IIS still needs to be configured to pass the requests we need to rewrite to the ASP.NET engine, which doesn't usually happen by default.
  2. Associate the incoming URL with an existing URL on your server. There are various techniques you can use to calculate what URL should be loaded, depending on the incoming URL. The "real" URL usually is a dynamic URL.
  3. Rewrite the original URL to the rewritten URL. Depending on the technique used to capture the original URL and the form of the original URL, you have various options to specify the real URL your application should execute.
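To make step 2 more concrete, here is a minimal C# sketch of the mapping logic, independent of how the request is intercepted. The class and method names (RewriteSketch, MapToRealUrl) and the two hard-coded rules are assumptions made for this illustration, based on the example URLs shown earlier.

```csharp
using System.Text.RegularExpressions;

public static class RewriteSketch
{
    // Hypothetical rule: /Products/<digits>.html carries the product ID in the path.
    private static readonly Regex NumericProduct =
        new Regex(@"^/Products/([0-9]+)\.html$");

    // Works out which existing URL should serve the incoming path.
    // Returns null when no rule applies, so the caller can leave the request untouched.
    public static string MapToRealUrl(string originalPath)
    {
        Match match = NumericProduct.Match(originalPath);
        if (match.Success)
        {
            // Reuse the captured digits as the ProductID parameter.
            return "/Product.aspx?ProductID=" + match.Groups[1].Value;
        }

        if (originalPath == "/my-super-product.html")
        {
            // A fixed, one-to-one rule for a single keyword-rich URL.
            return "/Product.aspx?ProductID=123";
        }

        return null;
    }
}
```

Inside an ASP.NET application, step 3 would typically hand a non-null result to HttpContext.RewritePath. With an ISAPI filter, the same mapping is expressed as rules in a configuration file instead of code.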

The result of this process is that the user requests a URL, but a different URL actually serves the request. The rest of the article covers one way to implement each of the preceding steps:


  • URL rewriting with IIS and ISAPI_Rewrite

The book Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO also covers these three additional methods.


  • URL rewriting using URLRewriter
  • Writing a custom URL rewriting handler
  • URL rewriting with IIS 7

For background information on how IIS processes incoming requests, we recommend Scott Mitchell's article How ASP.NET Web Pages are Processed on the Web Server.

URL Rewriting with IIS and ISAPI_Rewrite


If your IIS Web server, no matter if it's IIS 5, 6, or 7, has an ISAPI rewriting filter installed, we encourage you to use it, because it's likely to be the most efficient and practical method to implement URL rewriting. When such a filter is used, rewriting happens right when the request hits your Web server, before being processed by the ASP.NET ISAPI extension. This has the following advantages:


  • Simple implementation. Rewriting rules are written in configuration files; you don't need to write any supporting code.
  • Task separation. The ASP.NET application works just as if it were working with dynamic URLs. Apart from the link building functionality, the ASP.NET application doesn't need to be aware of the URL rewriting layer of your application.
  • Wider coverage. You can easily rewrite requests for resources that are not processed by ASP.NET by default, such as image files.

To process incoming requests, IIS works with ISAPI extensions, which are code libraries that process the incoming requests. IIS chooses the appropriate ISAPI extension to process a certain request depending on the extension of the requested file. For example, an ASP.NET-enabled IIS machine hands ASP.NET-specific requests (those for .aspx files, .ashx files, and so on) to the ASP.NET ISAPI extension, which is a file named aspnet_isapi.dll.

To intercept incoming requests on an IIS 6-based server, you can create your own URL rewriting ISAPI filter or use an existing one. Creating your own ISAPI filter is too complex a process to cover in this article, but fortunately existing products are available, such as Helicon's ISAPI_Rewrite and Ionic's ISAPI rewriting module.


Figure 2 describes how an ISAPI rewriting filter, such as those just mentioned, fits into the picture. Its role is to rewrite the URLs of incoming requests; it doesn't affect the output of the ASP.NET script in any way.

TIP

At first sight, the rewriting rules can be added easily to an existing Web site, but in practice there are other issues to take into consideration. For example, you'd also need to modify the existing links within the Web site content. In Chapter 4, "Content Relocation and HTTP Status Codes", in the book, Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO you continue by learning how to properly redirect old links to the new links in a preexisting Web site, to preserve their equity.


Figure 2

ISAPI rewriting filters can be invaluable tools to Web developers tasked with architecting complex dynamic sites that are still search engine friendly. They allow the programmer to easily declare a set of rules that are applied by IIS on-the-fly to map incoming URLs requested by the visitor to dynamic query strings sent to various ASP.NET pages. As far as a search engine spider is concerned, the URLs are static.

The following few pages demonstrate URL rewriting functionality by using Helicon's ISAPI_Rewrite filter. You can find its official documentation at http://www.isapirewrite.com/docs/. Ionic's ISAPI rewriting module has similar functionality.

In the first exercise (available as part of the code download for the book Professional Search Engine Optimization with ASP.NET: A Developer's Guide to SEO), we'll create a simple rewrite rule that translates my-super-product.html to Product.aspx?ProductID=123. This is the exact scenario that was presented in Figure 2.

The Product.aspx Web Form is designed to simulate a real product page. The script receives a query string parameter named ProductID, and generates a very simple output message based on the value of this parameter. Figure 3 shows the sample output that you'll get by loading http://seoasp/Product.aspx?ProductID=3.


Figure 3

To improve search engine friendliness, we want to be able to access the same page through a static URL: http://seoasp/my-super-product.html. To implement this feature, we'll use (you guessed it!) URL rewriting, using Helicon's ISAPI_Rewrite.

As you know, ISAPI_Rewrite translates an input string (the URL typed by your visitor) to another string (a URL that can be processed by your ASP.NET code). In this exercise, we'll make it rewrite my-super-product.html to Product.aspx?ProductID=123.

TIP

This article covers ISAPI_Rewrite version 2. At the moment of writing, ISAPI_Rewrite 3.0 is in beta testing.

Using Helicon's ISAPI_Rewrite



  1. The first step is to install ISAPI_Rewrite. Navigate to http://www.helicontech.com/download.htm and download ISAPI_Rewrite Lite (freeware). The file name should be something like isapi_rwl_x86.msi. At the time of writing, the full (not freeware) version of the product comes in a different package if you're using Windows Vista and IIS 7, but the freeware edition is the same for all platforms.
  2. Execute the MSI file you just downloaded, and install the application using the default options all the way through.

TIP

If you run into trouble, you should visit the Installation section of the product's manual, at http://www.isapirewrite.com/docs/#install. If you run Windows Vista, you need certain IIS modules to be installed in order for ISAPI_Rewrite to function.


  3. Make sure your IIS Web server is running and create a http://seoasp/ Web site using Visual Web Developer.
  4. Create a new Web Form named Product.aspx in your project, with no code-behind file or Master Page. Then modify the generated code as shown in the following code snippet. (Remember that you can have Visual Web Developer generate the Page_Load signature for you by switching to Design view, and double-clicking an empty area of the page or using the Properties window.)


<%@ Page Language="C#" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<script runat="server">
  protected void Page_Load(object sender, EventArgs e)
  {
    // retrieve the product ID from the query string
    string productId = Request.QueryString["ProductID"];

    // use productId to customize page contents
    if (productId != null)
    {
      // set the page title
      this.Title += ": Product " + productId;

      // display product details
      message.Text =
        String.Format("You selected product #{0}. Good choice!",
                      productId);
    }
    else
    {
      // no product was specified; prompt the visitor instead
      message.Text = "Please select a product from our catalog.";
    }
  }
</script>

<%-- The markup below is not part of the listing as published here. It is an
     assumed, minimal page body that provides the server-side head required
     by this.Title and the "message" control used in Page_Load. --%>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
  <title>Product page</title>
</head>
<body>
  <form id="form1" runat="server">
    <asp:Literal runat="server" ID="message" />
  </form>
</body>
</html>


  5. Test your Web Form by loading http://seoasp/Product.aspx?ProductID=3. The result should resemble Figure 3.
  6. Let's now write the rewriting rule. Open the Program Files/Helicon/ISAPI_Rewrite/httpd.ini file (you can find a shortcut to this file in Programs), and add the following highlighted lines to the file. Note the file is read-only by default. If you use Notepad to edit it, you'll need to make it writable first.

[ISAPI_Rewrite]

# Translate /my-super-product.html to /Product.aspx?ProductID=123
RewriteRule ^/my-super-product\.html$ /Product.aspx?ProductID=123


  7. Switch back to your browser again, and this time load http://seoasp/my-super-product.html.
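If you'd like to reason about what the pattern in the rewrite rule does and does not match, you can try it out with .NET's own regular expression engine. The sketch below is only an approximation: it assumes that the anchors and the escaped dot behave in .NET as they do in the filter, and the class name RulePatternCheck is made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class RulePatternCheck
{
    // The same pattern text that appears in the httpd.ini rule.
    private const string Pattern = @"^/my-super-product\.html$";

    // True when the whole path matches the rule's pattern.
    public static bool Matches(string path)
    {
        return Regex.IsMatch(path, Pattern);
    }
}
```

Here, Matches("/my-super-product.html") returns true, while "/shop/my-super-product.html" and "/my-super-productXhtml" do not match. The ^ and $ anchors tie the pattern to the whole path, and the backslash makes the dot match only a literal dot rather than any character.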
