6 Reasons Why Clean URLs Matter

I’ve been reading SEO forums lately and have seen several comments claiming that there is absolutely no difference between static and dynamic URLs.

A site can get indexed and ranked well if it has dynamic URLs, but that does not mean that dynamic URLs are as good as clean URLs.

Note: if you have dynamic URLs, don’t just change them to clean URLs after reading this without knowing what you are doing. I’ll cover the risks involved with doing that in another post. Changing a URL structure from dynamic to static is not recommended in all cases. This post just contains general advice on why you should start with clean URLs.

1. The search engines recommend clean URLs

MSN says it very plainly, “Keep your URLs simple and static.”

Google says, “If you decide to use dynamic pages (i.e., the URL contains a ‘?’ character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.”

Update: (26 Nov 07) Google has updated their Webmaster Guidelines to specifically talk about problems with dynamic URLs:

Google indexes dynamically generated webpages, including .asp pages, .php pages, and pages with question marks in their URLs. However, these pages can cause problems for our crawler and may be ignored. If you’re concerned that your dynamically generated pages are being ignored, you may want to consider creating static copies of these pages for our crawler. If you do this, please be sure to include a robots.txt file that disallows the dynamic pages in order to ensure that these pages aren’t seen as having duplicate content.

Yahoo’s answer is more complex because they have a dynamic URL tool in SiteExplorer. Yahoo clearly states some of the problems that dynamic URLs can create for spiders:

You might use URL parameters in your site to perform various functions apart from modifying the content of the page, such as,

  • session ids - for tracking user sessions
  • source trackers – for tracking the sources which are sending referrals to your pages and site
  • format modifiers – for print formats etc

In these cases, our crawler will come across different versions of your site’s URLs that are substantially, if not exactly, similar in content. This causes numerous problems:

  1. Such URLs look like different documents to our crawler and create excessive crawling on your site.
  2. We do our best to detect duplicates among these pages, and the detected duplicates are prevented from ranking well. However, if we are unable to detect the duplicate, this results in duplicate results from your site competing for positions in search results
  3. When people link to your site with different versions of these URLs, it fragments the link referrals to your site across multiple different URLs even though it’s the same page.
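To make Yahoo’s point concrete, here is a minimal Python sketch of how a site might collapse those URL variants into one canonical form by dropping parameters that do not change the content. The parameter names (`sessionid`, `ref`, `print`) are my own assumptions for illustration; a real site would substitute whatever names it actually uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names: session ID, source tracker, format modifier.
NON_CONTENT_PARAMS = {"sessionid", "ref", "print"}

def strip_non_content_params(url):
    """Return the URL without parameters that do not change the page content."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in NON_CONTENT_PARAMS
    ]
    # Rebuild the URL with only the content-relevant parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With this sketch, `http://example.com/item?id=7&sessionid=abc123&ref=newsletter` and `http://example.com/item?id=7` both reduce to the same URL, which is the one you would want people to link to.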

Ask.com doesn’t have much information on dynamic URLs, except for a quick mention on their UK site:

We include a select number of dynamic URLs in our index. However, they are screened to detect likely duplicates before downloading.

Note that Ask.com carefully uses the phrase, “a select number”.

Search engines are not saying that they cannot spider and index dynamic URLs, but they are giving very clear hints that it is easier for them to index static URLs.

2. Dynamic URLs probably do not pass link value in the same way

Matt Cutts talked a little bit about this in one of his videos. Here is a transcription, with relevant parts highlighted:

Does Google treat Dynamic Pages differently than static pages…?

Good question. To a first approximation, we do treat static and dynamic pages in similar ways in ranking. So let me explain that in a little more detail.

Pagerank flows to dynamic URL’s in the same way they flow to static URL’s. And so, if you’ve got New York Times linking to a dynamic URL you’ll still get the Pagerank benefit, and it will still flow the Pagerank Benefit. There are other Search Engines who in the past have said, ‘OK, we’ll go one-level deep from static URLs, so we’re not going to crawl from a dynamic URL, but we’re willing to go into the dynamic URL space from a static URL’. So, the short answer is Pagerank still flows just the same between static and dynamic.

Let’s go into the more detailed answer.

The example you gave actually has 5 parameters, and one of them is like a Product ID with like 2725. You definitely can use too many parameters. I would absolutely opt for 2 or 3 at the most, if you have any choice whatsoever. And try to avoid long numbers, because we can think that those are session IDs. Any extra parameters that you can get rid of are always a good idea.

And remember that Google is not the only Search Engine out there, so if you have the ability to basically say I’m going to use a little bit of mod_rewrite and I’m going to make it look like a static URL, that can often be a very good way to tackle the problem. So, PageRank still flows but, experiment! If you don’t see any URLs that have the same structure, or the same number of parameters as you’re thinking about doing; it’s probably better if you can either cut back on the number of parameters or shorten them somehow or try to use mod_rewrite.

So Matt Cutts is saying that, “Pagerank flows to dynamic URL’s in the same way they flow to static URL’s.” Does that mean that PageRank does not flow from dynamic pages the same way as it does from static URLs? For example, if a dynamic URL links to another dynamic URL, is it the same as if a static URL links to a page?

Yahoo used to link to a good PowerPoint presentation called Search Friendly Design from this page (hopefully they will put it back or update it).

Here is a screenshot from the presentation:

Yahoo recommends clean URLs

Database-Driven Sites

  • What gets crawled
    • Static URLs
    • Dynamic Pages with in-links from static pages
    • Links between dynamic pages are problematic for crawlers (some get crawled, some don’t)
  • Limit URL depth when using dynamic-to-static

The presentation was from 2004, and it’s possible that Yahoo has upgraded their indexer to the point where dynamic URLs don’t matter at all anymore, but I doubt that Yahoo handles dynamic URLs in the exact same way as static URLs.

Here is another slide from the presentation where Yahoo (Tim Mayer) says that Yahoo “…won’t crawl… spider ‘traps’ (dynamic content)”.

Yahoo Spider and Indexer Slide

Not all dynamic content is a “spider trap”, but the concept of a spider trap should be kept in mind when looking at dynamic URLs. Consider the perspective of a search engine architect: dynamic Web sites can generate an unlimited number of pages. The spider has to be able to automatically detect when it’s getting caught in an endless loop of dynamic content (a “spider trap”). This is probably why Google strongly recommends not exceeding a couple of URL parameters.

Search engine representatives are clearly hinting that you should watch out for dynamic URLs.
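If you are stuck with dynamic URLs, you can at least check them against the advice quoted above. This is a rough Python sketch of such a check, not a description of how any search engine actually evaluates URLs. The limit of 3 parameters comes from the “2 or 3 at the most” remark in the transcript; the 8-digit cutoff for a “long number” is an arbitrary assumption of mine.

```python
from urllib.parse import urlsplit, parse_qsl

MAX_PARAMS = 3   # from "2 or 3 at the most" in the transcript
MAX_DIGITS = 8   # assumed cutoff for a "long number"; pick your own

def url_warnings(url):
    """Return a list of warnings about the query string of a URL."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    if len(params) > MAX_PARAMS:
        warnings.append(f"{len(params)} parameters (more than {MAX_PARAMS})")
    for name, value in params:
        # Long all-digit values are the ones that might look like session IDs.
        if value.isdigit() and len(value) >= MAX_DIGITS:
            warnings.append(f"'{name}' has a long numeric value")
    return warnings
```

A clean URL such as `http://example.com/widgets/5/` produces no warnings, while a URL with five parameters and a ten-digit ID produces two.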

3. Dynamic URLs often work even when the parameters are reversed, added, or removed

For example, the URL http://example.com/index.php?name=abc&num=123 might also load at http://example.com/index.php?num=123&name=abc. If search engines find both links you may end up with multiple indexed URLs for the same page of content.
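A short Python sketch shows why those two URLs are really one page: once the parameters are sorted, the two strings are identical. The function name is mine, and sorting is just one possible normalization, assuming the application ignores parameter order.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_param_order(url):
    """Return the URL with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))

a = normalize_param_order("http://example.com/index.php?name=abc&num=123")
b = normalize_param_order("http://example.com/index.php?num=123&name=abc")
print(a == b)  # both normalize to the same string
```

A search engine has to do something like this for every dynamic URL it finds, and it cannot know in advance whether parameter order matters on your site. A clean URL removes the guesswork.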

Here is an example of dynamic URLs causing problems on YouTube—a site that would benefit from some SEO if Google weren’t automatically inserting the site in the SERPs with Universal Search:

Both URLs load the same page of content and both are indexed by Google.

4. Dynamic URLs are harder to block with robots.txt

It is more difficult to block dynamic URLs with robots.txt files. The robots.txt standard (which needs updating) doesn’t yet officially support wildcards, though Google, Yahoo, and MSN all do. [UPDATE: MSN doesn’t fully support wildcards in robots.txt] For example, Google could block the duplicate YouTube URLs above with the following robots.txt rules, but wildcards are an extension of the standard and may not work with all robots:


Disallow: /*mode=
Disallow: /*search=
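To see what wildcard rules like these would match, here is a small Python sketch that treats `*` as “any run of characters” and matches each rule from the start of the path. This is my simplified reading of the wildcard extension, not any crawler’s actual implementation, and the test paths are made up for illustration.

```python
import re

def rule_to_regex(rule):
    """Translate a Disallow value containing '*' wildcards into a regex."""
    # Escape the literal pieces and join them with '.*' for each wildcard.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern)

def is_blocked(path, rules):
    """Return True if the path (including query string) matches any rule."""
    # re.match anchors at the start of the path, as robots.txt rules do.
    return any(rule_to_regex(rule).match(path) for rule in rules)

rules = ["/*mode=", "/*search="]
print(is_blocked("/watch?v=abc&mode=related", rules))  # matched by /*mode=
print(is_blocked("/watch?v=abc", rules))               # no rule matches
```

Notice that you need one rule per unwanted parameter, and you have to know every parameter name in advance.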

If you only use static URLs for your site, you also have the advantage of being able to easily block weird query strings that might appear on your site.

For example, if all of your main URLs are clean you can block weird URLs that you might not have anticipated like http://example.com/widgets/5/?ajax_callback=true.

Update: 27 Nov 2007 Please disregard the following advice for the moment and see this post about Google and robots.txt before implementing it:

A single robots.txt rule should block those kinds of unexpected URLs from at least Google and Yahoo:


# this blocks all dynamic URLs from Google and Yahoo
Disallow: /*?

Warning: Be very careful before implementing that last robots.txt rule, because there may be essential pages on the site that have dynamic URLs. There are many factors to consider before doing that, such as “are there linked-to, indexed dynamic URLs on the site that are 301 redirecting to new URLs?” Basically, it should only be done on new sites where you are sure that you have constructed the site so that your essential site structure does not contain any dynamic URLs.

5. Clean URLs are more memorable

It’s better to send people the URL http://example.com/pagename than http://example.com/index.php?page=pagename.

If you have an ecommerce site, it’s easier for your customers to remember http://example.com/widgets than http://example.com/index.php?category_id=widgets.
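A clean public URL does not require giving up a dynamic backend. The mod_rewrite approach mentioned in the transcript above maps the clean path to the dynamic URL behind the scenes. Here is the same idea as a minimal Python sketch; the route pattern and function name are hypothetical, and a real site would do this in its web server or framework.

```python
import re

# Hypothetical route: /<category> is served by index.php?category_id=<category>
CLEAN_CATEGORY = re.compile(r"^/([a-z0-9-]+)/?$")

def to_internal_url(path):
    """Map a clean public path to the dynamic URL the application handles."""
    match = CLEAN_CATEGORY.match(path)
    if match is None:
        return None  # not a category page; let other routes handle it
    return f"/index.php?category_id={match.group(1)}"

print(to_internal_url("/widgets"))
```

Visitors and search engines only ever see `/widgets`; the query string stays internal.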

6. Tim Berners-Lee says so (basically)

Tim Berners-Lee, the inventor of the WWW, has a great page of information about URL structure. He doesn’t specifically mention dynamic URLs, but reading the article will give a sense of how to create good URLs. Among his recommendations: file name extensions and software mechanisms should not appear in URLs.

Conclusion: why static URLs are better than dynamic

I’ve even heard some well-known SEOs say things like “there is no difference between static and dynamic URLs; just look at the dynamic URLs on my well-indexed site as an example”.

I’m not saying that dynamic URLs won’t get indexed or won’t rank highly. Search engines can spider and index dynamic URLs. But anyone who says, “I already rank highly for my main keywords; I don’t need SEO” does not understand the potential of SEO. Traffic can almost always be increased on a Web site, even on sites that rank #1 for highly competitive keywords.
