Is Google Broken? (Robots.txt Hell)

It looks like Google is not correctly obeying robots.txt again. Google states:

To block Googlebot from crawling any URL that includes a ? (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):

User-agent: Googlebot
Disallow: /*?

However, I’ve recently seen a couple of sites using this rule whose root URL dropped out of Google’s index, even though the root URL contains no question mark. Removing the rule got it reindexed quickly.

For example, the robots.txt rule Disallow: /section/*? would end up de-indexing the URL http://example.com/section/. Soon after removing that robots.txt rule, the page would be re-indexed.
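To make the expected behavior concrete, here is a minimal Python sketch of the wildcard matching as Google describes it. The helper name `rule_matches` is hypothetical, and the semantics are an assumption based on the quoted documentation: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and anything else is a plain prefix match. It is not Googlebot's actual implementation.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    Assumed semantics (not Google's real code): '*' matches any run of
    characters, a trailing '$' anchors the end, and the rest is a literal
    prefix match. url_path is the path plus any query string.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(regex, url_path) is not None


if __name__ == "__main__":
    for path in ["/section/", "/section/?", "/section/page?id=3", "/section/page"]:
        print(path, rule_matches("/section/*?", path))
```

Under these assumptions, /section/ has no question mark and is not matched by Disallow: /section/*?, so the rule on its own should not keep that URL out of the index.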

Coincidence on two different sites?

Has anyone else seen this?

5 Comments

  1. Posted November 27, 2007 at 2:43 am | Permalink

    From here Disallow: /*? works as described. There are other reasons why a home page doesn’t make it in the index. Did you check the logs for a fetch by Googlebot? Did you test that Google indexes the root without this directive? Did you try
    Disallow: /*?
    Allow: /

  2. Posted November 27, 2007 at 2:47 am | Permalink

    Oops:
    Disallow: /*?
    Allow: /$

  3. Posted November 27, 2007 at 3:05 am | Permalink

    Good ideas. I will grep the logs if I see it again, though I’ve noticed other Google robots.txt problems recently where they were indexing pages that were always blocked by robots.txt.

    Google quickly indexed the root once that /*? rule had been removed.

  4. John
    Posted November 27, 2007 at 3:06 am | Permalink

    Post your URL in the Google Webmaster Help groups and I’m sure people there will figure something out (or someone from Google can take a better look). It’s not really possible to do much without the URL. Thanks.

  5. Posted November 27, 2007 at 3:20 am | Permalink

    Unfortunately I cannot post the URLs in Google’s Webmaster forum.

    It has already been solved by the removal of that rule. If I have time tomorrow I will try to run a test on another domain and see if I can reproduce it for the Webmaster forum.
