Google Does Not Obey Robots.txt

Google is not obeying robots.txt files.

I’ve seen this on several sites already. Sections of the site are blocked from spiders with robots.txt, yet Google spiders those pages anyway and indexes them, complete with a cached version of the page. Google even displays the last-crawled date on the cached pages.

I’m sure that the robots.txt rules are correct.
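
For anyone who wants to double-check their own rules, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `/private/` path, the example.com URLs, and the `is_blocked` helper are all made up for illustration and are not taken from any of the affected sites. Note that this parser does plain prefix matching as in the original robots.txt convention, so it may not agree with Googlebot on rules that use wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules disallow user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A page under the disallowed section.
    print(is_blocked(ROBOTS_TXT, "Googlebot", "http://example.com/private/page.html"))  # True
    # A page outside the disallowed section.
    print(is_blocked(ROBOTS_TXT, "Googlebot", "http://example.com/public/page.html"))  # False
```

If this reports a page as blocked and Google still shows a cached copy with a recent crawl date, then the rules themselves are probably not the problem.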

Google should not be accessing pages that are blocked with robots.txt.

If you have seen this problem, please leave a comment below.

3 Comments

  1. Posted September 20, 2007 at 11:50 am | Permalink

    What I have seen is something slightly different.

    Feedburner provides a way to block search engines from crawling your feed. It is code in the RSS itself.
    My Feedburner stats still say that Google is visiting (and no, not just the bot picking up feeds for subscribers).

  2. Posted September 20, 2007 at 12:03 pm | Permalink

    Hi Andy, thanks for your comment.

    Feedburner is saying that Googlebot is spidering your feed? Have you seen it indexed? (e.g., cache:example.com/page/feed/)

    On at least one of the sites that I’ve noticed this problem on, Google’s robots.txt tool says that the pages are blocked. But Google still spiders and indexes them…

  3. Posted September 20, 2007 at 3:35 pm | Permalink

    Hmm, it seems it has stopped being indexed now. That is interesting. It might actually be working as intended.
