Robots.txt – Watching the Minor Details

I noticed that Google was indexing one of my blog’s feeds that I thought I had blocked with robots.txt:

[Screenshot: Google indexing RSS feeds]

The indexed URL is pocketseo.com/domains/7/feed. Notice that it doesn’t have a trailing slash.

My robots.txt rule is:

Disallow: /*/feed/

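WordPress URLs often have trailing slashes, so the rule ends with one, and the feed URL without the slash slips past it. I don’t want to remove the trailing slash from the rule, because robots.txt rules are prefix matches: without the slash, the rule would also block any URL below the root directory with a path segment that merely starts with “feed” (a /blog/feedback page, for example).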

It’s not a major problem, but here is a quick solution for that one indexed URL:

Disallow: /*/feed/
Disallow: /domains/7/feed
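
To check rules like these before relying on them, here is a minimal Python sketch of the matching logic. It assumes Google-style semantics (rules are prefix matches, `*` matches any run of characters, and a trailing `$` anchors the end of the path); other crawlers may behave differently. The function names `rule_to_regex` and `is_blocked` and the path /blog/feedback are made up for illustration.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a Disallow pattern into a regex.

    Assumes Google-style matching: prefix match, '*' as a wildcard,
    and an optional trailing '$' that anchors the end of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal parts and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path: str, rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the URL path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

original = ["/*/feed/"]
no_slash = ["/*/feed"]
fixed = ["/*/feed/", "/domains/7/feed"]

print(is_blocked("/domains/7/feed", original))  # False: the rule requires the trailing slash
print(is_blocked("/blog/feedback", no_slash))   # True: dropping the slash over-blocks
print(is_blocked("/domains/7/feed", fixed))     # True: the extra rule catches the indexed URL
print(is_blocked("/blog/feedback", fixed))      # False: unrelated pages stay crawlable
```

Because the added rule is itself a prefix match, it would also block anything else that starts with /domains/7/feed, which is acceptable here since it targets a single known URL.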

It’s not important on a site like pocketseo.com, but robots.txt rules matter on large sites, and they can be tough to get right.
