It looks like Google is not correctly obeying robots.txt again. Google states:
To block Googlebot from crawling any URL that includes a ? (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):
User-agent: Googlebot
Disallow: /*?
However, I’ve recently seen a couple of sites using this kind of rule whose root URL dropped out of Google’s index. Removing the rule got it reindexed quickly.
For example, the robots.txt rule Disallow: /section/*? would end up de-indexing the URL http://example.com/section/, even though that URL contains no question mark and so should not match the rule. Soon after the rule was removed, the page was re-indexed.
Coincidence on two different sites?
Has anyone else seen this?
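To make the expected behavior concrete, here is a minimal Python sketch of how a wildcard pattern should be evaluated according to Google's documented syntax (* matches any run of characters, a trailing $ anchors the end of the URL, and everything else is a literal prefix match). The function name rule_matches is made up for illustration, and this is not Googlebot's actual code, only a reading of the documented rules.

```python
import re

def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    Hypothetical helper: '*' matches any sequence of characters and a
    trailing '$' anchors the end of the URL. url_path is the path plus
    query string, e.g. '/section/?page=2'.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(regex, url_path) is not None

print(rule_matches("/section/*?", "/section/"))         # False
print(rule_matches("/section/*?", "/section/?page=2"))  # True
print(rule_matches("/*?", "/"))                         # False
```

Under this reading, neither /section/ nor the site root contains a question mark, so neither should be blocked by these rules.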

5 Comments
From what I see here, Disallow: /*? works as described. There are other reasons why a home page might not make it into the index. Did you check the logs for a fetch by Googlebot? Did you test that Google indexes the root without this directive? Did you try:
Disallow: /*?
Allow: /
Oops, I meant:
Disallow: /*?
Allow: /$
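As a rough illustration of what the Disallow: /*? plus Allow: /$ pair is meant to do, the Python sketch below combines wildcard matching with the precedence Google documents for conflicting rules: the longest matching pattern wins, and a tie goes to Allow. The names pattern_matches and is_allowed are hypothetical, and other crawlers may resolve conflicts differently.

```python
import re

def pattern_matches(pattern: str, path: str) -> bool:
    """Hypothetical matcher: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

def is_allowed(rules, path: str) -> bool:
    """rules is a list of (directive, pattern) pairs, e.g. ("disallow", "/*?").

    Assumes longest-pattern-wins precedence, with Allow winning ties.
    """
    best = None  # (pattern length, is_allow) of the strongest matching rule
    for directive, pattern in rules:
        if not pattern_matches(pattern, path):
            continue
        candidate = (len(pattern), directive == "allow")
        if best is None or candidate > best:
            best = candidate
    # No matching rule means the URL may be crawled.
    return True if best is None else best[1]

RULES = [("disallow", "/*?"), ("allow", "/$")]
print(is_allowed(RULES, "/"))           # True
print(is_allowed(RULES, "/?utm=1"))     # False
print(is_allowed(RULES, "/page.html"))  # True
```

If the crawler follows this logic, the explicit Allow: /$ is redundant for the root, since Disallow: /*? never matches it in the first place, but it does no harm as a safeguard.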
Good ideas. I will grep the logs if I see it again, though I’ve noticed other Google robots.txt problems recently where they were indexing pages that were always blocked by robots.txt.
Google quickly indexed the root once that /*? rule had been removed.
Post your URL in the Google Webmaster Help groups and I’m sure people there will figure something out (or someone from Google can take a better look). It’s not really possible to do much without the URL. Thanks.
Unfortunately I cannot post the URLs in Google’s Webmaster forum.
It has already been solved by the removal of that rule. If I have time tomorrow I will try to run a test on another domain and see if I can reproduce it for the Webmaster forum.