Google and Yahoo both support wildcards (*) and end of string ($) characters in robots.txt files.
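To make the wildcard and end-of-string behavior concrete, here is a minimal Python sketch of how a crawler might match a URL path against a rule that uses `*` and `$`. The function names and the sample rules are my own illustrations, not anything taken from the search engines' documentation, and real crawlers may handle edge cases differently.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics (illustrative only): "*" matches any run of
    characters, and a trailing "$" anchors the rule to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + (r"\Z" if anchored else ""))

def is_blocked(path, disallow_pattern):
    """Return True if the path matches the (hypothetical) Disallow pattern."""
    return robots_pattern_to_regex(disallow_pattern).match(path) is not None

# Hypothetical rules, for illustration only.
print(is_blocked("/files/report.pdf", "/*.pdf$"))      # matches: path ends in .pdf
print(is_blocked("/files/report.pdf?x=1", "/*.pdf$"))  # no match: "$" requires the end
print(is_blocked("/private/page.html", "/private*"))   # matches: prefix plus wildcard
```

Without the `$`, a rule behaves as a prefix match, which is why `/private*` and `/private` would block the same paths in this sketch.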
MSN’s Live Search is a little more confusing because it has only very limited support for wildcards in robots.txt files. Based on Microsoft’s docs, it looks like some wildcards are supported. These are valid robots.txt rules for MSN’s Live.com:
As mentioned in an earlier post, MSN Live Search is sending referrer spam. If you don’t know about it yet, read my previous article on it. The referrer spam consists of highly competitive, one-word keywords that Live.com supposedly used to refer visitors to your Web site.
The referrer strings look like this:
Notice that the keywords are generally one word and highly competitive, and that some of them are porn-related.
I wish that A-list bloggers would pick this story up so that people would start to pay attention and get Microsoft to stop. (This blog doesn’t get enough traffic for me to spread the word by myself.) This referrer spam is distorting everyone’s logs and making it look like Live.com gets more traffic than it really does.
I noticed that MSN was indexing URL fragments. A URL fragment is the part of a URL that comes after a pound sign (#), like this:
Here is a screenshot.
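For anyone who wants to see what counts as the fragment, here is a small Python sketch that splits one off using the standard library's `urllib.parse.urldefrag`. The URL is a made-up example, not one of the URLs MSN indexed, and `split_fragment` is just a name I picked for the helper.

```python
from urllib.parse import urldefrag

def split_fragment(url):
    """Return (url_without_fragment, fragment) for the given URL."""
    base, fragment = urldefrag(url)
    return base, fragment

# Hypothetical URL, for illustration only.
base, fragment = split_fragment("http://www.example.com/page.html#comments")
print(base)      # the part a search engine would normally index
print(fragment)  # the part after the pound sign
```

Since the fragment only points to a position within a page, both forms refer to the same document, which is why indexing them separately looks wrong.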
Lately I’ve been noticing a lot of strange one-word referrers from MSN Live Search in my logs. Today I saw a post by BitWorm that reveals this as referrer spam-like behavior from Microsoft.
The hits come from IP addresses beginning with 65.55.165.* and the referrer strings have this format:
The keyword (highlighted above) is always one word, and usually appears on the Web site somewhere, with the exception of the porn-related keywords.
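If you want to separate these hits from real traffic in your own stats, something like the following Python sketch might work. It only uses the two facts given above, the `65.55.165.` IP prefix and a Live.com referrer. The function name, the way the IP and referrer are passed in, and the sample values are all my own assumptions, and it does not try to reproduce the exact referrer format.

```python
from urllib.parse import urlparse

SUSPECT_IP_PREFIX = "65.55.165."      # IP range reported in this post
SUSPECT_REFERRER_DOMAIN = "live.com"  # referring site reported in this post

def is_suspect_hit(client_ip, referrer):
    """Flag a hit that comes from the reported IP range with a Live.com referrer.

    Assumes the IP address and referrer have already been pulled out of a
    log line; parsing the log itself is outside this sketch.
    """
    host = urlparse(referrer).hostname or ""
    from_live = (host == SUSPECT_REFERRER_DOMAIN
                 or host.endswith("." + SUSPECT_REFERRER_DOMAIN))
    return client_ip.startswith(SUSPECT_IP_PREFIX) and from_live

# Illustrative values only, not copied from real logs.
print(is_suspect_hit("65.55.165.12", "http://search.live.com/"))
print(is_suspect_hit("10.0.0.1", "http://search.live.com/"))
```

Checking both the IP prefix and the referrer host keeps genuine Live.com visitors, who arrive from other addresses, out of the filter.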
MSNdude posted a comment on the Webmaster World thread about this issue:
While searching for the keyword “blog” in Google today, I noticed that MSN’s Live.com was buying AdWords.
MSN Live Search is not displaying URLs correctly. In the screenshot below, it shows URLs from PocketSEO.com without a trailing slash. Yet the link above it goes to the correct URL (with the trailing slash).