Google and Yahoo both support wildcards (*) and end of string ($) characters in robots.txt files.
MSN’s Live Search is a little more confusing, because it seems to have only very limited support for wildcards in robots.txt files. Based on its docs, wildcards do appear to be supported; these are given as valid robots.txt rules for MSN’s Live.com:
User-agent: msnbot
Disallow: /*.PDF$
Disallow: /*.jpeg$
Disallow: /*.exe$
However, “MSNdude” recently stated on WebmasterWorld that “Live Search does not support wildcards in robots.txt today; we are thinking about it.”
An asterisk that stands in for any run of characters is a wildcard by definition, and MSN’s own documented rules use one, so this statement is confusing.
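To make the wildcard behavior concrete, here is a rough Python sketch of how a crawler might interpret Disallow patterns that use * and $. It is only an illustration under my own assumptions, not any search engine's actual matcher: the function names rule_to_regex and is_blocked are made up, matching is assumed to be case-sensitive, a rule without a trailing $ is treated as a prefix match, and the /*?sessionid= rule is a hypothetical example of a dynamic URL pattern.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing
    '$' anchors the end of the URL, and without '$' the rule is a
    prefix match. These are assumptions made for illustration.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*'
    # so that every '*' in the rule becomes "any characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any Disallow pattern matches the URL path."""
    # re.match anchors at the start of the path, like robots.txt rules.
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


rules = ["/*.PDF$", "/*.jpeg$", "/*.exe$"]

print(is_blocked("/docs/report.PDF", rules))      # ends in .PDF
print(is_blocked("/docs/report.PDF?x=1", rules))  # '$' stops this match
print(is_blocked("/docs/report.pdf", rules))      # different case

# Hypothetical rule for blocking a kind of dynamic URL.
print(is_blocked("/page.php?sessionid=42", ["/*?sessionid="]))
```

Under these assumptions the first and last calls report a blocked URL and the middle two do not, which shows why the $ anchor and the wildcard matter for blocking file types and dynamic URLs.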
I think that wildcards should be added to the robots.txt standard. They are essential for blocking certain kinds of dynamic URLs. The original robots.txt standard should be updated, and MSN should fully jump on board.
Well, aside from the fact that MSNdude hasn’t been the most credible source lately imo (e.g. the reason he gives for the Live bot referrer spamming our sites), you also need to take into account that at times MSN completely ignores robots.txt. For instance, that same spamming bot is downloading AdSense, which as you can see is clearly blocked by Google’s robots.txt:
http://pagead2.googlesyndication.com/robots.txt