Graywolf has an interesting post about using sitemaps for competitive intelligence:
…people who are using automated solutions like popularity contest are telling you their most highly trafficked pages. Other people who are generating Sitemap XML files in a more manual fashion, are telling you the pages they want to rank. Chances are good the pages they want to rank for are the “money pages”.
It’s a clever idea. If you are worried about someone doing this to your sites, you could try to hide your sitemaps by giving them unpredictable names and using only the ping technique to tell search engines where the sitemap index is located, rather than advertising the location in robots.txt.
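Here is a minimal sketch of that hiding approach in Python. The helper names are hypothetical, and the ping endpoint shown is an assumption based on Google's sitemap ping URL format; check each search engine's documentation for the endpoint it currently accepts. The code only builds the URL and does not send any request.

```python
import secrets
from urllib.parse import urlencode


def unpredictable_sitemap_name(prefix="sitemap", nbytes=16):
    """Return a sitemap filename with a random, hard-to-guess hex suffix."""
    return f"{prefix}-{secrets.token_hex(nbytes)}.xml"


def ping_url(sitemap_url, endpoint="http://www.google.com/ping"):
    """Build the URL you would request to announce a sitemap privately.

    The default endpoint is an assumption, not a guaranteed current API.
    """
    return f"{endpoint}?{urlencode({'sitemap': sitemap_url})}"


if __name__ == "__main__":
    name = unpredictable_sitemap_name()
    # example.com is a placeholder domain
    print(ping_url(f"http://www.example.com/{name}"))
```

Because the filename is never linked or listed publicly, a competitor would have to guess the random suffix to find it.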
If you want to find your competitors’ sitemaps, you could use search engine queries like these:
- http://www.google.com/search?q=site%3Agoogle.com+filetype%3Axml
- http://www.google.com/search?q=site%3Agoogle.com+inurl%3Asitemap
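If you want to run those two queries against several domains, a small Python helper can build the URLs for you. This is only a sketch; the function name is made up for illustration, and it simply reproduces the two query patterns shown above.

```python
from urllib.parse import urlencode


def sitemap_search_urls(domain):
    """Return Google search URLs that may surface a domain's sitemap files."""
    queries = [
        f"site:{domain} filetype:xml",   # any indexed XML file on the domain
        f"site:{domain} inurl:sitemap",  # any URL containing "sitemap"
    ]
    return ["http://www.google.com/search?" + urlencode({"q": q}) for q in queries]


if __name__ == "__main__":
    for url in sitemap_search_urls("google.com"):
        print(url)
```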
If you want to make sure that search engines can’t index your sitemap files, you might be able to block them with an X-Robots-Tag HTTP header telling Google and Yahoo not to index them.
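As a rough illustration, here is one way to attach that header if your site happens to run as a Python WSGI application. The middleware name and the path pattern are assumptions for this sketch; on other stacks you would set the same header in the web server configuration instead.

```python
import re

# Hypothetical pattern: any path containing "sitemap" and ending in ".xml"
SITEMAP_PATH = re.compile(r"sitemap.*\.xml$", re.IGNORECASE)


def add_noindex_for_sitemaps(app):
    """Wrap a WSGI app so sitemap responses carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            if SITEMAP_PATH.search(environ.get("PATH_INFO", "")):
                # Copy the header list and append the noindex instruction
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return wrapped
```

The search engine can still fetch the sitemap and read its URLs; the header only asks it not to show the sitemap file itself in search results.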
EDIT: I removed the idea of blocking the sitemap with a robots.txt noindex directive, because it probably won’t work in this situation.

2 Comments
Noindex: /sitemap.xml$
would tell Googlebot NOT to fetch it. That’s experimental syntax and it doesn’t work as expected. Better make use of the X-Robots-Tag.
Good point — I edited the post above.