Graywolf has an interesting post about using sitemaps for competitive intelligence:
…people who are using automated solutions like popularity contest are telling you their most highly trafficked pages. Other people who are generating Sitemap XML files in a more manual fashion, are telling you the pages they want to rank. Chances are good the pages they want to rank for are the “money pages”.
It’s an interesting idea. If you are worried about someone doing this to your sites, you could try to hide your sitemaps by giving them unpredictable names and only using the ping technique to tell search engines where the sitemap index is located.
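As a rough sketch of that idea, the snippet below generates a hard-to-guess sitemap filename and builds a ping URL for it. The function names and the `https://searchengine.example/ping` endpoint are placeholders I made up for illustration; each search engine documents its own ping address, and support for pinging varies, so check before relying on it.

```python
import secrets
from urllib.parse import urlencode


def unpredictable_sitemap_name(prefix="sitemap", nbytes=8):
    """Return a filename such as 'sitemap-3f9c...e1.xml' with a random hex suffix."""
    return f"{prefix}-{secrets.token_hex(nbytes)}.xml"


def ping_url(sitemap_url, endpoint="https://searchengine.example/ping"):
    """Build a ping URL that passes the sitemap location as a query parameter.

    The default endpoint is a hypothetical placeholder, not a real service.
    """
    return f"{endpoint}?{urlencode({'sitemap': sitemap_url})}"


name = unpredictable_sitemap_name()
print(ping_url(f"https://www.example.com/{name}"))
```

The random suffix only helps if the file is never linked from your pages or listed in robots.txt, since either of those would reveal the name.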
If you want to find your competitors’ sitemaps you could use search engine queries like this:
- http://www.google.com/search?q=site%3Agoogle.com+filetype%3Axml
- http://www.google.com/search?q=site%3Agoogle.com+inurl%3Asitemap
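If you want to run the same two queries against several domains, a small helper can build the URLs for you. This is just a sketch; `sitemap_search_urls` is a name I chose, and it only constructs the query strings shown above without fetching anything.

```python
from urllib.parse import urlencode


def sitemap_search_urls(domain):
    """Return the two search URLs from the list above for the given domain."""
    queries = [
        f"site:{domain} filetype:xml",   # XML files indexed for the site
        f"site:{domain} inurl:sitemap",  # URLs containing the word "sitemap"
    ]
    return ["http://www.google.com/search?" + urlencode({"q": q}) for q in queries]


for url in sitemap_search_urls("google.com"):
    print(url)
```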
If you want to keep search engines from indexing your sitemap files, you might be able to do it with an X-Robots-Tag HTTP header telling Google and Yahoo not to index them.
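To show what sending that header could look like, here is a minimal sketch using Python's built-in `http.server`. The `is_sitemap_path` rule (any `.xml` file with "sitemap" in its name) and the `serve` helper are assumptions for illustration; on a real site you would more likely set the header in your web server configuration.

```python
import posixpath
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.parse import urlsplit


def is_sitemap_path(path):
    """Assumed rule: an .xml file whose name contains 'sitemap'."""
    name = posixpath.basename(urlsplit(path).path).lower()
    return name.endswith(".xml") and "sitemap" in name


class SitemapNoIndexHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, marking sitemaps as noindex."""

    def end_headers(self):
        # Add the header just before the header block is closed.
        if is_sitemap_path(getattr(self, "path", "")):
            self.send_header("X-Robots-Tag", "noindex")
        super().end_headers()


def serve(port=8000):
    """Run the demo server until interrupted."""
    HTTPServer(("", port), SitemapNoIndexHandler).serve_forever()
```

Calling `serve()` and requesting a sitemap file with `curl -I` should show the `X-Robots-Tag: noindex` line in the response headers. Note that this asks engines not to index the file; it does not stop them from fetching it.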
EDIT: I removed the idea of blocking the sitemap with a robots.txt noindex directive, because it probably won’t work in this situation.


2 Comments
Noindex: /sitemap.xml$
would tell Googlebot NOT to fetch it. That’s experimental syntax and it doesn’t work as expected. Better make use of the X-Robots-Tag.
Good point — I edited the post above.