I was just browsing through Google’s Webmaster Guidelines and saw that they have greatly improved them. Google actually provides examples and detailed information on things like complex URL structure, frames and Flash.
Topics on the “How can I create a Google-friendly URL structure?” page include:
- Additive filtering of a set of items.
- Dynamic generation of documents.
- Problematic parameters in the URL.
- Sorting parameters.
- Irrelevant parameters in the URL, such as referral parameters.
- Calendar issues.
- Broken relative links.
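Several of these topics, sorting parameters and irrelevant parameters in particular, come down to the same page being reachable under many different URLs. One way a developer might reduce that on their own links is to normalize query strings before writing them into pages. This Python sketch is not taken from Google's page. It assumes that the hypothetical `ref` and `sessionid` parameters never change page content. It drops those parameters and sorts the rest, so that equivalent links come out identical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed NOT to change page content (referral, session tracking)
IRRELEVANT_PARAMS = {"ref", "sessionid"}


def canonical_url(url: str) -> str:
    """Drop irrelevant parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IRRELEVANT_PARAMS
    )
    # Rebuild the URL without a fragment; urlunsplit omits "?" when the query is empty
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


a = canonical_url("http://www.example.com/shop.php?color=red&ref=newsletter&category=shoes")
b = canonical_url("http://www.example.com/shop.php?category=shoes&color=red")
print(a)       # http://www.example.com/shop.php?category=shoes&color=red
print(a == b)  # True
```

Sorting is a deliberate simplification. If parameter order ever matters on a given site, the sort should be dropped.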
That is invaluable information, and every Web developer should read the page. It is also something you can refer clients to when they want assurance about your detailed robots.txt recommendations.
The page also makes a strong case for extending the robots.txt standard to support wildcards (*) and end-of-line anchors ($). A plain Disallow line matches only URLs that begin with a given path, so without wildcards and end-of-line anchors many complex dynamic URLs cannot be blocked. MSN Live, for example, does not fully support wildcards in robots.txt.
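To make the wildcard point concrete, here is a minimal Python sketch of how a crawler that supports these extensions might match a Disallow pattern. In this sketch, `*` matches any run of characters and a trailing `$` anchors the pattern to the end of the URL. It illustrates the idea under simplifying assumptions and is not any search engine's actual code. It ignores Allow lines and rule precedence, and the rules themselves are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern using * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then put '.*' wherever a '*' appeared
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: list) -> bool:
    """True if any Disallow pattern matches the path (Allow rules are ignored here)."""
    # re.match anchors at the start of the path, mirroring robots.txt prefix matching
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# Hypothetical rules: block session-ID URLs and any URL that ends in .php
RULES = ["/*?sessionid=", "/*.php$"]

for path in ["/catalog.php?sessionid=abc123", "/index.php", "/index.php?page=2", "/about.html"]:
    print(path, is_blocked(path, RULES))
```

With these rules `/index.php` is blocked but `/index.php?page=2` is not. A prefix-only Disallow line cannot express that distinction, because `Disallow: /index.php` would block both.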
The page on title elements and alt attributes provides some good examples.
There is one error in the new guidelines. Google says to “make sure that your TITLE tags and ALT attributes are descriptive and accurate.” It is impossible to make a “title tag” descriptive; only the text inside the title element can be. Saying “optimize your title tags” is as inaccurate as saying “optimize your alt tags”: alt is an attribute, and title is an element.
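The difference shows up directly when a page is parsed. This minimal sketch uses Python's standard `html.parser` module to extract the text of the title element and the alt attribute of each image. The class name `TitleAltAudit` and the sample markup are hypothetical. The title text arrives as character data between the start and end tags. The alt text arrives as an attribute value on the `img` start tag.

```python
from html.parser import HTMLParser


class TitleAltAudit(HTMLParser):
    """Collect the text of the <title> element and the alt attribute of each <img>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.alts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            # alt is an attribute: it comes with the start tag, not as element content
            self.alts.append(dict(attrs).get("alt"))  # None if the image has no alt

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # title is an element: its text is the character data between its tags
        if self._in_title:
            self.title += data


page = ('<html><head><title>Blue Widgets | Example Store</title></head><body>'
        '<img src="w.jpg" alt="Blue widget, front view"><img src="spacer.gif"></body></html>')
audit = TitleAltAudit()
audit.feed(page)
print(audit.title)  # Blue Widgets | Example Store
print(audit.alts)   # ['Blue widget, front view', None]
```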
Google also finally states clearly that dynamic pages are not treated the same as static pages:
Google indexes dynamically generated webpages, including .asp pages, .php pages, and pages with question marks in their URLs. However, these pages can cause problems for our crawler and may be ignored. If you’re concerned that your dynamically generated pages are being ignored, you may want to consider creating static copies of these pages for our crawler. If you do this, please be sure to include a robots.txt file that disallows the dynamic pages in order to ensure that these pages aren’t seen as having duplicate content.
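A rule like that does not need wildcards when the dynamic pages share a path prefix. As a sketch, assume a hypothetical dynamic page at `/products.php?id=42` with a static copy at `/products/widget-42.html`. Python's standard `urllib.robotparser` module can confirm that a simple Disallow line blocks the dynamic version while leaving the static copy crawlable. This parser does plain prefix matching and does not treat `*` or `$` in paths as wildcards, so it cannot check the more complex rules discussed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers off the dynamic script, allow everything else
ROBOTS_TXT = """\
User-agent: *
Disallow: /products.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

dynamic_url = "http://www.example.com/products.php?id=42"
static_url = "http://www.example.com/products/widget-42.html"

print(parser.can_fetch("Googlebot", dynamic_url))  # False: matches the /products.php prefix
print(parser.can_fetch("Googlebot", static_url))   # True: no Disallow line matches
```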
The next step should be to let users download the Google Webmaster Guidelines as a PDF so that they can be printed. The printed version could carry a disclaimer stating the date it was printed and advising readers to check the current page on the Web for the latest information. I think a printed version would be very useful for Web designers and developers because it could serve as a reference manual when they build sites.