Often forgotten in the development rush, a regularly updated XML sitemap is essential for helping search engines discover and crawl all of your content.
Here is Google on the subject:
Sitemaps are a way to tell Google about pages on your site we might not otherwise discover. In its simplest terms, an XML Sitemap—usually called Sitemap, with a capital S—is a list of the pages on your website. Creating and submitting a Sitemap helps make sure that Google knows about all the pages on your site, including URLs that may not be discoverable by Google’s normal crawling process… In addition, you can also use Sitemaps to provide Google with metadata about specific types of content on your site, including video, images, mobile, news, and software source code.
All good. And the major search engines like Google and Bing have agreed on a common format (the sitemaps.org protocol), so you only need to create one.
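To see what that format looks like, here’s a minimal Sitemap with a single entry. The URL and values are placeholders, and only the loc tag is actually required for each URL:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2011-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>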
Creating a sitemap is easy. There are great plugins for most content management systems and some simple web-based creators. Here’s a list of Sitemap creation tools. If you want to see an example, here’s the sitemap for cjrogers.com. Or, if you want to test on your own site, XML Sitemaps will create one at no cost for sites with fewer than 500 URLs.
If you have a site with over 50,000 pages (and I certainly hope you do), break your sitemap into several mini-sitemaps by section, since the sitemap protocol caps a single file at 50,000 URLs, and list the section sitemaps in your robots.txt file. Don’t have a robots.txt file? It’s just a plain text file at the root of your site that lists the full URLs of your sitemaps in this format:
Sitemap: https://www.example.com/mysitemap1.xml
Sitemap: https://www.example.com/mysitemap2.xml
It can also be used to tell search engine crawlers to avoid or “disallow” some sections of your site. The robots.txt files aren’t blocked and are visible on almost every site. Here’s CNN‘s and the New York Times’.
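As a quick sketch, here’s what a complete robots.txt might look like, with an illustrative disallow rule on a made-up /admin/ path sitting above the sitemap listings:

User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/mysitemap1.xml
Sitemap: https://www.example.com/mysitemap2.xml

The User-agent: * line means the disallow rule applies to every crawler; the Sitemap lines stand on their own and apply regardless of user agent.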