Hooray, robots.txt is useful for sitemap.xml

Hooray, robots.txt is useful thanks to its new sitemap.xml option! For the first time it can tell search engines which pages to spider, not just which pages not to spider.

Ask.com's official announcement: http://blog.ask.com/2007/04/sitemaps_autodi.html

“Today, Ask.com, Google, Microsoft Live Search and Yahoo! together are announcing support of autodiscovery of Sitemaps. The new open-format autodiscovery allows webmasters to specify the location of their Sitemaps within their robots.txt file, eliminating the need to submit sitemaps to each search engine separately. Comprehensiveness and freshness are key initiatives for every search engine, and with autodiscovery of sitemaps, everyone wins:

  • Webmasters save time with the ability to universally submit their content to the search engines and benefit from reduced unnecessary traffic by the crawlers
  • The search engines get information with regards to pages to index as well as metadata with clues about which pages are newly updated and which pages are identified as the most important
  • Searchers benefit from improved search experience with better comprehensiveness and freshness

In addition, Ask.com is now supporting submission of Sitemaps via http://submissions.ask.com/ping?sitemap=SitemapUrl. Of course, neither autodiscovery nor manual submission guarantee pages will be added to the index. The pages must meet our quality criteria for inclusion in the index. And use of these submission methods does not influence ranking.

I will be talking about today’s announcement (along with my counterparts at Google, Microsoft and Yahoo!) during the SiteMaps and Site Submission session at SES in New York later this morning. If you aren’t able to join us, more information is available at http://www.sitemaps.org/ and http://about.ask.com/en/docs/about/webmasters.shtml#22. We are excited about our participation with the Sitemaps via robots.txt protocol and look forward to our collaboration with Google, Microsoft, Yahoo! and others in furthering important initiatives that make search easier for webmasters and more powerful for users.” Vivek Pathak, Infrastructure Product Manager, Ask.com
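
To illustrate the manual submission option from the announcement: if your sitemap lived at the placeholder address used further down, the ping request would look something like this (the sitemap address is URL-encoded here as a precaution, and this is only an illustration, not an actual submission):

http://submissions.ask.com/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fyour_sitemap.xml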

What to do?

Just add one simple line to your robots.txt file to tell the Google, Yahoo!, Ask and MSN search engines where your sitemap is. There is no need to create an account. Simply upload your XML sitemap and add a line with its full path to your robots.txt file:

Sitemap: http://www.yoursite.com/your_sitemap.xml
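
If you do not have an XML sitemap yet, a minimal one following the sitemaps.org protocol looks something like this. The URL, date, change frequency and priority below are placeholder values for illustration; only the <loc> tag is required for each URL:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yoursite.com/</loc>
    <lastmod>2007-04-11</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>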

Should I do it?

Sadly enough, the robots.txt file also hands spammers a free gift: they can misuse this protocol. Take a look at this file:

http://www.bluenile.com/robots.txt

User-agent: *
Disallow: /emails/
Disallow: /promos/
Disallow: /wwwcore/
Disallow: design.asp
Disallow: pendant_design.asp
Disallow: earring_design.asp

“This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or out of a desire that an application only operate on certain data.” source: http://en.wikipedia.org/wiki/Robots.txt
If I were a spammer, I would be kissing this robots.txt file for handing me the name of their emails folder!
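
A safer pattern, as a suggestion of mine rather than something from the announcement or the Blue Nile example, is to leave sensitive directories out of robots.txt entirely and instead add a robots meta tag to the individual pages. The pages stay out of the index without the folder name being advertised to anyone who reads the file:

<meta name="robots" content="noindex, nofollow">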

Read more about how misuse of robots.txt abounds here: http://searchengineland.com/070416-131549.php

The full protocol can be found here: http://www.sitemaps.org/protocol.html

 
