Indiana University
University Information Technology Services
  
What are web search robots, and how do they affect me?

Search robots, also known as wanderers, spiders, and crawlers, are the tools many web search engines, such as AltaVista, Excite, Lycos, and Go.com, use to build their databases. Most robots work like web browsers, except that they don't require user interaction.

Robots retrieve web pages and often follow the links they find to locate other pages and sites. They can index titles, summaries, or the entire contents of documents much more quickly and thoroughly than a human could.
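As a rough illustration of the behavior described above, the following Python sketch reads one page, records its title, and collects the links a robot could visit next. The class name PageIndexer, the sample page, and the example.com URLs are hypothetical; a real robot would download pages over HTTP and do far more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageIndexer(HTMLParser):
    """Collect the title and outgoing links of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page content standing in for a downloaded document.
SAMPLE_PAGE = """
<html><head><title>Example page</title></head>
<body><a href="/about.html">About</a>
<a href="http://www.example.org/">Elsewhere</a></body></html>
"""

indexer = PageIndexer("http://www.example.com/index.html")
indexer.feed(SAMPLE_PAGE)
print(indexer.title)   # title to store in the index
print(indexer.links)   # pages the robot could visit next
```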

While their speed and efficiency make them very appealing to the managers of search engines, search robots, especially poorly constructed ones, can overwhelm some servers. Administrators can exclude or limit robot access by placing a robots.txt file in the top-level directory of the server that states which parts of the site robots may access. For example, at Indiana University, the personal web page server, mypage.iu.edu, has a robots.txt file that denies access to all robots. If you have a web page on mypage.iu.edu, it will not be indexed by robots, but you may submit it yourself to a search engine such as Yahoo!.

Note: At Indiana University South Bend, the Mypage service is different from the Mypage service mentioned in this document. For more information, see IUSB's Web Publishing: Mypage page.

For more about excluding robot searches, see:

http://www.robotstxt.org/robotstxt.html
http://www.robotstxt.org/orig.html

To see a real-life example of a robots.txt file, see:

http://mypage.iu.edu/robots.txt
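To show how a well-behaved robot might interpret such a file, here is a minimal sketch using Python's standard urllib.robotparser module. The two file contents, the robot name ExampleBot, and the example.com URLs are illustrative assumptions; they are not copied from the mypage.iu.edu file.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents: "User-agent: *" addresses every robot,
# and "Disallow: /" excludes the whole site.
DENY_ALL = """\
User-agent: *
Disallow: /
"""

# Assumed file contents that limit access to one directory only.
DENY_PRIVATE = """\
User-agent: *
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(text.splitlines())
    return rules


deny_all = load_rules(DENY_ALL)
deny_private = load_rules(DENY_PRIVATE)

# ExampleBot is a hypothetical robot name used only for these checks.
blocked_everywhere = deny_all.can_fetch("ExampleBot", "http://www.example.com/index.html")
blocked_private = deny_private.can_fetch("ExampleBot", "http://www.example.com/private/notes.html")
open_public = deny_private.can_fetch("ExampleBot", "http://www.example.com/public.html")
print(blocked_everywhere, blocked_private, open_public)
```

The first file corresponds to excluding robots entirely, and the second to limiting them to part of a site.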

If you have your own pages on a system that is not protected with a robots.txt file, and you wish to exclude robot searches, you can add the following tag to the <head> section of each page:

<meta name="robots" content="noindex,nofollow">

For more information about how to use the <meta> tag with name="robots" to regulate robot searches of your pages, see:

http://www.robotstxt.org/meta.html
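A cooperating robot has to read this tag before deciding what to do with a page. The sketch below, using Python's standard html.parser module, shows one way that check might look. The class name RobotsMetaParser and the sample page are hypothetical, and actual robots may interpret the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,nofollow".
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


# Hypothetical page carrying the tag shown earlier in this document.
page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)

may_index = "noindex" not in parser.directives    # may the page be indexed?
may_follow = "nofollow" not in parser.directives  # may its links be followed?
print(may_index, may_follow)
```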

Unfortunately, not all robots honor robot exclusions and limitations.

In addition to regulating whether robots search your page, you can also use <meta> tags named "keywords" and "description" to improve the results that robots get. Search engines use the description to describe your page in their results, which can be especially useful if your page contains little text. Search engines index the keywords in addition to the text in the title and body of your document. For example, a web page about Darth Vader might include these <meta> tags:

<meta name="keywords" content="evil leader, darkside, sith, choke, empire, asthmatic">
<meta name="description" content="Darth Vader: More than just another pretty face">

For more information about these tags and <meta> tags in general, see:

http://vancouver-webpages.com/META/
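To illustrate how a search engine might pick up these values, the following sketch extracts the keyword list and the description from a page's <meta> tags. It assumes the tags are named keywords and description; the class name MetaSummaryParser is hypothetical, and real search engines may weigh or ignore these tags as they see fit.

```python
from html.parser import HTMLParser


class MetaSummaryParser(HTMLParser):
    """Extract the keywords and description <meta> values from a page."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.description = ""

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name == "keywords":
            # Keywords are a comma-separated list; drop empty entries.
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]
        elif name == "description":
            self.description = content.strip()


# Sample head section based on the Darth Vader example in this document.
page = """<head>
<meta name="keywords" content="evil leader, darkside, sith, choke, empire, asthmatic">
<meta name="description" content="Darth Vader: More than just another pretty face">
</head>"""

summary = MetaSummaryParser()
summary.feed(page)
print(summary.keywords)
print(summary.description)
```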

For more information about robots, including a FAQ and a list of known robots, see:

http://www.robotstxt.org/faq.html

This is document aeub in domain all.
Last modified on June 16, 2008.