What are web search robots, and how do they affect me?

Search robots, also known as bots, wanderers, spiders, and crawlers, are the tools many web search engines, such as Google, Bing, and Yahoo!, use to build their databases. Most robots work like web browsers, except they don't require user interaction.

Robots retrieve web pages and typically follow the links on those pages to locate other pages and sites. They can index titles, summaries, or the entire contents of documents much more quickly and thoroughly than a human could.

While their speed and efficiency make them very appealing to the managers of search engines, search robots, especially poorly constructed ones, can overwhelm some servers. Administrators can exclude or limit robot access by placing a robots.txt file at the top level of their servers; the file lists which robots its rules apply to and which parts of the site those robots should not access.
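As a rough illustration of how a well-behaved robot might interpret such a file, the sketch below feeds a small robots.txt to Python's standard urllib.robotparser module. The file contents, the robot names (BadBot, GoodBot), and the example.com URLs are all made up for this example; they are not taken from any real server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one robot entirely and keep
# every other robot out of a single directory.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# BadBot is excluded from the whole site.
print(parser.can_fetch("BadBot", "https://www.example.com/index.html"))       # False
# Other robots may fetch ordinary pages...
print(parser.can_fetch("GoodBot", "https://www.example.com/index.html"))      # True
# ...but not anything under /private/.
print(parser.can_fetch("GoodBot", "https://www.example.com/private/a.html"))  # False
```

A file consisting only of "User-agent: *" followed by "Disallow: /" would deny access to all robots that choose to honor it.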

At Indiana University, the individual web page server, pages.iu.edu, has a robots.txt file that denies access to all robots. Faculty and staff may change their robots.txt search engine crawl settings at Search Engine Crawl Settings for Pages. For more about the Pages service, as well as alternative ways to publicize your Pages web page, see At IU, what is Pages, and how can I publish a web page there?

For more about excluding robot searches, see About /robots.txt and A Standard for Robot Exclusion.

If you have your own pages on a system that is not protected with a robots.txt file and you wish to exclude robot searches, you can add the following tag to the <head> section of each page:

  <meta name="robots" content="noindex,nofollow">

For more about how to use the <meta> tag with its name attribute set to "robots" to regulate robot searches of your pages, see About the Robots <META> tag.
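To make the effect of the robots <meta> tag concrete, here is a minimal sketch of how a cooperative robot might read it, using Python's standard html.parser module. The class name RobotsMetaParser, the helper robots_directives, and the sample page are hypothetical; real robots may handle the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content value is a comma-separated list of directives.
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page carrying the tag discussed above.
PAGE = (
    '<html><head><meta name="robots" content="noindex,nofollow">'
    "</head><body>Hello</body></html>"
)

directives = robots_directives(PAGE)
may_index = "noindex" not in directives    # may the page be indexed?
may_follow = "nofollow" not in directives  # may its links be followed?
print(may_index, may_follow)  # False False
```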

Unfortunately, compliance is voluntary: not all robots honor robot exclusions and limitations.

In addition to regulating whether robots search your page, you can also use the <meta> tag with its name attribute set to "keywords" or "description" to improve how your page appears in search results. Search engines may use the description to describe your page, which can be especially useful if your page contains little text. Some search engines index keywords in addition to the text in the title and body of your document. For example, a web page about Darth Vader might include these <meta> tags:

  <meta name="keywords" content="evil leader, darkside, sith, choke,
  empire, asthmatic">

  <meta name="description" content="Darth Vader: More than just another
  pretty face">
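The following sketch suggests how an indexing robot could pull these values out of a page, again with Python's standard html.parser module. It assumes the conventional attribute values name="keywords" and name="description"; the class MetaTagCollector and the helper collect_meta are hypothetical names, and actual search engines decide for themselves whether and how to use this information.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Stores the content of keywords and description <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("keywords", "description"):
            # Collapse line breaks and repeated spaces inside the value.
            content = attributes.get("content") or ""
            self.meta[name] = " ".join(content.split())


def collect_meta(html: str) -> dict:
    """Return a dict with any keywords/description values in the page."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.meta


PAGE = """<head>
<meta name="keywords" content="evil leader, darkside, sith, choke,
empire, asthmatic">
<meta name="description" content="Darth Vader: More than just another
pretty face">
</head>"""

info = collect_meta(PAGE)
# Split the keyword list on commas, as an indexer might.
keywords = [word.strip() for word in info["keywords"].split(",")]
print(keywords)
print(info["description"])
```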

For more about these tags and <meta> tags in general, see HTTP-EQUIV (HTTP header) Index. For more about robots, including a FAQ and a list of known robots, see Frequently Asked Questions.

This is document aeub in the Knowledge Base.
Last modified on 2015-03-12 00:00:00.
