What are web search robots, and how do they affect me?
Search robots, also known as bots, wanderers, spiders, and crawlers, are the tools that many web search engines, such as Google, Bing, and Yahoo!, use to build their databases. Most robots work like web browsers, except that they don't require user interaction.
Robots access web pages, often following the links they find to locate other pages and sites. They can index titles, summaries, or the entire contents of documents much more quickly and thoroughly than a human could.
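As a rough illustration of how a robot reads a page, the sketch below uses Python's standard html.parser module to pull the title and the outgoing links from a page. The class name PageScanner and the sample HTML are hypothetical and exist only for this example. A real robot would also fetch pages over the network, queue the links it finds, and check exclusion rules before visiting them.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects the <title> text and every <a href="..."> target on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


# Hypothetical page content, used only for illustration.
SAMPLE_PAGE = """
<html><head><title>My Home Page</title></head>
<body><a href="http://example.com/a.html">A</a>
<a href="/b.html">B</a></body></html>
"""

scanner = PageScanner()
scanner.feed(SAMPLE_PAGE)
```

After feed() returns, scanner.title holds the page title and scanner.links holds the link targets in the order they appear, which is the raw material a robot would use to index the page and decide where to go next.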
While their speed and efficiency make them very appealing to the
managers of search engines, search robots, especially poorly
constructed ones, can overwhelm some servers. Administrators can
exclude or limit robot access by placing robots.txt files
on their servers that outline how their sites are to be accessed.
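To make the idea concrete, here is a minimal sketch of how a well-behaved robot might interpret a robots.txt file, using Python's standard urllib.robotparser module. The file contents, the robot names BadBot and GoodBot, and the example.com URLs are all assumptions made up for this example; they are not taken from any real server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only:
# one robot is shut out entirely, and all others are kept out of /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite robot asks before fetching each URL.
badbot_home = parser.can_fetch("BadBot", "http://example.com/index.html")
goodbot_home = parser.can_fetch("GoodBot", "http://example.com/index.html")
goodbot_private = parser.can_fetch("GoodBot", "http://example.com/private/notes.html")
```

With these rules, BadBot is refused everywhere, while other robots may fetch the home page but not anything under /private/. A file that denies access to all robots would contain only the lines "User-agent: *" and "Disallow: /".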
At Indiana University, the personal web page server, mypage.iu.edu,
has a robots.txt file that denies access to all robots.
Faculty and staff may change their robots.txt search
engine crawl settings at "Search Engine Crawl
Settings for Mypage". For more about the Mypage service, as well
as alternative ways to publicize your Mypage page, see "At IU, what is Mypage, and how can I publish a web page there?"
For more about excluding robot searches, see "About /robots.txt" and "A Standard for Robot Exclusion".
If you have your own pages on a system that is not protected with a
robots.txt file and you wish to exclude robot searches,
you can add the following tag to the <head> section of each page:
<meta name="robots" content="noindex, nofollow">
For more about how to use the <meta> tag with the name "robots" to regulate robot searches of your pages, see About the Robots <META> tag.
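The following sketch shows one way a cooperative robot could check a page for a robots <meta> tag before indexing it, again using Python's standard html.parser module. The class RobotsMetaParser, the helper robots_directives, and the sample page are hypothetical names introduced for this example, and the tag shown assumes the common "noindex, nofollow" form.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gathers the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The name is compared case-insensitively (ROBOTS and robots both match).
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page header, for illustration only.
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

directives = robots_directives(PAGE)
may_index = "noindex" not in directives    # may the page be indexed?
may_follow = "nofollow" not in directives  # may its links be followed?
```

For the sample page, both may_index and may_follow come out False, so a robot that honors the tag would neither index the page nor follow its links. A page without the tag yields an empty set, which leaves both actions permitted.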
Unfortunately, not all robots honor robot exclusions and limitations.
In addition to regulating whether robots search your page, you can also use <meta> tags named "keywords" and "description" to improve the results that robots get. Search engines use descriptions to describe your page, which can be especially useful if your page contains little text. Search engines index keywords in addition to text in the title and body of your document. For example, a web page about Darth Vader might include these <meta> tags:
<meta name="keywords" content="evil leader, darkside, sith, choke, empire, asthmatic">
<meta name="description" content="Darth Vader: More than just another pretty face">
For more about these tags and <meta> tags in general, see HTTP-EQUIV (HTTP header) Index. For more about robots, including a FAQ and a list of known robots, see Frequently Asked Questions.
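As a sketch of how an indexer might read the keywords and description tags from the Darth Vader example, the code below collects both values with Python's standard html.parser module. The class name MetaInfoParser is hypothetical, and the example assumes the tags use the standard names "keywords" and "description"; how a given search engine actually weighs these values is not something this sketch tries to model.

```python
from html.parser import HTMLParser


class MetaInfoParser(HTMLParser):
    """Records the content of the keywords and description <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attr.get("content") or ""


# The <meta> tags from the Darth Vader example.
HEAD = (
    '<meta name="keywords" content="evil leader, darkside, sith, choke, '
    'empire, asthmatic">'
    '<meta name="description" content="Darth Vader: More than just another '
    'pretty face">'
)

info = MetaInfoParser()
info.feed(HEAD)

# Split the comma-separated keyword list into individual terms.
keywords = [k.strip() for k in info.meta.get("keywords", "").split(",") if k.strip()]
description = info.meta.get("description", "")
```

Here keywords becomes a list of six terms that could be added to an index, and description holds the text a search engine could show beside the page in its results.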
Last modified on February 19, 2013.







