How do indexing and ordering work in the IU search service?
The Google search appliance implemented on all Indiana University campuses automatically finds web sites by crawling through official IU web sites. Only those sites and pages linked through an official IU site will be indexed. This search appliance will not affect the way other Internet search engines access your site.
For a variety of reasons, the search engine may not find every page in every domain. If you find that your site does not appear in the search results, report the missing site to Enterprise Web Tech Services.
Indexing your site
The search engine will automatically index any sites that are linked to an official IU site. Google begins indexing at the top of the IU web site, and follows and indexes all links from there. For your pages to be indexed by Google, you simply need to:
- Post the pages in a web space that is not excluded from the IU search collection.
- Make sure your pages don't contain meta tags that prevent the robot from indexing.
- Make sure your page can be reached by clicking links from one of the top-level pages in IU's web site.
If your site is within these guidelines, there is no need to submit the site to the index; the Google crawler will pick up changed, new, and removed pages automatically.
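The reachability requirement can be pictured as a breadth-first walk over links. The following is a minimal sketch, not the appliance's actual crawler: the function name `reachable_pages` and the paths in `LINKS` are hypothetical, chosen only to illustrate why a page with no inbound link from the top-level pages is never found.

```python
from collections import deque


def reachable_pages(links, start):
    """Return every page reachable from `start` by following links.

    `links` maps a page to the list of pages it links to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: "/" stands in for a top-level IU page.
LINKS = {
    "/": ["/dept/", "/about/"],
    "/dept/": ["/dept/lab/"],
    "/dept/lab/": [],
    "/about/": ["/"],
    "/orphan/": ["/dept/"],  # links out, but nothing links to it
}

found = reachable_pages(LINKS, "/")
```

In this sketch `/dept/lab/` is found because `/dept/` links to it, while `/orphan/` is not, even though it links to an indexed page. Adding a link to it from any reachable page would fix that.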
If you don't believe that your site will be indexed within these parameters, follow the instructions below:
- If your site is affiliated with a department, lab, or office at IU, ask that entity to add a link to your site.
- To request a reindexing of your site, email Enterprise Web Tech Services with the URL of your site. Your site will then be placed into the queue.
Preventing searches on your site
If you wish to exclude an entire web server from the search engine, place a robots.txt file at the top level of your site. The robots.txt file should contain something like the following:
  User-agent: *
  Disallow: /
Note: If you want to keep only the IU Google search appliance from crawling or indexing your site, use the appliance's user-agent name, iu-crawler. Simply replace the asterisk above with iu-crawler.
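Before publishing a robots.txt file, you may want to check which robots it actually blocks. The following is a rough sketch using Python's standard `urllib.robotparser` module. The helper name `is_allowed`, the sample rule sets, and the URL are assumptions made for illustration; only the `iu-crawler` user-agent name comes from this article.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocks every compliant robot from the whole server.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Blocks only the IU search appliance.
BLOCK_IU_ONLY = "User-agent: iu-crawler\nDisallow: /\n"

# Hypothetical page used only for the check.
PAGE = "http://www.example.edu/page.html"

blocked_everywhere = not is_allowed(BLOCK_ALL, "iu-crawler", PAGE)
blocked_for_iu = not is_allowed(BLOCK_IU_ONLY, "iu-crawler", PAGE)
open_to_others = is_allowed(BLOCK_IU_ONLY, "Googlebot", PAGE)
```

With the second rule set, the IU appliance is turned away while other robots, such as Googlebot, may still fetch the page.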
If you wish to prevent the search engine from indexing a particular page, or every page in a particular directory, insert the following <meta> tag inside the <head> element of each page:
<head> <meta name="robots" content="noindex,nofollow"> </head>
Note: If the page has been indexed in the past and you then add the <meta> tag, the page will be removed from the index the next time Google crawls it.
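To confirm that a page really carries the robots <meta> tag, you can parse its HTML and read the directives. The following is a minimal sketch using Python's standard `html.parser` module. The class and function names are hypothetical, and the sketch only reports what the tag says. It does not reproduce how the appliance interprets it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


PAGE = '<html><head><meta name="robots" content="noindex,nofollow"></head><body></body></html>'
directives = robots_directives(PAGE)
```

A page is marked as excluded from the index when `"noindex"` is in the returned set. An empty set means no robots <meta> tag was found.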
For more information on robots in general, see What are web search robots, and how do they affect me?
Site ranking
With Google's KeyMatch feature, you can improve the ranking of your pages so that your documents will appear at the top of the search results. KeyMatch allows you to select a list of keywords associated with a particular page. When a user types in those keywords, the document will appear at the top of the results page.
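The effect described above can be modeled as a lookup that moves matching pages to the front of the result list. The following is a toy sketch only, not the appliance's KeyMatch implementation. The function name `apply_keymatch`, the single-word matching rule, and the sample URLs are all assumptions made for illustration.

```python
def apply_keymatch(query, results, keymatches):
    """Return `results` with pages promoted for matching keywords first.

    `keymatches` is a list of (keyword, url) pairs. A keyword matches
    when it appears as a whole word in the query, ignoring case.
    """
    words = set(query.lower().split())
    promoted = [url for keyword, url in keymatches if keyword.lower() in words]

    ordered = []
    seen = set()
    for url in promoted + list(results):
        if url not in seen:  # keep a promoted page from appearing twice
            seen.add(url)
            ordered.append(url)
    return ordered


# Hypothetical search results and keyword assignments.
RESULTS = ["/news/", "/events/", "/library/hours.html"]
KEYMATCHES = [("library", "/library/hours.html")]

ranked = apply_keymatch("Library hours", RESULTS, KEYMATCHES)
```

Here the library page moves from third place to first because the query contains the keyword assigned to it. A query without that keyword leaves the order unchanged.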
Last modified on September 22, 2009.







