Indiana University
University Information Technology Services
  
How do indexing and ordering work in the IU search service?

The Google search appliance implemented on all Indiana University campuses automatically finds web sites by crawling through official IU web sites. Only those sites and pages linked through an official IU site will be indexed. This search appliance will not affect the way other Internet search engines access your site.

For a variety of reasons, the search engine may not find every page in every domain. If you find that your site does not appear in the search results, report the missing site to Enterprise Web Tech Services.


Indexing your site

The search engine will automatically index any site that is linked from an official IU site. Google begins indexing at the top of the IU web site, then follows and indexes all links from there. For your pages to be indexed by Google, you simply need to:

  • Post the pages in a web space that is not excluded from the IU search collection.

  • Make sure your pages don't contain meta tags that prevent the robot from indexing.

  • Make sure your page can be reached by clicking links from one of the top-level pages in IU's web site.

If your site is within these guidelines, there is no need to submit the site to the index; the Google crawler will pick up changed, new, and removed pages automatically.
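The third requirement above, that a page be reachable by clicking links from a top-level page, can be pictured as a breadth-first walk over the site's links. The following is a minimal sketch under stated assumptions: the link map `SITE_LINKS` and every path in it are hypothetical, and the function models link-following in general, not the Google appliance's actual crawl logic.

```python
from collections import deque

# Hypothetical link map (an assumption for illustration only):
# each page path maps to the pages it links to.
SITE_LINKS = {
    "/": ["/dept/", "/about.html"],
    "/dept/": ["/dept/lab/"],
    "/dept/lab/": [],
    "/about.html": [],
    "/orphan.html": ["/dept/"],  # links out, but nothing links to it
}


def reachable_pages(links, start):
    """Return the set of pages reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


reachable = reachable_pages(SITE_LINKS, "/")
for page in sorted(SITE_LINKS):
    status = "reachable" if page in reachable else "NOT reachable"
    print(f"{page}: {status}")
```

In this sketch `/orphan.html` is reported as not reachable because no page links to it, which is the situation the steps below address by asking an affiliated department, lab, or office to add a link.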

If you believe that your site will not be indexed under these guidelines, follow the instructions below:

  • If your site is affiliated with a department, lab, or office at IU, ask that entity to add a link to your site.

  • To request a reindexing of your site, email Enterprise Web Tech Services with the URL of your site. Your site will then be placed into the queue.

Preventing searches on your site

If you wish to exclude an entire web server from the search engine, place a robots.txt file at the top level of your site. The robots.txt file should contain something like the following:

User-agent: *
Disallow: /

Note: If you want to block only the IU Google search appliance from crawling or indexing your site, leaving other crawlers unaffected, use the appliance's user-agent name, iu-crawler. Simply replace the asterisk above with iu-crawler.
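To see how the two robots.txt variants differ in effect, you can test them locally before publishing. The sketch below uses Python's standard `urllib.robotparser` module; the URL `http://www.example.edu/index.html` and the second user-agent name are placeholders chosen for illustration, and the standard-library parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocks every crawler that honors robots.txt.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Blocks only the IU search appliance (user-agent iu-crawler).
BLOCK_IU_ONLY = """\
User-agent: iu-crawler
Disallow: /
"""

# Placeholder URL, not a real IU address.
url = "http://www.example.edu/index.html"

for label, rules in (("block all", BLOCK_ALL), ("block iu-crawler only", BLOCK_IU_ONLY)):
    for agent in ("iu-crawler", "some-other-bot"):
        print(f"{label:22} {agent:15} allowed={is_allowed(rules, agent, url)}")
```

With the first file, both user-agents are refused; with the second, only iu-crawler is refused and other crawlers remain free to fetch the page.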

If you wish to prevent the search engine from indexing a particular page (or every page in a directory), insert the following <meta> tag into the <head> section of each page you want excluded:

<head>
<meta name="robots" content="noindex,nofollow">
</head>

Note: If the page has been indexed in the past and you then add the <meta> tag, the page will be removed from the index the next time Google crawls it.
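If you are unsure whether a page already carries a robots <meta> tag (for example, when a page unexpectedly fails to appear in search results), you can inspect its HTML. The following is a minimal sketch using Python's standard `html.parser` module; the class and function names are hypothetical helpers introduced here, and the check covers only the `robots` meta tag, not robots.txt rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None when written without a value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<head> <meta name="robots" content="noindex,nofollow"> </head>'
found = robots_directives(page)
print("noindex" in found)   # True: the page asks not to be indexed
print("nofollow" in found)  # True: the page asks that its links not be followed
```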

For more information on robots in general, see What are web search robots, and how do they affect me?

Site ranking

With Google's KeyMatch feature, you can improve the ranking of your pages so that your documents will appear at the top of the search results. KeyMatch allows you to select a list of keywords associated with a particular page. When a user types in those keywords, the document will appear at the top of the results page.

This is document atcf in domain all.
Last modified on September 22, 2009.
