Indexing and ordering in the IU Search service

Overview

Google Custom Search (GCS), a service implemented on all Indiana University campuses, finds pages automatically by crawling official IU websites. Only the sites and pages specified in your custom search engine's control panel are indexed. The service does not affect how other internet search engines access your site. For a variety of reasons, the search engine may not find every page in every domain. If your site does not appear in the search results, report the missing site to Enterprise Web Tech Services.

For more, see About Google Custom Search.

Index your site

If you are using Google Custom Search and wish to take some control over the search engine's indexing, log in to Google Custom Search Engine. Select Control panel next to the name of your search engine, and then choose Indexing from the column on the left. From there, you may add specific URLs to index, or add a sitemap URL.
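If you do not already have a sitemap to submit, the following is a minimal sketch of how one could be generated with Python's standard library. The function name build_sitemap and the example.edu URLs are placeholders invented for illustration; the output follows the public sitemaps.org format and is not specific to IU or to Google Custom Search.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given page URLs.

    build_sitemap is a hypothetical helper, not part of any Google tool.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs; replace them with the pages you want indexed.
    print(build_sitemap([
        "https://www.example.edu/",
        "https://www.example.edu/about.html",
    ]))
```

Save the output as a file on your web server (sitemap.xml is a common name), and then enter that file's URL as the sitemap URL.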

For Google to index your pages, make sure they don't contain meta tags that prevent the robot from indexing them. If your site meets that requirement, you do not need to submit it to the index; the Google crawler will pick up changed, new, and removed pages automatically.
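One way to check a page for such meta tags is sketched below, using Python's built-in HTML parser. The names RobotsMetaFinder and blocks_indexing are hypothetical helpers written for this example. The check covers only <meta name="robots"> tags and treats "noindex" and "none" as blocking; it does not inspect HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def blocks_indexing(html_text):
    """Return True if the page carries a robots meta tag that blocks indexing."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    # "none" is shorthand for "noindex,nofollow".
    return bool(finder.directives & {"noindex", "none"})


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex,nofollow"></head>'
    print(blocks_indexing(page))
```

To use it, pass in the HTML source of one of your pages; a result of True means the page asks robots not to index it.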

Prevent searches on your site

If you wish to prevent searches on your site, log in to Google Custom Search Engine. Select Control panel next to the name of your search engine, and then choose Sites from the column on the left. Under "Excluded sites" at the bottom of the page, add the URLs of the sites that you do not want searched.

If you wish to exclude an entire web server from the search engine, insert a robots.txt file at the top level of your site. The robots.txt file should contain something like the following:

  User-agent: *
  Disallow: /
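To confirm that a robots.txt file has the intended effect before you publish it, you can test its rules locally. The sketch below uses the robots.txt parser from Python's standard library; is_allowed is a hypothetical helper name, and the example.edu URL is a placeholder. Python's parser may not interpret every rule exactly as Google's crawler does, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rules shown above: every robot is excluded from the whole server.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    # Placeholder URL; substitute a page on your own server.
    print(is_allowed(BLOCK_ALL, "https://www.example.edu/index.html"))
```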

If you wish to prevent the search engine from indexing a particular directory or page, insert the following <meta> tag into the <head> section of the page, or of each page in the directory:

  <head>
  <meta name="robots" content="noindex,nofollow">
  </head>

If the page has been indexed in the past and you add the <meta> tag, the page will be removed from the index the next time Google crawls it.

For more on robots in general, see Web search robots.

This is document atcf in the Knowledge Base.
Last modified on 2022-09-30 16:33:00.