Indiana University
University Information Technology Services
  
On IU's HPSS system, how is data integrity assured?

Indiana University's HPSS system, the Scholarly Data Archive (SDA), is designed as a data archive: data stored there are expected to remain good, long-term copies of the original data. To help keep data stable, both the transfer mechanism (TCP/IP) and the storage media (disk and tape) use checksums, and the storage media also have error correction mechanisms that handle small media defects and bit loss.

However, there can be no guarantee that the data will remain intact forever, or that data corruption will not occur. Even very low error rates are non-zero, and when massive amounts of data are transferred, undetected errors may occur. Error correction schemes can also fail when data are stored on magnetic media for long periods of time.
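As a rough illustration of how a checksum can miss an error, the sketch below implements a 16-bit ones' complement sum in the style of the Internet checksum that TCP uses (RFC 1071). It is a simplified sketch, not HPSS code: the function name and sample data are hypothetical, and a real TCP checksum also covers header fields. Because the sum does not depend on word order, swapping two 16-bit words leaves the checksum unchanged, while a stronger digest such as MD5 detects the change.

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


original = b"ABCDEFGH"
swapped = b"CDABEFGH"  # first two 16-bit words exchanged (hypothetical corruption)

# The 16-bit sum is order-independent, so it cannot see the swap.
print(internet_checksum(original) == internet_checksum(swapped))  # True

# An MD5 digest of the same data does change.
print(hashlib.md5(original).hexdigest() == hashlib.md5(swapped).hexdigest())  # False
```

This kind of error is unlikely on any single transfer, but it shows why a 16-bit checksum alone cannot rule out corruption across very large volumes of data.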

IU periodically rewrites data from one storage medium to another. Disk-to-tape transfers are nearly immediate, and tape-to-tape transfers take place when other information on the tape is deleted or when new tape technology is implemented. This rewriting protects against some errors that stem from long-term storage.

Once the data are in HPSS, IU relies on the checksumming and error correction capabilities of the TCP/IP protocol and the storage media to ensure integrity. If errors occur, a second copy is available to help recover data.

If you want further integrity checks, you will need to perform them manually. One common method is to run a checksumming algorithm, such as MD5, on a file before you store it, and then compare that value with a checksum computed on the stored copy. You can compute the second checksum either by downloading the file and performing the checksum locally, or by remotely executing the checksum on IU's servers (see "How do I remotely execute a checksum on my IU HPSS data?").
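The local variant of this check might look like the following minimal sketch, which uses Python's standard hashlib module. The function names and the temporary files are hypothetical stand-ins for your original file and the copy retrieved from the archive; this is not an IU-provided tool.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(original_path, retrieved_path):
    """Compare two files by their MD5 digests."""
    return md5_of_file(original_path) == md5_of_file(retrieved_path)


# Demonstration with two temporary files that stand in for the original
# file and the copy downloaded from the archive (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.dat")
    retrieved = os.path.join(tmp, "retrieved.dat")
    for path in (original, retrieved):
        with open(path, "wb") as f:
            f.write(b"example archive contents\n")
    print(files_match(original, retrieved))  # True
```

In practice you would record the digest of the original file before storing it, and later compare it with the digest of the downloaded copy or with the value reported by a remote checksum.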

This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document awax in domain all.
Last modified on July 28, 2011.
