In IU's HPSS system, how is data integrity assured?
Indiana University's HPSS system is designed to be a data archive, with the expectation that data stored there will be good, long-term copies of the original data. To help keep data stable, both the transfer mechanism (TCP/IP) and the storage media (disk and tape) have checksums, and the storage media also have error correction mechanisms to deal with small media defects and bit loss.
However, there can be no guarantee that the data will remain intact forever, or that data corruption will not occur. Even very low error rates are non-zero, and when massive amounts of data are transferred, undetected errors may occur. Error correction schemes can also fail when data are stored on magnetic media for long periods of time.
IU periodically rewrites data from one storage medium to another. Disk-to-tape transfers are nearly immediate, and tape-to-tape transfers are done when other information on the tape is deleted or when new tape technology is implemented. This protects against some errors stemming from long-term storage.
Once the data are in HPSS, IU relies on the checksumming and error correction capabilities of the TCP/IP protocol and the storage media to ensure integrity. If errors occur, a second copy is available to help recover data.
If you want further integrity checks, you will need to perform them manually. One common method is to run a checksumming algorithm, such as MD5, on a file. You can do this either by downloading the file and performing the checksum, or by remotely executing the checksum on IU's servers.
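As a minimal sketch of the download-and-checksum approach, the following Python computes a file's digest locally with the standard hashlib module. The function name file_checksum and the chunk size are illustrative choices, not part of any IU tooling.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`.

    The file is read in chunks so that large archive files do not
    need to fit in memory. `algorithm` is any name hashlib accepts,
    such as "md5" or "sha1".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Computing the digest once before uploading a file and again after downloading it gives two values that should be identical if the file is unchanged.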
Using GridFTP's cksm command to remotely execute a checksum
To execute a checksum remotely using the GridFTP server's cksm command, your client must support sending this command. For example, UberFTP does not support this command directly, but it does support sending unknown commands using its quote command:
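As a rough sketch only, the Python helper below assembles such a quote line. It assumes the GridFTP CKSM argument order is algorithm, offset, length, path, which this document does not confirm; the function name build_cksm_quote and the file path are hypothetical.

```python
def build_cksm_quote(path, algorithm="MD5", offset=0, length=-1):
    """Build an UberFTP `quote` line that forwards CKSM to the server.

    Assumed argument order (not confirmed by this document):
        CKSM <algorithm> <offset> <length> <path>
    A length of -1 requests a checksum of the entire file.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if length < -1:
        raise ValueError("length must be -1 (whole file) or non-negative")
    return f"quote cksm {algorithm} {offset} {length} {path}"


# Hypothetical path, for illustration only.
print(build_cksm_quote("/hpss/u/username/data.tar"))
```

The printed line is what would be typed at the UberFTP prompt, if the assumed syntax matches the server.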
Note that in the example, the length of -1 indicates that the entire file should be checksummed, and not just a slice.
Using MDSSWeb to remotely execute a checksum
Note: The following describes an experimental feature of MDSSWeb. To use it, you need to begin your MDSSWeb session at the experimental (beta) MDSSWeb interface:
https://www.mdss.iu.edu/beta
You can compute MD5 and SHA1 checksums with MDSSWeb. To execute a checksum remotely:
- In the listing, check the boxes to select the files and directories for which you would like to execute a checksum.
- In the MDSSWeb toolkit, above the listing, click Checksum.
- Fill in any other desired options, and then click Perform Checksum.
For details on advanced options, from the checksum utility interface, click Help.
Once the checksum has been computed for a file, it will be stored and displayed in MDSSWeb below the file name. The GridFTP server does not query or update the stored checksum for its cksm command.
No method currently exists to select which copy of a file to check or download: the copy on the first tape, the copy on the second tape, or a copy remaining on disk from a previous transfer.
If an error is found, email the Research Storage group immediately, so staff can try to determine why there is a problem, and attempt to recover the data from the second copy.
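One way to notice such an error is to record checksums before archiving and compare them after retrieval. The sketch below assumes a hypothetical manifest, a plain dict mapping local file paths to the MD5 hex digests recorded earlier; the names md5_of and find_mismatches are illustrative and not part of any IU tooling.

```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(manifest):
    """Return the paths whose current MD5 differs from the recorded one.

    `manifest` maps each path to its expected hex digest. Files that
    cannot be read (for example, missing files) are reported as well.
    """
    mismatched = []
    for path, expected in manifest.items():
        try:
            actual = md5_of(path)
        except OSError:
            mismatched.append(path)
            continue
        if actual != expected.lower():
            mismatched.append(path)
    return mismatched
```

An empty result means every retrieved file still matches its recorded digest; any path that is returned is a candidate to report.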
It is always a good idea to build some redundancy into your own data (for example, by keeping an additional copy elsewhere), so that the data can still be recovered if a failure occurs.
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on May 13, 2009.







