How data integrity is assured on the SDA at IU

Important:

Before storing data on any of Indiana University's research computing or storage systems, make sure you understand the information in Types of sensitive institutional data appropriate for UITS Research Technologies services.

Make sure you do not include sensitive institutional data as part of a file's filename or pathname.

Indiana University's HPSS system, the Scholarly Data Archive (SDA), is designed as a data archive, with the expectation that data stored there will be reliable, long-term copies of the original data. To help preserve data, both the transfer mechanism (TCP/IP) and the storage media (disk and tape) use checksums, and the storage media also have error correction mechanisms to deal with small media defects and bit loss.

However, there can be no guarantee that the data will remain intact forever, or that data corruption will not occur. Even very low error rates are non-zero, so when massive amounts of data are transferred, some errors may go undetected. Error correction schemes can also fail when data are stored on magnetic media for long periods of time.
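
To illustrate why a very low error rate still matters at archive scale, here is a minimal Python sketch. It assumes errors are independent and occur at a fixed per-bit rate; the rate used below (1e-15) is a hypothetical value chosen only for illustration, not a measured figure for the SDA, TCP/IP, or any particular storage medium, and the function name is likewise made up for this example.

```python
import math


def prob_at_least_one_error(num_bits: float, per_bit_error_rate: float) -> float:
    """Chance that at least one of num_bits is silently corrupted.

    Assumes independent errors at a fixed per-bit rate (a simplification).
    """
    # 1 - (1 - p)^n, computed with log1p/expm1 so tiny rates do not round to zero.
    return -math.expm1(num_bits * math.log1p(-per_bit_error_rate))


# Hypothetical undetected error rate, for illustration only.
ASSUMED_RATE = 1e-15

one_gigabyte_bits = 8e9    # 10^9 bytes
one_petabyte_bits = 8e15   # 10^15 bytes

print(f"1 GB: {prob_at_least_one_error(one_gigabyte_bits, ASSUMED_RATE):.2e}")
print(f"1 PB: {prob_at_least_one_error(one_petabyte_bits, ASSUMED_RATE):.4f}")
```

Under this assumed rate, an undetected error in a single gigabyte is very unlikely, but across a petabyte it becomes close to certain. The real figures depend on the actual hardware and protocols involved.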

IU periodically rewrites data from one storage medium to another. Disk-to-tape transfers are nearly immediate, and tape-to-tape transfers are done when other data on the tape are deleted, or when new tape technology is implemented. This protects against some errors stemming from long-term storage.

Once data are in HPSS, IU relies on the checksumming and error correction capabilities of the TCP/IP protocol and the storage media to ensure integrity. If errors occur, a copy is available to help recover data.

To check data integrity further, you can run a checksumming algorithm (for example, MD5) on a file. With HSI client version 4 and higher, checksums are computed by default when you transfer files to the SDA. With HSI, you can also create and view checksums for files already stored on the SDA; see Use HSI to create and manage checksums. Alternatively, you can download the file and compute the checksum locally.
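
As one way to compute a checksum locally, the following Python sketch calculates the MD5 digest of a downloaded file and compares it with a checksum you recorded earlier (for example, before uploading the file). It uses only Python's standard hashlib module; the function names and the example file path are hypothetical and are not part of HSI or any IU tool.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal MD5 digest of the file at path.

    The file is read in chunks so large archive files do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's MD5 digest with a previously recorded hex digest."""
    return md5_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python md5check.py mydata.tar <recorded-md5>
    if len(sys.argv) == 3:
        ok = matches_expected(sys.argv[1], sys.argv[2])
        print("checksum matches" if ok else "checksum MISMATCH")
```

A matching digest suggests the downloaded copy is identical to the file that produced the recorded checksum; a mismatch means at least one of the two copies has changed.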

This is document awax in the Knowledge Base.
Last modified on 2023-08-16 14:38:01.