Best uses for an IU SDA account

Use cases

Important:
Before storing data on the SDA, make sure you understand the information in Types of sensitive institutional data appropriate for the Scholarly Data Archive at IU.

The Scholarly Data Archive (SDA) at Indiana University is a tape-based HPSS system primarily intended for archival storage. It is best suited for storing large files (maximum file size is 10 TB), storing read-only files and infrequently accessed files, and archiving research data.

Following are some common use cases that illustrate effective use of the SDA tape system:

  • Code repositories: A typical code repository consists of a large number of relatively small files that almost always are stored and retrieved together as a single unit.

    To store a code repository on the SDA, use an archiving or compression utility (e.g., TAR, ZIP, or GZIP) to create a single archive file, and then transfer the archive file to the SDA using one of its supported file transfer methods.

    Another convenient method is to use HTAR, a utility available only for Linux environments that simultaneously creates archive files and transfers them to the SDA.

    Using archive files to store code repositories is advantageous in two ways:

    • Storage and retrieval operations performed on a single, large archive file take less time to complete than those performed on many small files.
    • By storing your code in a single archive file, you are less likely to lose or omit small individual code files.
  • Data collections: A data collection (e.g., field-work results and experimental measurements) may contain numerous files of widely differing sizes. A data collection comprising 100 or more files should be stored in one or more archive files, depending on how you intend to retrieve your data in the future:
    • If you intend to retrieve the collection in its entirety, you can store it as a single archive file. Just as it does for code repositories, using an archive file will provide faster storage and retrieval operations on your data and minimize the risk of human error.
    • If you intend to retrieve one or more distinct subsections of your collection, consider storing them in separate archive files; this will minimize the retrieval of unneeded files while still reducing the number of individually stored files.

    Even if you intend to occasionally retrieve individual files from your collection, you still should use HTAR to create one or more archive files. Because HTAR creates an index for each .tar archive it creates, you can use it to retrieve individual files from your archive without having to download and extract the entire archive. Using HTAR to access individual files stored in archives places less stress on the SDA tape system than storing and retrieving a large number of individual files.

  • Storing large individual files: Large individual files (e.g., video files) may be stored and retrieved using any of the methods discussed above, whether they are compressed in archives or not. Most graphics and video files already are compressed, and further compression usually does not reduce their size by very much.
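
The archive-per-subsection approach described above for data collections can be sketched in Python with the standard tarfile module. This is only an illustration, not an SDA-provided tool: the function name archive_subsections and the assumption that each top-level subdirectory is one "subsection" are hypothetical, and the resulting archives would still need to be transferred to the SDA separately.

```python
import tarfile
from pathlib import Path


def archive_subsections(collection_dir, output_dir):
    """Create one .tar.gz archive per top-level subdirectory of collection_dir.

    Assumption: each top-level subdirectory is a distinct subsection that
    will be retrieved as a unit. Files sitting directly in collection_dir
    are not archived by this sketch. output_dir should be outside
    collection_dir.

    Returns a dict mapping each archive path to the number of regular
    files it contains.
    """
    collection = Path(collection_dir)
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)

    counts = {}
    for subsection in sorted(p for p in collection.iterdir() if p.is_dir()):
        archive_path = output / f"{subsection.name}.tar.gz"
        with tarfile.open(archive_path, "w:gz") as archive:
            # arcname keeps member paths relative to the collection root.
            archive.add(subsection, arcname=subsection.name)
        # Reopen the finished archive and count its regular files so the
        # total can be compared against the source directory.
        with tarfile.open(archive_path, "r:gz") as archive:
            counts[archive_path] = sum(
                1 for member in archive.getmembers() if member.isfile()
            )
    return counts
```

To store a whole collection or code repository as a single archive instead, the same tarfile.open(..., "w:gz") and add() calls can be applied once to the top-level directory.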
Note:
If you need to maintain a large collection of files, and you intend to retrieve individual files on a frequent basis, you should rule out using the SDA; it is not appropriate for that purpose. Retrieving multiple small individual files is a time-consuming process that places stress on the SDA's robotic tape library, which must retrieve, mount, and spin back and forth through multiple tape cartridges. With the high speed at which the tape moves, reading individual small chunks of data can cause mechanical overshooting and backtracking that slows down the retrieval operation.
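
For the occasional retrieval of individual files described under data collections, the HTAR workflow can be sketched as follows. The sketch assumes the commonly documented tar-style HTAR options (-c to create, -t to list, -x to extract, -v for verbose output, -f to name the archive); confirm them against the HTAR help on your system. The helper names and the archive and file names are hypothetical. The functions only build the command lines; they do not contact the SDA.

```python
import shlex


def htar_create(archive, *paths):
    """Arguments that create an archive on the SDA from local paths."""
    return ["htar", "-cvf", archive, *paths]


def htar_list(archive):
    """Arguments that list an archive's members using its index."""
    return ["htar", "-tvf", archive]


def htar_extract(archive, *members):
    """Arguments that extract only the named members from an archive."""
    return ["htar", "-xvf", archive, *members]


if __name__ == "__main__":
    # Hypothetical archive and member names, for illustration only.
    archive = "fieldwork_2018.tar"
    commands = [
        htar_create(archive, "fieldwork_2018"),
        htar_list(archive),
        htar_extract(archive, "fieldwork_2018/site_a/m1.csv"),
    ]
    for command in commands:
        # Print each command in a form that can be pasted into a shell.
        print(shlex.join(command))
```

On a Linux system where HTAR is installed and you are authenticated to the SDA, each argument list could be passed to subprocess.run(command, check=True) instead of being printed.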

General guidelines

Following are general guidelines for effectively using your storage space on the SDA:

  • As often as possible, provide contextual information about your data collections. Include at least a simple README file to indicate the date (or date range) of your collection, its origin and the method(s) used to collect the data, the individuals responsible for it, and any associated grant numbers, as well as any other pertinent information.
  • Files containing PHI must be encrypted when they are stored (i.e., at rest) and when they are transferred between networked systems (i.e., in transit). Do not use HSI, HTAR, or Samba to transfer data containing PHI unless those data are encrypted already; HSI, HTAR, and Samba do not encrypt data during transit. To ensure that files containing PHI remain encrypted during transit, use SFTP/SCP or the IU Globus Web App. To ensure that files containing PHI are encrypted when they are stored on the SDA, encrypt them before transferring them. For more, see Recommended tools for encrypting data containing HIPAA-regulated PHI.
  • Avoid using spaces or quotation marks in file names; these characters are problematic for some of the SDA's administrative tools. File names with these characters are acceptable for files stored within archive files.
  • Verify the success of your file transfers. Check the number of transferred files and their sizes. For a higher degree of assurance, use HSI's checksum feature (however, be aware this will reduce the speed of your transfer somewhat).
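
Two of the guidelines above (file naming and transfer verification) lend themselves to a quick local check before and after a transfer. The following Python sketch is one possible approach, not an SDA tool: the function names are hypothetical, treating the apostrophe as a quotation mark is an assumption, and the SHA-256 manifest is a local substitute for comparison purposes rather than HSI's own checksum feature.

```python
import hashlib
from pathlib import Path

# Assumption: "quotation marks" covers both double and single quotes.
PROBLEM_CHARS = set(" \"'")


def problem_names(root):
    """Return relative paths whose final name contains a space or quote."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if PROBLEM_CHARS & set(p.name)
    )


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's relative path to (size in bytes, SHA-256 digest)."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, sha256_of(p))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Building a manifest before upload and again after retrieving a copy, then comparing the two dictionaries, checks both the number of files and their sizes, with the digests adding a stronger integrity check.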

Get help

If you have questions about how to best use your SDA account, or need help determining what storage solution is best suited to meet your particular needs, contact the UITS Research Storage team.

This is document ahyi in the Knowledge Base.
Last modified on 2018-10-22 16:34:37.
