Best uses for an IU SDA account


Use cases

Important:

Before storing data on any of Indiana University's research computing or storage systems, make sure you understand the information in Types of sensitive institutional data appropriate for UITS Research Technologies services.

Make sure you do not include sensitive institutional data as part of a file's filename or pathname.

The Scholarly Data Archive (SDA) at Indiana University is a tape-based HPSS system primarily intended for archival storage. It is best suited for storing large files (maximum file size is 10 TB), storing read-only files and infrequently accessed files, and archiving research data.

Following are some common use cases that illustrate effective use of the SDA tape system:

  • Code repositories: A typical code repository consists of a large number of relatively small files that almost always are stored and retrieved together as a single unit.

    To store a code repository on the SDA, use a file compression utility (for example, tar, ZIP, or GZip) to create a single archive file, and then transfer the archive file to the SDA.

    Another convenient method is to use HTAR, a utility available only for Linux environments that simultaneously creates archive files and transfers them to the SDA.

    Using archive files to store code repositories is advantageous in two ways:

    • Storage and retrieval operations performed on a single, large archive file take less time to complete than those performed on many small files.
    • By storing your code in a single archive file, you are less likely to lose or omit small individual code files.
  • Data collections: A data collection (for example, field-work results and experimental measurements) may contain numerous files of widely differing sizes. A data collection comprising 100 or more files should be stored in one or more archive files, depending on how you intend to retrieve your data in the future:
    • If you intend to retrieve the collection in its entirety, you can store it as a single archive file. Just as it does for code repositories, using an archive file will provide faster storage and retrieval operations on your data, and minimize the risk of human error.
    • If you intend to retrieve one or more distinct subsections of your collection, consider storing them in separate archive files; this will minimize the retrieval of unneeded files while still reducing the number of individually stored files.

    Even if you intend to occasionally retrieve individual files from your collection, you still should use HTAR to create one or more archive files. Because HTAR creates an index for each .tar archive it creates, you can use it to retrieve individual files from your archive without having to download and extract the entire archive. Using HTAR to access individual files stored in archives places less stress on the SDA tape system than storing and retrieving a large number of individual files.

  • Storing large individual files: Large individual files (for example, video files) may be stored and retrieved using any of the methods discussed above, whether they are compressed in archives or not. Most graphics and video files already are compressed, and further compression usually does not reduce their size by very much.
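
The archive-first approach described in the use cases above can be sketched in Python with the standard library's tarfile module. This is a minimal illustration, not an SDA tool: the function name bundle_directory and the example paths are hypothetical, and it only builds the archive locally. Transferring the archive to the SDA is a separate step.

```python
import tarfile
from pathlib import Path


def bundle_directory(source_dir, archive_path, compress=True):
    """Pack source_dir into one tar archive; return the number of files stored.

    Hypothetical helper for illustration. Write archive_path outside
    source_dir so the archive does not try to include itself.
    """
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(str(source))
    # Skip gzip for content that is already compressed (for example, video).
    mode = "w:gz" if compress else "w"
    with tarfile.open(archive_path, mode) as archive:
        # arcname keeps member paths relative to the top-level folder.
        archive.add(str(source), arcname=source.name)
    # Re-open the archive and count regular files as a basic sanity check.
    with tarfile.open(archive_path, "r:*") as archive:
        return sum(1 for member in archive.getmembers() if member.isfile())
```

For example, bundle_directory("my_repo", "my_repo.tar.gz") would produce one archive for a whole repository. To keep distinct subsections of a data collection separately retrievable, call the function once per subsection so that each one gets its own archive file.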

General guidelines

Following are general guidelines for effectively using your storage space on the SDA:

  • As often as possible, provide contextual information about your data collections. Include at least a simple README file to indicate the date (or date range) of your collection, its origin and the method(s) used to collect the data, the individuals responsible for it, and any associated grant numbers, as well as any other pertinent information.
  • Files containing protected health information (PHI) must be encrypted when they are stored (at rest) and when they are transferred between networked systems (in transit). To ensure that files containing PHI are encrypted when they are stored, encrypt them before transferring them to storage. To ensure that files containing PHI remain encrypted during transit, use SFTP/SCP or the IU Globus Web App. For more, see Recommended tools for encrypting data containing HIPAA-regulated PHI.

  • Avoid using spaces or quotation marks in file names; these characters are problematic for some of the SDA's administrative tools. File names with these characters are acceptable for files stored within archive files.
  • Verify the success of your file transfers. Check the number of transferred files and their sizes. For a higher degree of assurance, use HSI's checksum feature (however, be aware this will reduce the speed of your transfer somewhat).
  • If you need to maintain a large collection of files and intend to retrieve individual files on a frequent basis, you should rule out using the SDA, because the system is not appropriate for such purposes. The retrieval of multiple small individual files and their metadata is a time-consuming process that overtaxes the SDA's robotic tape library as it retrieves, mounts, and reads multiple tape cartridges. With the high speed at which the tape moves, mechanical overshooting and backtracking can occur when retrieving small individual chunks of data. This slows down the retrieval operation and is detrimental to overall system performance.
  • On October 6, 2019, UITS Research Technologies implemented a file quota to limit the number of files users can store in their Scholarly Data Archive accounts. The file quota for new accounts is 25,000 files.
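
Several of the guidelines above (file names, transfer verification, and the file quota) can be checked locally before a transfer. The following Python sketch uses only the standard library. The helper names are hypothetical, treating both single and double quotes as quotation marks is an assumption, and the SHA-256 checksum here is computed locally with hashlib; it is not HSI's checksum feature.

```python
import hashlib
from pathlib import Path

FILE_QUOTA = 25_000  # file quota for new accounts, as stated above
PROBLEM_CHARS = (" ", '"', "'")  # assumption: both quote styles are flagged


def problem_names(paths):
    """Return the paths whose file name contains a space or quotation mark."""
    return [p for p in paths if any(ch in Path(p).name for ch in PROBLEM_CHARS)]


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 digest, reading in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(directory):
    """Map each file's relative path to (size in bytes, SHA-256 hex digest)."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, sha256_of(p))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def within_quota(file_count, already_stored=0):
    """Check whether storing file_count more files stays within FILE_QUOTA."""
    return already_stored + file_count <= FILE_QUOTA
```

One way to use this is to build a manifest of the files you plan to store, keep it alongside your README, and compare file counts, sizes, and digests against a manifest built from retrieved copies.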

Get help

If you have questions about how to best use your SDA account, or need help determining what storage solution is best suited to meet your particular needs, email the UITS Research Storage team (store-admin@iu.edu).

This is document ahyi in the Knowledge Base.
Last modified on 2023-10-03 09:54:21.