Archive directories of many small files to the SDA

Overview

The Scholarly Data Archive is well suited for storing large volumes of data (that is, tens of gigabytes to several terabytes per project) and data that are accessed relatively infrequently (archival or near-line storage). The SDA's tape-based backend is not designed for large numbers of small files: the SDA performs best with larger files, and storing and retrieving many small files can degrade its performance. Individual files should be at least 1 MB. If you need to store many small files on the SDA, use an archiving utility (for example, tar or 7-Zip, optionally combined with gzip compression) to bundle your files into a single, large archive file.
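
To judge whether a directory needs bundling before it goes to the SDA, it can help to list the files that fall below the 1 MB guideline. The following is a minimal sketch, not an official UITS tool; the function name find_small_files and the constant ONE_MB are hypothetical, and the sketch assumes 1 MB means 1,000,000 bytes.

```python
import os

# Assumption: "1 MB" is treated as 1,000,000 bytes (decimal megabyte).
ONE_MB = 1_000_000


def find_small_files(root, threshold=ONE_MB):
    """Return a sorted list of regular files under root smaller than threshold bytes."""
    small = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) < threshold:
                small.append(path)
    return sorted(small)
```

If find_small_files("/path/to/project") returns many entries, that directory is a likely candidate for bundling into a single archive file before transfer.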

Follow the instructions below to archive directories of many small files on your workstation and use the IU Globus Web App to automatically copy those archives on a recurring schedule to your space on the Scholarly Data Archive (SDA). This recurring transfer copies new or changed archive files from the archive directory on your workstation to your SDA space. It also removes archive files from your SDA space when you remove them from the archive directory on your workstation.

Before you begin

Important:

Before storing data on any of Indiana University's research computing or storage systems, make sure you understand the information in Types of sensitive institutional data appropriate for UITS Research Technologies services.

Make sure you do not include sensitive institutional data as part of a file's filename or pathname.

Set up a recurring transfer

  1. On your workstation, create a directory for storing archived files.
  2. On your workstation, create a scheduled job that archives a directory of files and stores the resulting .tar or .zip file in the archived files directory you created in the first step.
  3. Launch Globus Connect Personal on your workstation, and then, in a browser, log in to the IU Globus Web App.
  4. Select File Manager and, in the top right, next to "Panels", select the middle (two-panel) option.
  5. Find and activate your workstation collection and the SDA collection (IURT - Scholarly Data Archive); for help, see Activate your collections.
  6. In your workstation collection, find and select the archived files directory you created in the first step.
  7. Between the Start buttons, select Transfer & Timer Options, and then:
    • Label This Transfer: Enter a descriptive label to identify your recurring transfer.
    • Transfer Settings: Select the following:
      • sync - only transfer new or changed files where the checksum is different
      • delete files on destination that do not exist on source
      • preserve source file modification times
      • Skip files on source with errors
      • Fail on quota errors
    • Schedule Start: Enter a valid date and time.
    • Repeat: Use the drop-down to select how frequently the transfer should run and when the recurrences should end.
  8. To submit the transfer request, select Start. (If prompted to allow the IU Globus Web App to operate Globus Timer, select Allow.) You should see a "Timer request submitted successfully" message; optionally, select View details to see an overview of your transfer and the timer log.
    Note:
    If your workstation is in sleep mode when your transfer is scheduled to run, the transfer will fail, and you will receive an email notifying you of the failed transfer.
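
The scheduled job in step 2 can take many forms. As one illustration, the following minimal sketch bundles a directory into a single .tar file inside the archived files directory. The function name archive_directory and its parameters are hypothetical, not part of any IU or Globus tooling, and the sketch assumes an uncompressed .tar file with a fixed name is acceptable.

```python
import tarfile
from pathlib import Path


def archive_directory(source_dir, archive_dir):
    """Bundle source_dir into <archive_dir>/<source name>.tar and return that path."""
    source = Path(source_dir).resolve()
    dest = Path(archive_dir).resolve()

    # Refuse to write the archive inside the directory being archived;
    # otherwise the .tar file could end up containing itself.
    if dest == source or source in dest.parents:
        raise ValueError("archive_dir must be outside source_dir")

    dest.mkdir(parents=True, exist_ok=True)

    # Assumption: a fixed filename, so each run overwrites the previous archive
    # and the sync transfer sees one changed file instead of a growing set.
    archive_path = dest / f"{source.name}.tar"
    with tarfile.open(archive_path, "w") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(source, arcname=source.name)
    return archive_path
```

A script that calls archive_directory could then be run on a schedule by your operating system's job scheduler (for example, cron on Linux or macOS, or Task Scheduler on Windows), timed to finish before the recurring transfer starts.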

Get help

For help with IU Globus Web App data transfers or the SDA, email the UITS Research Storage team (store-admin@iu.edu).

This is document biia in the Knowledge Base.
Last modified on 2023-10-03 09:54:53.