Archive directories of many small files to the SDA
Overview
The Scholarly Data Archive (SDA) is well suited for storing large volumes of data (that is, tens of gigabytes to several terabytes per project) and data that are accessed relatively infrequently (archival or near-line storage). The SDA's tape-based backend is not designed for storing large numbers of small files: it performs best with larger files, and storing and retrieving many small files can degrade its performance. Individual files should be at least 1 MB. If you need to store many small files on the SDA, use an archiving utility (for example, tar, optionally combined with gzip compression, or 7-Zip) to bundle them into a single, large archive file.
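Before deciding whether a directory needs bundling, it can help to see how many of its files fall under the 1 MB guideline. The following is a minimal Python sketch, assuming Python 3 is available on your workstation; `count_small_files` is a hypothetical helper name, and treating 1 MB as 1,048,576 bytes is an assumption.

```python
import os
import sys

# Assumption: "1 MB" is read as 1024 * 1024 bytes.
MIN_FILE_SIZE = 1024 * 1024


def count_small_files(directory, min_size=MIN_FILE_SIZE):
    """Return (number of files smaller than min_size, total number of files)."""
    small = total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # skip symbolic links; count only real files
            total += 1
            if os.path.getsize(path) < min_size:
                small += 1
    return small, total


if __name__ == "__main__":
    small, total = count_small_files(sys.argv[1])
    print(f"{small} of {total} files are smaller than 1 MB")
```

If most files in a directory are reported as small, that directory is a good candidate for the archiving workflow on this page.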
Follow the instructions below to archive directories of many small files on your workstation and use the IU Globus Web App to copy those archives automatically, on a recurring schedule, to your space on the SDA. Each scheduled transfer copies new or changed archive files from the archived files directory on your workstation to your SDA space. It also deletes archive files from your SDA space when you remove them from that directory on your workstation.
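The sync-with-deletion behavior described above can be modeled as a comparison of two file listings. This is a toy illustration in Python, not Globus code: `plan_sync` is a hypothetical name, and it assumes each file is identified by its name and a checksum, whereas the comparison Globus actually performs depends on the transfer settings you select.

```python
from typing import Dict, List, Tuple


def plan_sync(source: Dict[str, str], destination: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (files to copy, files to delete) for a one-way sync with deletion.

    Each dictionary maps a file name to a fingerprint such as a checksum.
    """
    # New files (absent at the destination) and changed files (different fingerprint).
    to_copy = sorted(name for name, digest in source.items() if destination.get(name) != digest)
    # Files that exist only at the destination are removed so it mirrors the source.
    to_delete = sorted(name for name in destination if name not in source)
    return to_copy, to_delete


workstation = {"a.tar": "111", "b.tar": "222", "d.tar": "444"}
sda = {"a.tar": "111", "b.tar": "999", "c.tar": "333"}
print(plan_sync(workstation, sda))  # (['b.tar', 'd.tar'], ['c.tar'])
```

Here `b.tar` changed and `d.tar` is new, so both are copied; `c.tar` no longer exists on the workstation, so it is deleted from the SDA side.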
Before you begin
Before storing data on any of Indiana University's research computing or storage systems, make sure you understand the information in Types of sensitive institutional data appropriate for UITS Research Technologies services.
Make sure you do not include sensitive institutional data as part of a file's filename or pathname.
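One way to catch obvious problems before archiving is to scan file and directory names for patterns that look like sensitive data. The Python sketch below is a heuristic only: it assumes U.S. Social Security number formatting (123-45-6789) as an example pattern, `find_suspicious_paths` is a hypothetical helper, and a clean result does not mean a directory is free of sensitive data.

```python
import os
import re
import sys

# Example pattern only: text formatted like a U.S. Social Security number
# (for example, 123-45-6789) that is not part of a longer run of digits.
SSN_LIKE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def find_suspicious_paths(directory, pattern=SSN_LIKE):
    """Return sorted relative paths whose file or directory name matches pattern."""
    hits = []
    for root, dirs, files in os.walk(directory):
        # Every subdirectory appears in some "dirs" list, so all path
        # components below `directory` are checked.
        for name in dirs + files:
            if pattern.search(name):
                hits.append(os.path.relpath(os.path.join(root, name), directory))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_suspicious_paths(sys.argv[1]):
        print("check name:", path)
```

Rename any flagged files or directories before they are archived, because the names are stored inside the archive.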
- To use a UITS Research Technologies system as a collection, you must have an account on that system. For eligibility and account creation information, see the Research system accounts (all campuses) section of Computing accounts at IU.
- For instructions on how to access the IU Globus Web App, add your workstation as a collection, and activate your collections, see Use the IU Globus Web App to transfer data to and from your accounts on IU's research computing and storage systems.
Set up a recurring transfer
- On your workstation, create a directory for storing archived files.
- On your workstation, create a scheduled job that archives a directory of files and stores the resulting .tar or .zip file in the archived files directory you created in the first step.
- Launch Globus Connect Personal on your workstation, and then, in a browser, log in to the IU Globus Web App.
- Select and, in the top right, next to "Panels", select the middle option.
- Find and activate your workstation collection and the SDA collection; for help, see Activate your collections.
- In your workstation collection, find and select the archived files directory you created in the first step.
- Between the buttons, select , and then:
  - Label This Transfer: Enter a descriptive label to identify your recurring transfer.
  - Transfer Settings: Select the following:
    - where the
  - Schedule Start: Enter a valid date and time.
  - Repeat: Use the drop-down menu to select how frequently the transfer should run and when the recurrences should end.
- To submit the transfer request, select . (If prompted to allow the IU Globus Web App to operate Globus Timer, select .) You should see a "Timer request submitted successfully" message; optionally, select to see an overview of your transfer and the timer log.
Note: If your workstation is in sleep mode when your transfer is scheduled to run, the transfer will fail, and you will receive an email notifying you of the failed transfer.
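The scheduled archiving job described in the second step can be sketched in Python as follows. This is a minimal sketch, not an IU-provided tool: `archive_directory` and the script name `archive_job.py` are hypothetical, it writes an uncompressed .tar file, and it assumes Python 3.7 or later. It builds the archive under a temporary name and replaces the existing archive only if the contents changed, so an unchanged directory leaves the previous file and its timestamp alone.

```python
import filecmp
import os
import sys
import tarfile
import tempfile
from pathlib import Path


def archive_directory(source_dir, archive_dir):
    """Bundle source_dir into archive_dir/<name>.tar and return the archive path.

    The existing archive is replaced only when the new archive's bytes differ.
    """
    source = Path(source_dir).resolve()
    dest_dir = Path(archive_dir).resolve()
    # Archiving a directory into itself would make the archive contain itself.
    if dest_dir == source or source in dest_dir.parents:
        raise ValueError("archive_dir must be outside source_dir")
    dest_dir.mkdir(parents=True, exist_ok=True)
    final_path = dest_dir / f"{source.name}.tar"

    # Build the archive under a temporary name in the same directory, so the
    # final file is swapped in with one rename instead of being written in place.
    fd, tmp_name = tempfile.mkstemp(dir=str(dest_dir), suffix=".partial")
    os.close(fd)
    try:
        # Since Python 3.7, TarFile.add() visits directory entries in sorted
        # order, so unchanged input files yield a byte-identical archive.
        with tarfile.open(tmp_name, "w") as tar:
            tar.add(str(source), arcname=source.name)
        unchanged = final_path.exists() and filecmp.cmp(tmp_name, final_path, shallow=False)
        if not unchanged:
            os.replace(tmp_name, final_path)  # single rename (atomic on POSIX)
    finally:
        # Remove the temporary file if it was not renamed (no change, or an error).
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return final_path


if __name__ == "__main__":
    # Usage: python3 archive_job.py SOURCE_DIR ARCHIVE_DIR
    print(archive_directory(sys.argv[1], sys.argv[2]))
```

To run it on a schedule, you could use cron on Linux or macOS (for example, a crontab line such as `0 1 * * * python3 /path/to/archive_job.py /path/to/project /path/to/archived-files`, with the paths replaced by your own) or Task Scheduler on Windows. Scheduling the job to finish well before the Globus transfer's start time reduces the chance of a transfer running while an archive is being rebuilt.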
Get help
For help with IU Globus Web App data transfers or the SDA, email the UITS Research Storage team (store-admin@iu.edu).
This is document biia in the Knowledge Base.
Last modified on 2023-10-03 09:54:53.