About the Scholarly Data Archive (SDA) at Indiana University

System overview

The Indiana University Scholarly Data Archive (SDA) provides extensive capacity (approximately 79 PB of tape overall) for storing and accessing research data. The SDA is a distributed storage service co-located at IU data centers in Bloomington and Indianapolis, providing IU researchers with large-scale archival or near-line data storage, arranged in large files, with two copies of data made by default (for disaster recovery).

The SDA is based on the High Performance Storage System (HPSS), a consortium-developed hierarchical storage management (HSM) package that makes the SDA's hierarchy of storage media transparent to its users. The SDA's system architecture comprises fast, efficient disk cache front-end components that move infrequently accessed data to two high-end tape libraries. Using the I-Light high performance network between IUB and IUPUI, the SDA creates two tape copies of user data simultaneously (one at each data center), adding a degree of disaster tolerance to both sites.

The Scholarly Data Archive is well suited for storing large volumes of data (that is, tens of gigabytes to several terabytes per project), and data that are accessed relatively infrequently (archival or near-line storage). The SDA's tape-based backend is not designed for large numbers of small files; storing and retrieving many small files can degrade the SDA's performance. Individual files should be at least 1 MB. If you need to store many small files on the SDA, use an archiving or compression utility (for example, tar, gzip, or 7-Zip) to bundle your files into a single, large archive file.
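
As a minimal sketch of the bundling step described above, the following Python function uses the standard library's tarfile module to pack a directory of small files into one gzip-compressed tar archive. The function name and the example paths are illustrative assumptions, not part of any SDA tooling.

```python
import os
import tarfile


def bundle_directory(src_dir, archive_path):
    """Pack every regular file under src_dir into one .tar.gz archive.

    Returns the number of files added. Paths inside the archive start
    with the name of src_dir itself, so unpacking recreates that folder.
    """
    parent = os.path.dirname(os.path.abspath(src_dir))
    archive_abs = os.path.abspath(archive_path)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Skip the archive itself if it was created inside src_dir.
                if os.path.abspath(full) == archive_abs:
                    continue
                tar.add(full, arcname=os.path.relpath(os.path.abspath(full), parent))
                count += 1
    return count
```

For example, bundle_directory("results", "results.tar.gz") would produce a single file to transfer in place of the many files under a hypothetical results directory. Checking os.path.getsize("results.tar.gz") afterward shows whether the archive meets the 1 MB guideline.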

The SDA supports high performance access methods, such as the Hierarchical Storage Interface (HSI); an HPSS API is available for programmers, as well.

Important:

Before storing data on any of Indiana University's research computing or storage systems, make sure you understand the information in Types of sensitive institutional data appropriate for UITS Research Technologies services.

Make sure you do not include sensitive institutional data as part of a file's filename or pathname.

System information

  • Machine type: Distributed HPSS data archive
  • Operating system: Red Hat Enterprise Linux 7
  • Potential tape library capacity: 79 PB
  • Total disk capacity (cache): 1,800 TB
  • Aggregate I/O: 80 Gbps
  • Backup and purge policies: Data are not backed up; the system is never purged as long as the account owner has a valid IU account.
    Note:

    Due to the mammoth volume of data stored on the Scholarly Data Archive (SDA), backups are neither practical nor economical. The SDA doesn't have an offline, traditional backup system, so any deletion you perform will cause an irreversible loss of data.

    However, to protect against random tape errors, two tape copies of the data are created by default. If one tape fails, data can be retrieved from the other tape. The two copies of data reside at two geographically distant sites (at IU Bloomington and at IUPUI) in two separate tape libraries. Also, two separate metadata backups are performed, at IUB and IUPUI. As a result, even in the event of a catastrophic disaster affecting either the IUB or IUPUI site (such as a fire or a tornado), all dual-copy data on the SDA would still be safe.

  • Quotas: 50 TB (default) per user or project; when in support of research activities, extensions beyond 50 TB may be granted for a nominal charge. If you need more than 50 TB, submit your request using the SDA Quota Increase Request form.

    On October 6, 2019, UITS Research Technologies implemented a file quota to limit the number of files users can store in their Scholarly Data Archive accounts. The file quota for new accounts is 25,000 files.

  • Directory path and file name limits: On the SDA, HPSS limits the length of directory paths and file names as follows:
    • Directory paths: The directory path of any file may not exceed 1,024 characters in length.
    • File names: File names may not exceed 256 characters in length.
    Note:
    • File name and directory path limits in HPSS are separate from (and less restrictive than) similar limits imposed when using HTAR; for more, see HTAR limitations in Use HTAR with your SDA account.
    • File and directory names within the SDA may only contain ASCII characters in the range 0x20 to 0x7e. SDA does not support Unicode and similar encodings; rarely, you may need to rename external files before transferring them into the SDA.
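
The limits listed above can be checked before a transfer. The following is a sketch only: the function names are hypothetical, the remote directory is whatever SDA path you intend to copy into, and the checks cover nothing beyond the three rules stated above (1,024-character directory paths, 256-character file names, ASCII 0x20 to 0x7e).

```python
import os

MAX_DIR_PATH_CHARS = 1024   # directory path limit quoted above
MAX_FILE_NAME_CHARS = 256   # file name limit quoted above


def is_sda_safe(name):
    """True if every character is ASCII in the range 0x20 to 0x7e."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in name)


def name_problems(file_name, remote_dir):
    """List the reasons a file could not be stored as remote_dir/file_name."""
    problems = []
    if len(file_name) > MAX_FILE_NAME_CHARS:
        problems.append("file name exceeds 256 characters")
    if len(remote_dir) > MAX_DIR_PATH_CHARS:
        problems.append("directory path exceeds 1,024 characters")
    if not is_sda_safe(file_name) or not is_sda_safe(remote_dir):
        problems.append("characters outside ASCII 0x20-0x7e")
    return problems


def check_tree(local_root, remote_root):
    """Walk local_root as if copying it into remote_root on the SDA.

    Returns (file_count, report), where report maps each offending
    destination path to its list of problems. Only files are reported;
    empty directories are not examined.
    """
    file_count = 0
    report = {}
    for dirpath, _dirnames, filenames in os.walk(local_root):
        rel_dir = os.path.relpath(dirpath, local_root).replace(os.sep, "/")
        if rel_dir == ".":
            remote_dir = remote_root
        else:
            remote_dir = remote_root.rstrip("/") + "/" + rel_dir
        for name in filenames:
            file_count += 1
            problems = name_problems(name, remote_dir)
            if problems:
                report[remote_dir + "/" + name] = problems
    return file_count, report
```

The file count returned by check_tree can also be compared with the file quota (25,000 files for new accounts) before you start a large transfer.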

Work with data containing PHI

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) established rules protecting the privacy and security of individually identifiable health information. The HIPAA Privacy Rule and Security Rule set national standards requiring organizations and individuals to implement certain administrative, physical, and technical safeguards to maintain the confidentiality, integrity, and availability of protected health information (PHI).

This UITS system or service meets certain requirements established in the HIPAA Security Rule, thereby enabling its use for work involving data that contain protected health information (PHI). However, using this system or service does not fulfill your legal responsibilities for protecting the privacy and security of data that contain PHI. You may use this system or service for work involving data that contain PHI only if you institute additional administrative, physical, and technical safeguards that complement those UITS already has in place.

For more, see Your legal responsibilities for protecting data containing protected health information (PHI) when using UITS Research Technologies systems and services.

Note:

Although PHI is classified as Critical data, other types of institutional data classified as Critical are not permitted on Research Technologies systems. For help determining which institutional data elements classified as Critical are considered PHI, see About protected health information (PHI) data elements in the classifications of institutional data.

If you have questions about securing HIPAA-regulated research data at IU, email securemyresearch@iu.edu. SecureMyResearch provides self-service resources and one-on-one consulting to help IU researchers, faculty, and staff meet cybersecurity and compliance requirements for processing, storing, and sharing regulated and unregulated research data; for more, see About SecureMyResearch. To learn more about properly ensuring the safe handling of PHI on UITS systems, see the UITS IT Training video Securing HIPAA Workflows on UITS Systems. To learn about division of responsibilities for securing PHI, see Shared responsibility model for securing PHI on UITS systems.

Request an account

For eligibility requirements, see the "Research system accounts (all campuses)" section in Computing accounts at IU.

After you submit your account request, UITS will notify you via email when your account is ready for use.

Note:
In accordance with standards for access control mandated by the HIPAA Security Rule, you are not permitted to access data containing protected health information (PHI) using a group (or departmental) account. To ensure accountability and maintain appropriate levels of access control, all users must use an individual login for all work involving PHI.

Access the SDA and transfer files

Once you have an SDA account, you can access it from any networked host. The method you use depends on your operating system and level of comfort with the command-line interface.

Note:
  • The SDA is offline for regularly scheduled maintenance every Sunday from 7am to 10am.
  • To access the SDA from off campus, UITS recommends protocols such as Globus and SFTP, which provide IU Login and Two-Step Login (Duo) authentication, as well as encryption in transit. Because the HSI/HTAR protocols don't provide the same protections, to remotely access the SDA with a local HSI/HTAR client you must first email store-admin@iu.edu to request an off-campus exemption.
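
If you schedule transfers from a script, a small guard can keep jobs from starting during the Sunday maintenance window noted above. This sketch assumes the datetime passed in is already expressed in IU local time and treats the window as 7:00 up to (but not including) 10:00; both are assumptions, not documented behavior.

```python
from datetime import datetime


def in_maintenance_window(when):
    """True if `when` falls on a Sunday between 07:00 and 10:00.

    `when` is assumed to be in IU local time; no time zone
    conversion is performed here.
    """
    # datetime.weekday() numbers Monday as 0, so Sunday is 6.
    return when.weekday() == 6 and 7 <= when.hour < 10
```

A transfer script might call in_maintenance_window(datetime.now()) and postpone its work when the result is true.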

Methods available for transferring data to and from the Indiana University Scholarly Data Archive (SDA) include secure FTP (SFTP), secure copy (SCP), GridFTP (via the IU Globus Web App), and Hierarchical Storage Interface (HSI).

Important:

Files containing PHI must be encrypted when they are stored (at rest) and when they are transferred between networked systems (in transit). To ensure that files containing PHI are encrypted when they are stored, encrypt them before transferring them to storage. To ensure that files containing PHI remain encrypted during transit, use SFTP/SCP or the IU Globus Web App. For more, see Recommended tools for encrypting data containing HIPAA-regulated PHI.

HSI, the highest performing non-grid method, provides shell-like facilities for recursive operations, and can take input data from standard input. HSI can also perform file migration to tape, stage files from tape to disk, and purge files from the disk cache. HSI is available on UITS research supercomputers when you load the hpss module. For more about HSI, see the HSI Reference Manual.
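
As a hedged illustration of driving HSI from a script, the sketch below builds an HSI put command (local name, a colon, then the archive name) and runs it only when an hsi executable is on the PATH. The helper names and example paths are hypothetical, paths containing spaces are not handled, and the HSI Reference Manual remains the authority on command syntax and options.

```python
import shutil
import subprocess


def hsi_put_command(local_path, remote_path):
    """Build the argument list for: hsi "put local_path : remote_path"."""
    return ["hsi", "put %s : %s" % (local_path, remote_path)]


def hsi_put(local_path, remote_path):
    """Run the put command if an hsi client is available.

    Returns the exit code from hsi, or None when no hsi executable is
    found (for example, before the hpss module has been loaded).
    """
    if shutil.which("hsi") is None:
        return None
    return subprocess.run(hsi_put_command(local_path, remote_path)).returncode
```

For example, hsi_put("results.tar.gz", "project/results.tar.gz") would attempt to store a local archive under a hypothetical project directory in your SDA account.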

For use on personal workstations, IU SDA users can download and install HSI version 8.3.3 (bundled with its companion program, HTAR) from the UITS Research Technologies HSI folder in Google at IU My Drive. (You must be signed into your Google at IU account to access this folder; see Access Google at IU.) Bundles are available for 32- and 64-bit Windows, macOS, and Red Hat Enterprise Linux, and for 64-bit Ubuntu Linux.

Note:
  • To connect to the SDA with a local HSI/HTAR client, make sure you have version 8.3.3 installed.
  • To connect to the SDA with a local HSI/HTAR client from an off-campus network location, you must first email store-admin@iu.edu to request an off-campus exemption.

For Windows or macOS users who prefer a graphical interface, UITS recommends using a graphical SFTP client. For macOS users, particularly those needing to transfer large amounts of data, UITS recommends Fetch.

Access the SDA in Research Desktop (RED)

You can access the SDA in Research Desktop (RED) from the ThinLinc Client by clicking Applications > Storage > Scholarly Data Archive.

Acknowledge grant support

The Indiana University cyberinfrastructure, managed by the Research Technologies division of UITS, is supported by funding from several grants, each of which requires you to acknowledge its support in all presentations and published works stemming from research it has helped to fund. Conscientious acknowledgment of support from past grants also enhances the chances of IU's research community securing funding from grants in the future. For the acknowledgment statement(s) required for scholarly printed works, web pages, talks, online publications, and other presentations that make use of this and/or other grant-funded systems at IU, see Sources of funding to acknowledge in published work if you use IU's research cyberinfrastructure.

Support

The SDA is maintained by the UITS Research Storage team. If you have questions or need help, contact UITS Research Storage (store-admin@iu.edu).

This is document aiyi in the Knowledge Base.
Last modified on 2024-02-14 12:35:13.