The Scholarly Data Archive (SDA) at Indiana University

System overview

The Indiana University Scholarly Data Archive (SDA) provides extensive capacity (approximately 42 PB of tape overall) for storing and accessing research data. The SDA is a distributed storage service co-located at IU data centers in Bloomington and Indianapolis. It provides IU researchers with large-scale archival or near-line storage for data arranged in large files, and by default it keeps two copies of all data for disaster recovery.

The SDA is based on the High Performance Storage System (HPSS), a consortium-developed hierarchical storage management (HSM) package that makes the SDA's hierarchy of storage media transparent to its users. The SDA's system architecture comprises fast, efficient disk cache front-end components (with a capacity of roughly 1,800 TB) that move infrequently accessed data to two high-end tape libraries (with nearly 15 PB of capacity). Using the I-Light high-performance network between IUB and IUPUI, the SDA creates two tape copies of user data simultaneously (one at each data center), adding a degree of disaster tolerance to both sites.

The SDA is well suited for storing large volumes of data (e.g., tens of gigabytes to several terabytes per project) and data that are accessed relatively infrequently (i.e., archival or near-line storage). The SDA back end is not designed to store large numbers of small files; individual files should be at least 1 MB. If you need to store many small files on the SDA, use an archiving utility (e.g., tar or zip) to bundle them into a single, large archive file, which you can optionally compress (e.g., with gzip).
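As a minimal sketch of the bundling step, the Python example below uses the standard library's tarfile module to pack a directory of small files into one gzip-compressed tar archive before you transfer it to the SDA. The function name bundle_directory, the example paths, and treating "1 MB" as 1,048,576 bytes are assumptions for illustration only; any tar or zip tool accomplishes the same thing.

```python
import os
import tarfile
from pathlib import Path

# Assumption: "at least 1 MB" is interpreted here as 1 MiB (1,048,576 bytes).
MIN_SDA_FILE_BYTES = 1024 * 1024


def bundle_directory(source_dir: str, archive_path: str) -> int:
    """Pack every file under source_dir into one gzip-compressed tar archive.

    Hypothetical helper: returns the size of the resulting archive in bytes.
    """
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # tar.add recurses into the directory; arcname stores member paths
        # relative to the directory's own name (e.g., "data/f0.txt").
        tar.add(source, arcname=source.name)
    size = os.path.getsize(archive_path)
    if size < MIN_SDA_FILE_BYTES:
        # Still smaller than the recommended minimum; consider bundling more data.
        print(f"warning: {archive_path} is only {size} bytes")
    return size
```

For example, bundle_directory("run_outputs", "run_outputs.tar.gz") produces a single file to transfer in place of the many files inside run_outputs.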

The SDA supports high-performance access methods, such as the Hierarchical Storage Interface (HSI); an HPSS API is available for programmers, as well.

Note: At IU, the initials SDA, MDSS, and HPSS are often used interchangeably to describe the same service.

System information

Note: The SDA is offline for regularly scheduled maintenance every Sunday from 7 a.m. to 10 a.m.

System configuration
  Machine type: Distributed HPSS data archive
  Operating system: Red Hat Enterprise Linux 6

Storage information
  Access protocols: HSI/HTAR, SFTP/SCP, HTTPS, CIFS (Samba, read-only)
  Usable tape capacity: 15 PB
  Total disk capacity (cache): 1,800 TB
  Storage quota: 50 TB per user and 50 TB per project by default; increased as needed
  Backup and purge policies: Dual copies of data, but no backups; the system is never purged
  Aggregate I/O: 80 Gbps

Work with data containing PHI

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) established rules protecting the privacy and security of individually identifiable health information. The HIPAA Privacy Rule and Security Rule set national standards requiring organizations and individuals to implement certain administrative, physical, and technical safeguards to maintain the confidentiality, integrity, and availability of protected health information (PHI).

This UITS system or service meets certain requirements established in the HIPAA Security Rule, thereby enabling its use for work involving data that contain protected health information (PHI). However, using this system or service does not fulfill your legal responsibilities for protecting the privacy and security of data that contain PHI. You may use this system or service for work involving data that contain PHI only if you institute additional administrative, physical, and technical safeguards that complement those UITS already has in place.

PHI is one type of institutional data classified as Critical at IU; other types of institutional data classified as Critical are not permitted on Research Technologies systems. For help determining which institutional data elements classified as Critical are considered PHI, see About protected health information (PHI) data elements in the classifications of institutional data.

For more, see Your legal responsibilities for protecting data containing protected health information (PHI) when using UITS Research Technologies systems and services.

UITS provides consulting and online help for Indiana University researchers, faculty, and staff who need help securely processing, storing, and sharing data containing protected health information (PHI). If you have questions about managing HIPAA-regulated data at IU, contact UITS HIPAA Consulting. To learn more about properly ensuring the safe handling of PHI on UITS systems, see the UITS IT Training video Securing HIPAA Workflows on UITS Systems. For additional details about HIPAA compliance at IU, see HIPAA Privacy & Security on the University Compliance website.

Request an account

For eligibility requirements, see the "Research system accounts (all campuses)" section in Computing accounts at IU.

  • To request an individual (personal) account on the Indiana University Scholarly Data Archive (SDA), follow the instructions in Get additional IU computing accounts.

    If you are eligible to request an SDA account, but it is not listed among the accounts available to request, contact your campus Support Center for help.

  • To request to have an SDA account created for your IU group account, contact the Research Storage team.
    In accordance with standards for access control mandated by the HIPAA Security Rule, you are not permitted to access data containing protected health information (PHI) using a group (or departmental) account. To ensure accountability and maintain appropriate levels of access control, all users must use an individual login for all work involving PHI.

After you submit your account request, UITS will notify you via email when your account is ready for use.

Access the SDA and transfer files

Once you have an SDA account, you can access it from any networked host. The method you use depends on your operating system and level of comfort with the command-line interface.

Methods available for transferring data to and from the Indiana University Scholarly Data Archive (SDA) include Hierarchical Storage Interface (HSI), secure FTP (SFTP), secure copy (SCP), and GridFTP. For instructions, see:

Read-only access is available via CIFS/Samba; see Use Samba or CIFS to access your SDA account from your personal workstation.

Files containing PHI must be encrypted when they are stored (i.e., at rest) and when they are transferred between networked systems (i.e., in transit). Do not use HSI, HTAR, or Samba to transfer data containing PHI unless those data are encrypted already; HSI, HTAR, and Samba do not encrypt data during transit. To ensure that files containing PHI remain encrypted during transit, use SFTP/SCP or the IU Globus Web App. To ensure that files containing PHI are encrypted when they are stored on the SDA, encrypt them before transferring them. For more, see Recommended tools for encrypting data containing HIPAA-regulated PHI.

HSI, the highest-performing non-grid method, provides shell-like facilities for recursive operations and can read input data from standard input. HSI can also migrate files to tape, stage files from tape to disk, and purge files from the disk cache.

HSI is available on UITS research computing systems when you load the hpss module; for more about Modules, see On the research computing systems at IU, how do I use Modules to manage my software environment?

For use on personal workstations, IU SDA users can download and install HSI (bundled with its companion program, HTAR) from the UITS Research Storage HSI folder in Box. Bundles are available for 32- and 64-bit Windows, OS X, and Red Hat Enterprise Linux, and for 64-bit Ubuntu Linux.
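To illustrate the basic command syntax, the Python sketch below only assembles HSI and HTAR command lines and, when explicitly asked, runs them. It assumes that hsi and htar are installed and on your PATH (for example, after loading the hpss module), that file paths contain no spaces, and that HSI's "put/get <local> : <archive>" form and HTAR's tar-style -cvf flags behave as shown; the helper names are hypothetical, so check the exact syntax against the HSI and HTAR documentation before relying on it. Because HSI and HTAR do not encrypt data in transit, do not use them for unencrypted data containing PHI.

```python
import shlex
import subprocess
from typing import List

# Hypothetical helpers: they build argument lists for the hsi and htar
# command-line tools; none of these function names come from HSI or HTAR.


def hsi_put_cmd(local_path: str, archive_path: str) -> List[str]:
    # HSI transfer syntax: "put <local file> : <archive file>"
    return ["hsi", f"put {local_path} : {archive_path}"]


def hsi_get_cmd(local_path: str, archive_path: str) -> List[str]:
    # Retrieval keeps the same "<local file> : <archive file>" order
    return ["hsi", f"get {local_path} : {archive_path}"]


def htar_create_cmd(archive_tarfile: str, source_dir: str) -> List[str]:
    # HTAR follows tar conventions (-c create, -v verbose, -f archive file)
    # and writes the tar file directly into the archive, not to local disk
    return ["htar", "-cvf", archive_tarfile, source_dir]


def run(cmd: List[str], dry_run: bool = True) -> None:
    # Always print a shell-quoted form of the command; execute it only
    # when dry_run is False (check=True raises if the tool exits non-zero)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

For example, run(htar_create_cmd("run_outputs.tar", "run_outputs")) prints the HTAR command that would bundle run_outputs directly into your SDA space; pass dry_run=False to execute it. Keeping the default as a dry run lets you inspect each command before any data moves.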

For Windows or OS X users who prefer a graphical interface, UITS recommends using a graphical SFTP client. For OS X users, particularly those needing to transfer large amounts of data, UITS recommends Fetch (available from IUware).


For information about SDA classes of service, see On the Scholarly Data Archive at IU, what are classes of service, and how do I use them?

Acknowledge grant support

The Indiana University cyberinfrastructure, managed by the Research Technologies division of UITS, is supported by funding from several grants, each of which requires you to acknowledge its support in all presentations and published works stemming from research it has helped to fund. Conscientious acknowledgment of support from past grants also enhances the chances of IU's research community securing funding from grants in the future. For the acknowledgment statement(s) required for scholarly printed works, web pages, talks, online publications, and other presentations that make use of this and/or other grant-funded systems at IU, see If I use IU's research cyberinfrastructure, what sources of funding do I need to acknowledge in my published work?


The SDA is maintained by the UITS Research Storage team. If you have questions or need help, contact UITS Research Storage.

This is document aiyi in the Knowledge Base.
Last modified on 2018-11-07 12:19:41.

Contact us

For help or to comment, email the UITS Support Center.