The Scholarly Data Archive (SDA) at Indiana University

System overview

The Indiana University Scholarly Data Archive (SDA) provides extensive capacity (approximately 15 PB of tape overall) for storing and accessing research data. The SDA is a distributed storage service co-located at IU data centers in Bloomington and Indianapolis. It provides IU researchers with large-scale archival or near-line storage for data arranged in large files, and automatically keeps off-site copies of data for disaster recovery.

Access is available to IU graduate students, faculty, and staff. Undergraduates and non-IU collaborators must have IU faculty sponsors. For details, see the "Research system accounts (all campuses)" section of What computing accounts are available at IU, and for whom?

The SDA supports high-performance access methods, such as the Hierarchical Storage Interface (HSI). An HPSS API is also available for programmers.

The SDA uses the consortium-developed High Performance Storage System (HPSS), a hierarchical storage management (HSM) software package that presents a hierarchy of storage media to users as a single system. This hierarchy, which produces the SDA's massive storage capacity, consists of disk caches (totaling roughly 1,800 TB) backed by two high-end tape libraries that provide a total usable tape capacity of nearly 15 PB (uncompressed). This near-line, tape-based storage system, mediated by fast, efficient disk caches, gives users the appearance of massive disk capacity at a fraction of the cost of storing the same data on spinning disks.

Note: At IU, the initials SDA, MDSS, and HPSS are often used interchangeably to describe the same service.

Although the names of files placed on the SDA remain visible to the user, the actual data migrate to tape when they haven't been accessed for a certain period of time. When data have migrated to tape, their retrieval can require up to two minutes per file as the tape robot must locate, mount, and read the appropriate tape. Due to the overhead involved in manipulating data this way, the SDA is not well suited for storing a large number of small files.
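Because each file recalled from tape can require its own tape mount, one common way to work around the small-file limitation is to bundle many small files into a single archive before transferring it. The following is a minimal sketch that uses Python's standard tarfile module on your local machine; it is not an SDA-specific tool, and the function names and paths are hypothetical. The recall estimate simply multiplies the file count by the two-minute worst case quoted above and assumes files are recalled one after another.

```python
import tarfile
from pathlib import Path

# Worst-case recall time per file quoted in this document (minutes).
MAX_RECALL_MINUTES_PER_FILE = 2


def worst_case_recall_minutes(file_count: int) -> int:
    """Upper bound on recall time if every file needs its own tape recall."""
    return file_count * MAX_RECALL_MINUTES_PER_FILE


def bundle_directory(source_dir: str, archive_path: str) -> int:
    """Pack every regular file under source_dir into one uncompressed tar file.

    archive_path should live outside source_dir so the archive does not
    end up containing itself. Returns the number of files added.
    """
    source = Path(source_dir)
    # Collect the file list before creating the archive.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w") as tar:
        for path in files:
            # Store paths relative to source_dir so the archive is relocatable.
            tar.add(path, arcname=path.relative_to(source).as_posix())
    return len(files)
```

Under these assumptions, 1,000 small files stored individually could take up to `worst_case_recall_minutes(1000)`, or 2,000 minutes, to recall from tape, whereas a single bundle of the same files counts as one file.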

The I-Light high-performance network between IUB and IUPUI makes it possible for the SDA HPSS system to create two tape copies of user data simultaneously (one at IUB and another at IUPUI), adding a degree of disaster tolerance to both sites.

System information

Note: The SDA is offline for regularly scheduled maintenance every Sunday 7am-10am.

System configuration
  Machine type: Distributed HPSS data archive
  Operating system: Red Hat Enterprise Linux 6

Storage information
  Network file system protocols: HSI/HTAR, CIFS (Samba) as read-only, SFTP/SCP, HTTPS
  Usable tape capacity: 15 PB
  Total disk capacity (cache): 1,800 TB
  Quotas: 50 TB (default) per user, 50 TB (default) per project; increases as needed
  Backup and purge policies: Dual copies of data, but no backups; system is never purged
  Aggregate I/O: 80 Gbps
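
The quota and aggregate I/O figures listed above lend themselves to a quick planning estimate. The sketch below is illustrative only: the function names are hypothetical, it assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second), and the transfer time it returns is a theoretical lower bound. A single user's real throughput depends on the network path, the transfer method, and system load, and will normally be far below the aggregate figure.

```python
# Figures taken from the system information listed above.
DEFAULT_QUOTA_TB = 50    # default per-user (and per-project) quota
AGGREGATE_IO_GBPS = 80   # aggregate I/O for the whole system

# Assumption: decimal units.
BYTES_PER_TB = 10**12
BITS_PER_GBPS = 10**9


def fits_default_quota(dataset_tb: float, already_stored_tb: float = 0.0) -> bool:
    """True if the dataset plus existing usage stays within the default quota."""
    return dataset_tb + already_stored_tb <= DEFAULT_QUOTA_TB


def min_transfer_seconds(dataset_tb: float, link_gbps: float = AGGREGATE_IO_GBPS) -> float:
    """Lower bound on transfer time at the given link speed, ignoring overhead."""
    bits = dataset_tb * BYTES_PER_TB * 8
    return bits / (link_gbps * BITS_PER_GBPS)
```

For example, `min_transfer_seconds(50)` gives 5,000 seconds (about 83 minutes) to move a full 50 TB quota at the aggregate rate, while `min_transfer_seconds(50, link_gbps=1)` gives 400,000 seconds (roughly 4.6 days) over a 1 Gbps connection.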

Working with ePHI research data

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) established rules protecting the privacy and security of personal health data. The HIPAA Security Rule set national standards specifically for the security of protected health information (PHI) that is created, stored, transmitted, or received electronically (i.e., electronic protected health information, or ePHI). To ensure the confidentiality, integrity, and availability of ePHI data, the HIPAA Security Rule requires organizations and individuals to implement a series of administrative, physical, and technical safeguards when working with ePHI data.

You can use this system to process or store electronic protected health information (ePHI) related to official IU research, subject to the following conditions:

  • You and/or the project's principal investigator (PI) are responsible for ensuring the privacy and security of that data, and for complying with applicable federal and state laws/regulations and institutional policies. IU's policies regarding HIPAA compliance require the appropriate Institutional Review Board (IRB) approvals and a data management plan.
  • You and/or the project's PI are responsible for applying HIPAA-required administrative, physical, and technical safeguards to any person, process, application, or service used to collect, process, manage, analyze, or store ePHI data.

The UITS Advanced Biomedical IT Core provides consulting and online help for Indiana University researchers who need help securely processing, storing, and sharing ePHI research data. If you need help or have questions about managing HIPAA-regulated data at IU, contact the ABITC. For additional details about HIPAA compliance at IU, see HIPAA & ABITC and the Office of Vice President and General Counsel (OVPGC) HIPAA Privacy & Security page.

Important: Although UITS HIPAA-aligned resources are managed using standards surpassing official standards for managing institutional data at IU and are appropriate for storing HIPAA-regulated ePHI research data, they are not recognized by the IU Committee of Data Stewards as appropriate for storing institutional data elements classified as Critical that are not ePHI data. For help determining which institutional data elements classified as Critical are considered ePHI, see Which data elements in the classifications of institutional data are considered protected health information (PHI)?

The IU Committee of Data Stewards and the University Information Policy Office (UIPO) set official classification levels and data management standards for institutional data in accordance with the university's Management of Institutional Data (DM-01) policy. If you have questions about the classifications of institutional data, contact the appropriate Data Steward. To determine the most sensitive classification of institutional data you can store on any given UITS service, see the "Choosing an appropriate storage solution" section of At IU, which dedicated file storage services and IT services with storage components are appropriate for sensitive institutional data, including ePHI research data?

Requesting an account

To request an account on the Scholarly Data Archive (SDA) or Research File System (RFS), use the Account Management Service (AMS).

After you submit your account request, UITS will notify you via email when your account is ready for use.

For eligibility requirements, see the "Research system accounts (all campuses)" section in What computing accounts are available at IU, and for whom?

Transferring files

Once you have an SDA account, you can access it from any networked host. The method you use depends on your operating system and level of comfort with the command-line interface.

Methods available for transferring data to and from the Indiana University Scholarly Data Archive (SDA) include Hierarchical Storage Interface (HSI), secure FTP (SFTP), secure copy (SCP), and HTTPS (via a web browser).

Read-only access is available via CIFS/Samba; see At IU, how do I access the SDA via Samba?

HSI, the highest-performing non-grid method, provides shell-like facilities for recursive operations and can take input data from standard input. HSI can also migrate files to tape, stage files from tape to disk, and purge files from the disk cache. HSI is available on the UITS research computing systems when you load the hpss module; for more about Modules, see On Big Red II, Karst, Mason, and Rockhopper at IU, how do I use Modules to manage my software environment? Accessing the SDA via HSI from a personal workstation requires installing a special client; updated HSI clients for Linux, OS X, and Windows are available for download from the UITS Research Storage HSI page.
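
As an illustration of scripting HSI transfers, the sketch below builds `hsi` command lines from Python and runs them only if an `hsi` client is found on your PATH. It assumes the standard HSI convention of passing a quoted command such as `put local : remote` as a single argument; the helper names and any paths you pass in are hypothetical, paths containing spaces are not handled, and authentication is left to however your HSI client is configured. Treat it as a starting point rather than a supported workflow.

```python
import shutil
import subprocess
from typing import List, Optional


def hsi_command(subcommand: str) -> List[str]:
    """Build an argument list that passes one command string to the hsi client."""
    return ["hsi", subcommand]


def put_command(local_path: str, remote_path: str) -> List[str]:
    # HSI's put syntax is "put <local> : <remote>".
    return hsi_command(f"put {local_path} : {remote_path}")


def get_command(local_path: str, remote_path: str) -> List[str]:
    # HSI's get syntax also lists the local file first: "get <local> : <remote>".
    return hsi_command(f"get {local_path} : {remote_path}")


def run_if_available(command: List[str]) -> Optional[int]:
    """Run the command if its executable is on PATH and return the exit code.

    Returns None, without running anything, when the executable is missing.
    """
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=False).returncode
```

For example, `put_command("results.tar", "project/results.tar")` returns `["hsi", "put results.tar : project/results.tar"]`, which you can inspect before handing it to `run_if_available`.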

For Windows or OS X users who prefer a graphical interface, UITS recommends using a graphical SFTP client. For OS X users, UITS recommends Fetch, especially if you intend to transfer large amounts of data.

Note: For reasons of code compatibility with future versions of HPSS, UITS introduced a redesigned version of the Scholarly Data Archive (SDA) web interface, which entered production January 3, 2015.

The new web interface functions much like the previous version, with a few exceptions:

If you have questions about the new SDA web interface, contact the UITS Research Storage team.

Reference

See On the Scholarly Data Archive at IU, what are classes of service, and how do I use them?

Acknowledging grant support

The Indiana University cyberinfrastructure managed by the Research Technologies division of UITS is supported by funding from several grants, each of which requires you to acknowledge its support in all presentations and published works stemming from research it has helped to fund. Conscientious acknowledgment of support from past grants also enhances the chances of IU's research community securing funding from grants in the future. For the acknowledgment statement(s) required for scholarly printed works, web pages, talks, online publications, and other presentations that make use of this and/or other grant-funded systems at IU, see If I use IU's research cyberinfrastructure, what sources of funding do I need to acknowledge in my published work?

Support

The SDA is maintained by the Research Storage team. If you have questions or need help, email Research Storage.

This is document aiyi in the Knowledge Base.
Last modified on 2015-02-27.
