About the Scholarly Data Archive (SDA) at Indiana University
On this page:
- System overview
- System information
- Work with data containing PHI
- Request an account
- Access the SDA and transfer files
- Acknowledge grant support
- Support
System overview
The Indiana University Scholarly Data Archive (SDA) provides extensive capacity (approximately 79 PB of tape overall) for storing and accessing research data. The SDA is a distributed storage service co-located at IU data centers in Bloomington and Indianapolis, providing IU researchers with large-scale archival or near-line data storage, arranged in large files, with two copies of data made by default (for disaster recovery).
The SDA is based on the High Performance Storage System (HPSS), a consortium-developed hierarchical storage management (HSM) package that makes the SDA's hierarchy of storage media transparent to its users. The SDA's system architecture comprises fast, efficient disk cache front-end components that move infrequently accessed data to two high-end tape libraries. Using the I-Light high performance network between IUB and IUPUI, the SDA creates two tape copies of user data simultaneously (one at each data center), adding a degree of disaster tolerance to both sites.
The SDA is well suited for storing large volumes of data (that is, tens of gigabytes to several terabytes per project), and data that are accessed relatively infrequently (archival or near-line storage). The SDA backend is not designed for storing a large number of small files. Individual files should be at least 1 MB. If you need to store many small files on the SDA, use an archiving utility (such as tar or ZIP, optionally combined with gzip compression) to bundle your files into a single, large archive file.
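As a rough illustration of the bundling advice above, the following Python sketch uses the standard tarfile module to find files smaller than 1 MB and to pack a directory into one gzip-compressed tar archive. The function names, the decimal reading of "1 MB", and the directory names are assumptions made for this example; they are not part of any SDA tooling.

```python
import tarfile
from pathlib import Path

# Assumption: "1 MB" is read as 1,000,000 bytes for this sketch.
MIN_FILE_SIZE = 1_000_000


def small_files(source_dir, threshold=MIN_FILE_SIZE):
    """Return the regular files under source_dir smaller than threshold bytes."""
    return sorted(
        p for p in Path(source_dir).rglob("*")
        if p.is_file() and p.stat().st_size < threshold
    )


def bundle_directory(source_dir, archive_path):
    """Pack source_dir (recursively) into one gzip-compressed tar archive.

    archive_path should be outside source_dir so the archive does not
    try to include itself.
    """
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores paths relative to the top-level directory name.
        archive.add(source, arcname=source.name)
    return Path(archive_path)
```

For example, bundle_directory("run01", "run01.tar.gz") would produce a single archive file that can then be transferred to the SDA in place of the individual small files.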
The SDA supports high performance access methods, such as the Hierarchical Storage Interface (HSI); an HPSS API is available for programmers, as well.
- File and directory names within the SDA may only contain ASCII characters in the range 0x20 to 0x7e. SDA does not support Unicode and similar encodings; rarely, you may need to rename external files before transferring them into the SDA.
- At IU, the initials SDA, MDSS, and HPSS are often used interchangeably to describe the same service.
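The character restriction on file and directory names can be checked locally before a transfer. The sketch below is a minimal Python example, assuming you want to flag (or replace) any character outside the range 0x20 to 0x7e; the function names and the underscore replacement are choices made for this illustration only.

```python
def invalid_characters(name):
    """Return the characters in name that fall outside 0x20 to 0x7e."""
    return [ch for ch in name if not 0x20 <= ord(ch) <= 0x7E]


def is_sda_safe(name):
    """True if every character in name is in the permitted ASCII range."""
    return not invalid_characters(name)


def ascii_fallback(name, replacement="_"):
    """Replace each out-of-range character with replacement (hypothetical policy)."""
    return "".join(
        ch if 0x20 <= ord(ch) <= 0x7E else replacement for ch in name
    )
```

A name such as "results_2023.tar" passes the check, while a name containing an accented letter or a tab character is reported so it can be renamed before transfer.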
System information
- Machine type: Distributed HPSS data archive
- Operating system: Red Hat Enterprise Linux 7
- Potential tape library capacity: 79 PB
- Total disk capacity (cache): 1,800 TB
- Aggregate I/O: 80 Gbps
- Backup and purge policies: Data are not backed up; the system is never purged as long as the account owner has a valid IU account.
Note:
Due to the mammoth volume of data stored on the Scholarly Data Archive (SDA), backups are neither practical nor economical. The SDA doesn't have an offline, traditional backup system, so any deletion you perform will cause an irreversible loss of data.
However, to protect against random tape errors, two tape copies of the data are created by default. If one tape fails, data can be retrieved from the other tape. The two copies of data reside at two geographically distant sites (at IU Bloomington and at IUPUI) in two separate tape libraries. Also, two separate metadata backups are performed, at IUB and IUPUI. As a result, even in the event of a catastrophic disaster affecting either the IUB or IUPUI site (such as a fire or a tornado), all dual-copy data on the SDA would still be safe.
- Quotas: 50 TB (default) per user or project; when in support of research activities, extensions beyond 50 TB may be granted for a nominal charge. If you need more than 50 TB, submit your request using the SDA Quota Increase Request form.
On October 6, 2019, UITS Research Technologies implemented a file quota to limit the number of files users can store in their Scholarly Data Archive accounts. The file quota for new accounts is 25,000 files. To view your SDA quotas and usage, log in to HPC everywhere.
- Directory path and file name limits: On the SDA, HPSS limits the length of directory paths and file names as follows:
- Directory paths: The directory path of any file may not exceed 1,024 characters in length.
- File names: File names may not exceed 256 characters in length.
Note: File name and directory path limits in HPSS are separate from (and less restrictive than) similar limits imposed when using HTAR; for more, see HTAR limitations in Use HTAR with your SDA account.
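The directory path and file name limits listed above can be checked before a transfer. The following Python sketch assumes the limits are counted in characters exactly as stated (1,024 for the directory path, 256 for the file name) and that paths use "/" as the separator; how HPSS treats edge cases such as trailing slashes is not covered here. The function name and the example path are hypothetical.

```python
import posixpath

MAX_DIR_PATH = 1024   # directory path limit stated above
MAX_FILE_NAME = 256   # file name limit stated above


def check_lengths(remote_path):
    """Return a list of problems found in remote_path (empty if none)."""
    directory, file_name = posixpath.split(remote_path)
    problems = []
    if len(directory) > MAX_DIR_PATH:
        problems.append(
            f"directory path is {len(directory)} characters (limit {MAX_DIR_PATH})"
        )
    if len(file_name) > MAX_FILE_NAME:
        problems.append(
            f"file name is {len(file_name)} characters (limit {MAX_FILE_NAME})"
        )
    return problems
```

Calling check_lengths("project/archive/data.tar") returns an empty list, while a path whose final component is longer than 256 characters returns one message describing the file name problem. HTAR applies its own, stricter limits, which this sketch does not model.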
Work with data containing PHI
The Health Insurance Portability and Accountability Act of 1996 (HIPAA) established rules protecting the privacy and security of individually identifiable health information. The HIPAA Privacy Rule and Security Rule set national standards requiring organizations and individuals to implement certain administrative, physical, and technical safeguards to maintain the confidentiality, integrity, and availability of protected health information (PHI).
This UITS system or service meets certain requirements established in the HIPAA Security Rule, thereby enabling its use for work involving data that contain protected health information (PHI). However, using this system or service does not fulfill your legal responsibilities for protecting the privacy and security of data that contain PHI. You may use this system or service for work involving data that contain PHI only if you institute additional administrative, physical, and technical safeguards that complement those UITS already has in place.
If you have questions about securing HIPAA-regulated research data at IU, email securemyresearch@iu.edu. SecureMyResearch provides self-service resources and one-on-one consulting to help IU researchers, faculty, and staff meet cybersecurity and compliance requirements for processing, storing, and sharing regulated and unregulated research data; for more, see About SecureMyResearch. To learn more about properly ensuring the safe handling of PHI on UITS systems, see the UITS IT Training video Securing HIPAA Workflows on UITS Systems. To learn about division of responsibilities for securing PHI, see Shared responsibility model for securing PHI on UITS systems.
Request an account
For eligibility requirements, see the "Research system accounts (all campuses)" section in Computing accounts at IU.
- To request an individual (personal) account on the Indiana University Scholarly Data Archive (SDA), follow the instructions in Get additional IU computing accounts.
If you are eligible to request an SDA account, but the SDA is not listed among the accounts available for you to request, contact your campus Support Center for help.
- To request an SDA account for your IU group account, use the SDA Group Account Request form.
Note: In accordance with standards for access control mandated by the HIPAA Security Rule, you are not permitted to access data containing protected health information (PHI) using a group (or departmental) account. To ensure accountability and maintain appropriate levels of access control, all users must use an individual login for all work involving PHI.
After you submit your account request, UITS will notify you via email when your account is ready for use.
Access the SDA and transfer files
Once you have an SDA account, you can access it from any networked host. The method you use depends on your operating system and level of comfort with the command-line interface.
- The SDA is offline for regularly scheduled maintenance every Sunday 7am-10am.
- To access the SDA from off campus, UITS recommends protocols such as Globus and SFTP, which provide IU Login and Two-Step Login (Duo) authentication, as well as encryption in transit. Because the HSI/HTAR protocols don't provide the same protections, you will need to apply for an off-campus exemption to remotely access the SDA with a local HSI/HTAR client.
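If you schedule automated transfers, a script can avoid the weekly maintenance window noted above. This Python sketch assumes the window runs from 7:00am up to (but not including) 10:00am every Sunday, and that the datetime passed in is already expressed in the local time used for the SDA schedule; time zone handling is deliberately left out. The function name is hypothetical.

```python
from datetime import datetime, time

MAINTENANCE_WEEKDAY = 6            # datetime.weekday(): Monday is 0, Sunday is 6
MAINTENANCE_START = time(7, 0)     # 7am
MAINTENANCE_END = time(10, 0)      # 10am


def in_maintenance_window(moment):
    """True if moment (assumed local time) falls in the Sunday window."""
    return (
        moment.weekday() == MAINTENANCE_WEEKDAY
        and MAINTENANCE_START <= moment.time() < MAINTENANCE_END
    )
```

A transfer script could call in_maintenance_window(datetime.now()) before starting and postpone the job when it returns True.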
Methods available for transferring data to and from the Indiana University Scholarly Data Archive (SDA) include secure FTP (SFTP), secure copy (SCP), GridFTP (via the IU Globus Web App), and Hierarchical Storage Interface (HSI). For instructions, see:
- Use SFTP or SCP to access your SDA account at IU
- Use the IU Globus Web App to transfer data to and from your accounts on IU's research computing and storage systems
- Use HSI to access your SDA account at IU
HSI, the highest performing non-grid method, provides shell-like facilities for recursive operations, and can take input data from standard input. HSI also can perform file migration to tape, stage files from tape to disk, and purge files from the disk cache. HSI is available on UITS research supercomputers when you load the hpss module. For more about HSI, see the HSI Reference Manual.
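For scripted use, an HSI transfer can be launched from Python. The sketch below only builds the command line; it assumes the hsi executable is on your PATH (for example, after loading the hpss module) and that HSI's put command takes the form "put local_file : remote_file", which you should confirm against the HSI Reference Manual. The function names and file names are hypothetical, and names containing whitespace are rejected rather than quoted.

```python
import subprocess


def hsi_put_command(local_path, remote_path):
    """Build an argument list for: hsi "put local_path : remote_path"."""
    for value in (local_path, remote_path):
        if not value or any(ch.isspace() for ch in value):
            # Quoting rules for HSI are not handled in this sketch.
            raise ValueError(f"unsupported path for this sketch: {value!r}")
    return ["hsi", f"put {local_path} : {remote_path}"]


def run_hsi(command):
    """Run a command built above; raises CalledProcessError on failure."""
    return subprocess.run(command, check=True)
```

For example, run_hsi(hsi_put_command("run01.tar.gz", "archives/run01.tar.gz")) would ask HSI to store the local archive under the given name in your SDA account, prompting for authentication as HSI normally does.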
For use on personal workstations, IU SDA users can download and install HSI version 8.3.3 (bundled with its companion program, HTAR) from the UITS Research Technologies HSI folder in Google at IU My Drive. (You must be signed into your Google at IU account to access this folder; see Access Google at IU.) Bundles are available for 32- and 64-bit Windows, macOS, and Red Hat Enterprise Linux, and for 64-bit Ubuntu Linux.
- To connect to the SDA with a local HSI/HTAR client, make sure you have version 8.3.3 installed.
- An SDA Remote Access Exemption is required to connect to the SDA with a local HSI/HTAR client from an off-campus network location.
For Windows or macOS users who prefer a graphical interface, UITS recommends using a graphical SFTP client. For macOS users, particularly those needing to transfer large amounts of data, UITS recommends Fetch.
Access the SDA in Research Desktop (RED)
You can access the SDA in Research Desktop (RED) from the ThinLinc Client.
Acknowledge grant support
The Indiana University cyberinfrastructure, managed by the Research Technologies division of UITS, is supported by funding from several grants, each of which requires you to acknowledge its support in all presentations and published works stemming from research it has helped to fund. Conscientious acknowledgment of support from past grants also enhances the chances of IU's research community securing funding from grants in the future. For the acknowledgment statement(s) required for scholarly printed works, web pages, talks, online publications, and other presentations that make use of this and/or other grant-funded systems at IU, see Sources of funding to acknowledge in published work if you use IU's research cyberinfrastructure.
Support
The SDA is maintained by the UITS Research Storage team. If you have questions or need help, contact UITS Research Storage.
This is document aiyi in the Knowledge Base.
Last modified on 2023-04-24 14:24:44.