Indiana University
University Information Technology Services
  
The Data Capacitor II and DCWAN high-speed file systems at Indiana University

Note: The Data Capacitor II (DC2) high-speed, high-capacity storage facility for very large data sets replaces the former Data Capacitor file system, which was decommissioned January 7, 2014. The DC2 scratch file system (/N/dc2/scratch) is mounted on Big Red II, Quarry, and Mason. Project directories on the former Data Capacitor were migrated to DC2 by UITS before the system was decommissioned. All data on the Data Capacitor scratch file system (/N/dc/scratch) were deleted when the system was decommissioned. If you have questions about the Data Capacitor's retirement, email the UITS High Performance File Systems group.

System overview

The High Performance File Systems unit of UITS Research Technologies operates two separate high-speed file systems for temporary storage of research data. Both use the open source Lustre parallel distributed file system running on a version of the Linux operating system:

  • Data Capacitor II: Data Capacitor II (DC2) is a larger, faster replacement for the former Data Capacitor, which was decommissioned January 7, 2014. Like its predecessor, DC2 is a large-capacity, high-throughput, high-bandwidth Lustre-based file system serving all IU campuses. It is mounted on the Big Red II, Quarry, and Mason research computing systems.

  • Data Capacitor wide-area network: The Data Capacitor wide-area network (DCWAN) is a large, high-speed data storage facility serving all IU campuses and several research centers throughout the nation, including Extreme Science and Engineering Discovery Environment (XSEDE) service providers. The DCWAN file system lets researchers access remote data as if that data were stored locally, making it easy to share large amounts of data with researchers at multiple remote sites.

Note: These file systems are not intended for permanent storage of data and are not backed up. Files in scratch directories may be purged if they have not been accessed for more than 60 days. Files in project directories may be purged if they have not been accessed for more than 180 days.


System information

Note: The scheduled monthly maintenance window for the Data Capacitor II (DC2) and Data Capacitor wide-area network (DCWAN) file systems is the first Tuesday of each month, 7am-7pm.

Data Capacitor II (DC2)


System configuration
Machine type: Data storage
Operating system: Linux CentOS release 6.2, kernel 2.6.32
Processor cores: 4-6
CPUs: 2 per node
Nodes: 26
RAM: 48-96 GB DDR-2
Network: 56-Gb FDR InfiniBand
Storage: Connected via 56-Gb FDR InfiniBand to DataDirect Networks SFA12K storage controllers

Storage information
File system: Lustre 2.1.6
Total disk space: 3.5 PB
Total scratch space: Varies based on system usage
Aggregate I/O: 40 GBps
Availability scope: All IU campuses
Quotas: 10 TB default; more upon request


Data Capacitor wide-area network (DCWAN)


System configuration
Machine type: Data storage
Operating system: Red Hat Enterprise Linux 5.8, kernel 2.6.18
Processor cores: 4
CPUs: 2 per node
Nodes: 6
RAM: 32-188 GB DDR-2
Network: 10-Gb Ethernet
Storage: Connected via 4-Gb Fibre Channel to DataDirect Networks S2A9550 storage controllers

Storage information
File system: Lustre 1.8.1.1 (patched with IU's UID/GID mapping code) on metadata servers; Lustre 1.8.5 on object storage servers
Total disk space: 339 TB
Total scratch space: Varies based on system usage
Aggregate I/O: 40 Gbps
Availability scope: All IU campuses and other US sites; available to XSEDE researchers
Quotas: 10 TB default; more upon request


System access

  • Data Capacitor II: The DC2 file system is mounted on Big Red II, Quarry, and Mason as /N/dc2/ and behaves like any other disk device on those machines. If you have an account on Big Red II, Quarry, or Mason, you can access your DC2 scratch directory at /N/dc2/scratch/username (replace username with your IU Network ID username).
  • DCWAN: Users at other institutions (including IU researchers with accounts on remote systems) can request DCWAN storage space, which can be mounted on remote systems, as well as on Big Red II, Quarry, and Mason, as /N/dcwan/.


Listing files

You should use ls -l only when necessary, or only on directories containing small amounts of data. Running ls -l in a DC2 or DCWAN directory to list its contents and associated metadata (e.g., ownership, permissions, and file size information for each file) can cause performance issues for you and other users, particularly if the directory contains a large amount of data.

Due to its parallel architecture, Lustre performs file and metadata operations separately. When you run ls -l in a DC2 or DCWAN directory, the system contacts Lustre's Metadata Server (MDS) to get your data's location, ownership, and permissions information. However, to retrieve file size information, the system must contact multiple Object Storage Servers (OSSs), which in turn must contact multiple Object Storage Targets (OSTs) that store the data objects that make up your files. When the load on one or more OSS nodes is high, your ls -l command may hang; other users on the file system may experience latency issues, as well.

Furthermore, some IU systems have ls (without any options) aliased to ls --color=tty, which enables the use of colors to distinguish file types. With the alias, running ls initiates a full lookup to determine the color associated with each file, which (as with ls -l) requires communication with the OSSs and the OSTs. Without the alias, running ls contacts the MDS only (i.e., it does not initiate a full lookup involving the OSSs and OSTs). To avoid potential performance issues, you can override the ls --color=tty alias, preventing ls from initiating a full lookup. To do so, add the following line to your shell profile:

unalias ls

Using ls to list information about individual files creates a lot less overhead on the file system:

  • To check for the existence of a file (e.g., my_file), use: ls my_file
  • To see all details for a specific file (e.g., my_file), use: ls -l my_file


Sorting files by age

Data Capacitor II and DCWAN are intended for temporary storage of research computing data. Files in scratch directories may be purged if they have not been accessed for more than 60 days. Files in project directories may be purged if they have not been accessed for more than 180 days.

To determine which files are the oldest (and at risk of being purged), you can list them by age (oldest to newest) using this command: find . -type f -exec ls -lhtr "{}" +
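To surface only the files actually at risk under the 60-day scratch policy, a standard find with -atime can be used. The sketch below is self-contained: the file names are hypothetical, and one file's access time is backdated with GNU touch so the filter has something to match:

```shell
# Create two stand-in files; backdate one's access time by ~90 days (GNU touch).
touch -a -d "90 days ago" old_results.dat
touch recent_results.dat

# List regular files not accessed in more than 60 days -- purge candidates.
find . -maxdepth 1 -type f -atime +60
```

On DC2 or DCWAN, run the find from within your scratch or project directory, and omit -maxdepth to descend into subdirectories as well.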


Transferring files

The Data Capacitor II and DCWAN file systems are parallel high-performance file systems. Files are not "transferred" to these file systems; instead, the DC2 or DCWAN file systems are mounted on computational resources, making them accessible from those resources as directory paths (e.g., /N/dc2/scratch/username). To read or write a file on the DC2 or DCWAN file system, use the same standard Unix commands used for reading and writing files stored in your computational system's local directories.
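For instance, copying to and reading from DC2 scratch uses the same commands as any local directory. This sketch substitutes a local stand-in directory so it runs anywhere; on Big Red II, Quarry, or Mason you would use /N/dc2/scratch/username instead:

```shell
# Stand-in for a DC2 scratch path; on the research systems, replace this
# with /N/dc2/scratch/username (your IU Network ID username).
SCRATCH=$(mktemp -d)

# Ordinary Unix commands read and write the mounted file system directly.
echo "simulation output" > results.dat
cp results.dat "$SCRATCH/"
cat "$SCRATCH/results.dat"
```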


Specifying DC2 or DCWAN as a requirement for your batch job

The Data Capacitor II and DCWAN parallel file systems are mounted on Big Red II, Quarry, and Mason. You can specify either file system as a requirement for batch jobs running on those systems; for instructions, see For batch jobs on Big Red II, Quarry, or Mason at IU, how do I specify the required parallel file systems?


Usage policies

  • The Data Capacitor II and DCWAN file systems both provide two types of storage space:

    • Scratch space: The DC2 and DCWAN scratch directories are temporary workspaces available to all users on Big Red II, Quarry, and Mason. Scratch space is not allocated, and its total capacity fluctuates based on project space requirements. Files in scratch space may be purged if they have not been accessed for more than 60 days.

      Your personal scratch directory is automatically created with your user account. It is available at these locations, depending on the file system (replace username with your IU Network ID username):

      /N/dc2/scratch/username
      /N/dcwan/scratch/username
    • Project space: DC2 and DCWAN project space is dedicated to long-term projects with storage and access requirements that cannot be met by other systems. Requests for project space must be submitted to the High Performance File System team for evaluation by the allocation committee. Files in project space may be purged if they have not been accessed for more than 180 days.

      To request DC2 or DCWAN project space, fill out and submit the Project Allocation Request Form.

  • File system space not allocated to projects will be available as scratch space and will vary depending on file system usage.

  • Projects receive a default quota of 10 TB. Project owners can request quota increases if additional space is needed. Due to performance issues, storing a large number of small files is discouraged, but arrangements can be made if a need exists.

  • The DC2 and DCWAN file systems are not intended for permanent storage of data and are not backed up. It is your responsibility to arrange any long-term storage required. At IU, to archive data stored or created on the DC2 or DCWAN file system, move them to IU's Scholarly Data Archive (SDA).

  • Lustre is not designed for storing a large number of small files. If you need such storage, you should use an archiving or compression utility (e.g., tar or gzip) to bundle your files into a small number of large files. Failure to do so can negatively impact performance of these file systems and strain their file-count (inode) capacities.
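As a sketch of that bundling step (the file names here are hypothetical), tar can pack many small files into one compressed archive before they land on DC2 or DCWAN:

```shell
# Create 100 hypothetical small files standing in for real job output.
mkdir -p small_files
i=1
while [ "$i" -le 100 ]; do
    echo "data $i" > "small_files/part_$i.txt"
    i=$((i + 1))
done

# Bundle them into one gzip-compressed archive: a single large file
# (and a single inode) instead of a hundred small ones.
tar -czf small_files.tar.gz small_files

# Inspect the archive's contents without extracting it.
tar -tzf small_files.tar.gz | wc -l
```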


Working with electronic protected health information

Your responsibilities

Important: Storing electronic protected health information (ePHI) regulated by the Health Insurance Portability and Accountability Act of 1996 (HIPAA) on the DCWAN file system is not permitted.

If you use the Data Capacitor II file system to store ePHI data:

  • You and/or the project's principal investigator (PI) are responsible for ensuring the privacy and security of that data, and complying with applicable federal and state laws/regulations and institutional policies. IU's policies regarding HIPAA compliance require the appropriate Institutional Review Board (IRB) approvals and a data management plan.

  • You and/or the project's PI are responsible for implementing HIPAA-required administrative, physical, and technical safeguards to any person, process, application, or service used to collect, process, manage, analyze, or store ePHI data.

For more, see What are my responsibilities when using UITS systems for work with electronic protected health information?

Important: Although UITS HIPAA-aligned resources are managed using standards meeting or exceeding those established for managing institutional data at IU, and are approved by the IU Office of the Vice President and General Counsel (OVPGC) for storing research-related ePHI, they are not recognized by the IU Committee of Data Stewards as appropriate for storing other types of institutional data classified as "Critical" that are not ePHI research data. To determine which services are appropriate for storing sensitive institutional data, including ePHI research data, see Comparing supported data classifications, features, costs, and other specifications of file storage solutions and services with storage components available at IU.

The Committee of Data Stewards and the University Information Policy Office (UIPO) define official classification levels and management standards for institutional data in accordance with IU's Management of Institutional Data (DM-01) policy.

Technical safeguards

You should employ the following technical safeguards when working with ePHI:

  • Set directory permissions: The permissions for a directory containing ePHI should be set to grant read, write, and execute access to the owner (you) only. No access at all should be granted to group members and other users.

    To change the permission of an existing file or directory, use the chmod command. For example, to restrict all read and write access to the owner of ephi_file, on the command line, enter:

    chmod 700 ephi_file

    The above command will set the Unix permissions to look like this:

    -rwx------ 1 <username> uits 40 Sep 13 15:12 ephi_file

    Alternatively, to configure your user environment so that every new file and directory gets the same permission level (accessible only by the owner), add the following line to your shell profile:

    umask 077
  • Encrypt data at rest: While files containing ePHI data are at rest (i.e., when you are not working with them), they should be encrypted; see Recommended encryption tools for handling ePHI at IU.
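A quick way to confirm the permissions safeguard behaves as intended (ephi_file is a hypothetical name; run this in a directory of your own):

```shell
# With umask 077 in effect, every newly created file is owner-only.
umask 077
touch ephi_file        # hypothetical stand-in for a file containing ePHI
ls -l ephi_file        # mode column reads -rw------- (no group/other access)
```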


Reference

For more about the Lustre file system, see the Lustre wiki.


Acknowledging grant support

The Indiana University cyberinfrastructure managed by the Research Technologies division of UITS is supported by funding from several grants, each of which requires you to acknowledge its support in all presentations and published works stemming from research it has helped to fund. Conscientious acknowledgment of support from past grants also enhances the chances of IU's research community securing funding from grants in the future. For the acknowledgment statement(s) required for scholarly printed works, web pages, talks, online publications, and other presentations that make use of this and/or other grant-funded systems at IU, see If I use IU's research cyberinfrastructure, what sources of funding do I need to acknowledge in my published work?


Support

For technical support or general information about the DC2 and DCWAN file systems, email the UITS High Performance File Systems group.

For after-hours support, call Data Center Operations (812-855-9910), and ask to have High Performance File Systems contacted.

To receive maintenance and downtime information, subscribe to the hpfs-maintenance-l@indiana.edu mailing list; see On IU List, how do I subscribe to a list?


This is document avvh in domain all.
Last modified on April 02, 2014.
