About the Data Capacitor Wide Area Network 2 (DC-WAN2) high-speed file system at IU

The Data Capacitor Wide Area Network 2 (DC-WAN2) is a large, high-speed data storage facility serving all Indiana University campuses and several research centers throughout the nation. The DC-WAN2 file system lets researchers access remote data as if they were stored locally and share large amounts of data with researchers at multiple remote sites.

The DC-WAN2 file system is operated by the High Performance File Systems (HPFS) unit of UITS Research Technologies.

Usage policies

The DC-WAN2 file system is not designed for storing a large number of small files. If you need to store a large number of small files, use tar and gzip to bundle them into a compressed archive file. Failure to do so can negatively impact DC-WAN2 performance and strain its file-count (inode) capacity.
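
As a minimal sketch of the bundling step described above, the following commands create a few sample files, pack them into a single gzip-compressed tar archive, and verify the archive before removing the originals. The names tar_demo, small_files, and small_files.tar.gz are hypothetical placeholders; substitute your own directory and archive names.

```shell
# Hypothetical names: tar_demo/, small_files/, and small_files.tar.gz are placeholders.
demo_dir=./tar_demo
mkdir -p "$demo_dir/small_files"

# Create a few small sample files to stand in for real data.
for i in 1 2 3; do
    printf 'sample %s\n' "$i" > "$demo_dir/small_files/file_$i.txt"
done

# Bundle and compress the directory into one archive:
# -c create, -z compress with gzip, -f archive file name, -C change to directory first.
tar -czf "$demo_dir/small_files.tar.gz" -C "$demo_dir" small_files

# List the archive contents to verify it before removing the originals.
tar -tzf "$demo_dir/small_files.tar.gz"

# Remove the original small files only after the archive has been verified.
rm -r "$demo_dir/small_files"
```

To unpack the archive later, run tar -xzf on the archive file from the directory where you want the files restored.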

Project directories on the DC-WAN2 file system are reserved for research projects with atypical requirements that cannot be met by other systems. Project spaces are not intended for permanent storage; data are not backed up. Files in project space may be purged if they have not been accessed for more than 180 days. To archive project space data at IU, move files to the Scholarly Data Archive (SDA).

Storing data containing protected health information (PHI) regulated by the Health Insurance Portability and Accountability Act of 1996 (HIPAA) on the DC-WAN2 file system is not permitted.

System information

The scheduled monthly maintenance window for IU's high performance computing systems is the second Sunday of each month, 7am-7pm.

System configuration
Machine type        Data storage
Operating system    CentOS Linux v6.8, Kernel 2.6.32
Processor cores     10
CPUs                2 per node
Nodes               6
RAM                 256 GB DDR-4
Network             10/40-Gbps Ethernet
Storage             Connected via 40-Gbps QDR InfiniBand

Storage information
File system         Lustre 2.8.0
Total disk space    651 TB
Aggregate I/O       40 Gbps to storage servers; 10 Gbps to metadata servers (1 active / 1 backup)
Availability scope  All IU campuses and other US sites

System access

The DC-WAN2 file system is mounted on the IU research supercomputers as /N/dcwan/ and behaves like any other disk device on those machines. If you have DC-WAN2 project space, you can access it from the research supercomputers at /N/dcwan/projects/project_name (replace project_name with the name of your project).

List files

You should use ls -l only when necessary, or only on directories containing small amounts of data. Running ls -l in a DC-WAN2 directory to list its contents and associated metadata (for example, ownership, permissions, and file size information for each file) can cause performance issues for you and other users, particularly if the directory contains a large amount of data.

Due to its parallel architecture, Lustre performs file and metadata operations separately. When you run ls -l in a DC-WAN2 directory, the system contacts Lustre's Metadata Server (MDS) to get your data's location, ownership, and permissions information. However, to retrieve file size information, the system must contact multiple Object Storage Servers (OSSs), which in turn must contact multiple Object Storage Targets (OSTs) that store the data objects that make up your files. When the load on one or more OSS nodes is high, your ls -l command may hang; other users on the file system may experience latency issues, as well.

Furthermore, some IU systems have ls (without any options) aliased to ls --color=tty, which enables the use of colors to distinguish file types. With the alias, running ls initiates a full lookup to determine the color associated with each file, which (as with ls -l) requires communication with the OSSs and the OSTs. Without the alias, running ls contacts the MDS only (it does not initiate a full lookup involving the OSSs and OSTs). To avoid potential performance issues, you can override the ls --color=tty alias, preventing ls from initiating a full lookup. To do so, add the following line to your shell profile:

unalias ls
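
If your profile is shared across systems where ls may not be aliased, a bare unalias ls prints an error at login. A guarded form, sketched below under the assumption of a POSIX-style shell such as bash, removes the alias only when it exists. The first line merely simulates the alias so the sketch can be run anywhere; do not copy it into your profile.

```shell
# Simulate the site-wide alias so this sketch can be run anywhere (assumption).
alias ls='ls --color=tty'

# Guarded form for a shell profile: remove the alias only if it exists,
# so no error is printed on systems where ls is not aliased.
if alias ls >/dev/null 2>&1; then
    unalias ls
fi

# Confirm the result; prints a message once no alias remains.
alias ls >/dev/null 2>&1 || echo "ls is not aliased"
```

To bypass the alias for a single command without changing your profile, you can also run command ls or \ls.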

Using ls to list information about individual files creates a lot less overhead on the file system:

  • To check for the existence of a file (for example, my_file), use:
    ls my_file
  • To see all details for a specific file (for example, my_file), use:
    ls -l my_file

Sort files by age

DC-WAN2 is intended for temporary storage of research computing data. Files in project directories may be purged if they have not been accessed for more than 180 days.

To determine which files located in or below the present working directory are the oldest (and at risk of being purged), you can list them by age (oldest to newest) using the find command; for example:

find . -type f -exec ls -1hltr '{}' +

In the command above:

  • The dot (.) directs find to search the present working directory and its subdirectories.
  • The -type f test limits the find search to regular files.
  • The -exec ls -1hltr '{}' + action makes find run the ls command on its search results, treating the arguments that follow -exec as the command and its options until it encounters the plus sign (+) that ends the action.
  • The + terminator makes find replace {} with as many matching file names as fit on one command line, so ls runs once per batch of files rather than once per file. In a very large directory tree, find may run ls more than once, in which case each batch is sorted separately.
  • The ls command parses the file list and (given the options provided) displays the results one file per line (-1), in long format (-l), with human-readable file sizes (-h), sorted by modification time (-t), and listed in reverse order (-r), so the oldest files appear first.

To perform the same operation on a directory that's not the present working directory, use the same command and options, but replace the dot (.) with the full path to the directory in question; for example:

  • For a directory in your project space (replace project_name with your project's name and some_other_dir with the directory you want to sort):
    find /N/dcwan/projects/project_name/some_other_dir -type f -exec ls -1hltr '{}' +
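
The purge policy is based on access time, whereas ls -t sorts by modification time. As a hedged sketch, the variation below lists only files not accessed in more than 150 days and sorts them by access time (the -u option of GNU ls, combined with -lt). The 150-day threshold and the atime_demo directory are assumptions chosen for illustration, and the touch -d syntax used to backdate the sample file is specific to GNU coreutils.

```shell
# Hypothetical demo directory; replace with your project path on a real system.
search_dir=./atime_demo
mkdir -p "$search_dir"

# Create two sample files and backdate one file's access time (GNU touch syntax).
touch "$search_dir/recent.dat"
touch "$search_dir/stale.dat"
touch -a -d '200 days ago' "$search_dir/stale.dat"

# List files last accessed more than 150 days ago (threshold is an assumption),
# in long format, sorted by access time (-u with -t), oldest first (-r).
find "$search_dir" -type f -atime +150 -exec ls -lhtur {} +
```

On a real system, replace the demo directory with the path to your project space and skip the lines that create the sample files.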

Transfer files

The DC-WAN2 file system is a parallel high performance file system. Files are not "transferred" to the DC-WAN2 file system; instead, the DC-WAN2 file system is mounted on computational resources, making it accessible from those resources as directory paths (for example, /N/dcwan/projects/project_name). To copy or move files on the DC-WAN2 file system, use the same standard Linux commands used for copying or moving files stored in your computational system's local directories.
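
For example, copying and moving data with cp and mv might look like the following sketch. The directory dcwan_project_demo is a local stand-in so the commands can be run anywhere; on an IU system the destination would be your project directory (for example, /N/dcwan/projects/project_name). The names local_results, output.txt, and output_v1.txt are hypothetical.

```shell
# Hypothetical paths: dcwan_project_demo stands in for /N/dcwan/projects/project_name.
project_dir=./dcwan_project_demo
mkdir -p "$project_dir" ./local_results
printf 'result data\n' > ./local_results/output.txt

# Copy a directory tree into project space, preserving timestamps and permissions (-p).
cp -rp ./local_results "$project_dir/"

# Move (rename) a file within project space.
mv "$project_dir/local_results/output.txt" "$project_dir/local_results/output_v1.txt"
```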

Acknowledge grant support

The Indiana University cyberinfrastructure, managed by the Research Technologies division of UITS, is supported by funding from several grants, each of which requires you to acknowledge its support in all presentations and published works stemming from research it has helped to fund. Conscientious acknowledgment of support from past grants also improves the chances of IU's research community securing grant funding in the future. For the acknowledgment statement(s) required for scholarly printed works, web pages, talks, online publications, and other presentations that make use of this and/or other grant-funded systems at IU, see Sources of funding to acknowledge in published work if you use IU's research cyberinfrastructure.


For more about the Lustre file system, see the Lustre wiki.

For technical support or general information about the Slate, Slate-Project, Slate-Scratch, or Data Capacitor Wide Area Network 2 (DC-WAN2) file systems, contact the UITS High Performance File Systems (HPFS) group.

To receive maintenance and downtime information, subscribe to the hpfs-maintenance-l@indiana.edu mailing list; see Subscribe to an IU List mailing list.

This is document bgqk in the Knowledge Base.
Last modified on 2023-02-27 16:40:42.