About the Data Capacitor Wide Area Network 2 (DC-WAN2) high-speed file system at IU
On this page:
- System overview
- Usage policies
- System information
- System access
- List files
- Sort files by age
- Transfer files
- Acknowledge grant support
- Support
System overview
The Data Capacitor Wide Area Network 2 (DC-WAN2) is a large, high-speed data storage facility serving all Indiana University campuses and several research centers throughout the nation. The DC-WAN2 file system lets researchers access remote data as if they were stored locally and share large amounts of data with researchers at multiple remote sites.
The DC-WAN2 file system is operated by the High Performance File Systems (HPFS) unit of UITS Research Technologies.
Usage policies
The DC-WAN2 file system is not designed for storing a large number of small files. If you need to store a large number of small files, use tar and gzip to bundle them into a compressed archive file. Failure to do so can degrade DC-WAN2 performance and strain its file-count (inode) capacity.
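As a minimal sketch of the bundling step described above, the following creates a gzip-compressed tar archive from a directory of small files. The directory name small_files and the archive name small_files.tar.gz are placeholders chosen for illustration, and the sample files exist only so the sketch runs anywhere.

```shell
# Placeholder names for illustration; substitute your own directory and archive.
src_dir="small_files"
archive="small_files.tar.gz"

# Create a few sample files so the sketch is runnable; skip this with real data.
mkdir -p "$src_dir"
for i in 1 2 3; do
    printf 'sample %s\n' "$i" > "$src_dir/file_$i.txt"
done

# -c creates an archive, -z compresses it with gzip, -f names the output file.
tar -czf "$archive" "$src_dir"

# List the archive contents to verify it before removing the original files.
tar -tzf "$archive"
```

Storing the single archive file instead of the individual files reduces the number of inodes consumed on the file system.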
Project directories on the DC-WAN2 file system are reserved for research projects with atypical requirements that cannot be met by other systems. Project spaces are not intended for permanent storage; data are not backed up. Files in project space may be purged if they have not been accessed for more than 180 days. To archive project space data at IU, move files to the Scholarly Data Archive (SDA).
System information
The scheduled monthly maintenance window for IU's high performance computing systems is the second Sunday of each month, 7am-7pm.
System configuration | |
---|---|
Machine type | Data storage |
Operating system | CentOS Linux v6.8, Kernel 2.6.32 |
Processor cores | 10 |
CPUs | 2 per node |
Nodes | 6 |
RAM | 256 GB DDR4 |
Network | 10/40-Gbps Ethernet |
Storage | Connected via 40-Gbps QDR InfiniBand |

Storage information | |
---|---|
File system | Lustre 2.8.0 |
Total disk space | 651 TB |
Aggregate I/O | 40 Gbps to storage servers; 10 Gbps to metadata servers (1 active / 1 backup) |
Availability scope | All IU campuses and other US sites |
System access
The DC-WAN2 file system is mounted on the IU research supercomputers as /N/dcwan/ and behaves like any other disk device on those machines. If you have DC-WAN2 project space, you can access it from the research supercomputers at /N/dcwan/projects/project_name (replace project_name with the name of your project).
List files
Use ls -l only when necessary, or only on directories containing small amounts of data. Running ls -l in a DC-WAN2 directory to list its contents and associated metadata (for example, ownership, permissions, and file size information for each file) can cause performance issues for you and other users, particularly if the directory contains a large amount of data.
Due to its parallel architecture, Lustre performs file and metadata operations separately. When you run ls -l in a DC-WAN2 directory, the system contacts Lustre's Metadata Server (MDS) to get your data's location, ownership, and permissions information. However, to retrieve file size information, the system must contact multiple Object Storage Servers (OSSs), which in turn must contact multiple Object Storage Targets (OSTs) that store the data objects that make up your files. When the load on one or more OSS nodes is high, your ls -l command may hang, and other users on the file system may experience latency issues as well.
Furthermore, some IU systems have ls (without any options) aliased to ls --color=tty, which enables the use of colors to distinguish file types. With the alias, running ls initiates a full lookup to determine the color associated with each file, which (as with ls -l) requires communication with the OSSs and the OSTs. Without the alias, running ls contacts the MDS only (it does not initiate a full lookup involving the OSSs and OSTs). To avoid potential performance issues, you can override the ls --color=tty alias, preventing ls from initiating a full lookup. To do so, add the following line to your shell profile:
unalias ls
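A bare unalias ls prints an error if no alias is defined, which can happen on systems that do not set it. As a hedged variant (not an IU-documented requirement), the following sketch removes the alias only when it exists, and shows how to bypass an alias for a single invocation instead of removing it.

```shell
# Remove the ls alias only when one is defined, so this is safe to keep
# in a shell profile even on systems that do not set the alias.
if alias ls >/dev/null 2>&1; then
    unalias ls
fi

# To bypass an alias for one invocation without removing it,
# prefix the command with "command" (or with a backslash: \ls).
command ls
```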
Using ls to list information about individual files creates much less overhead on the file system:
- To check for the existence of a file (for example, my_file), use: ls my_file
- To see all details for a specific file (for example, my_file), use: ls -l my_file
Sort files by age
DC-WAN2 is intended for temporary storage of research computing data. Files in project directories may be purged if they have not been accessed for more than 180 days.
To determine which files located in or below the present working directory are the oldest (and at risk of being purged), you can list them by age (oldest to newest) using the find command; for example:
find . -type f -exec ls -1hltr '{}' +;
In the command above:
- The dot (.) directs find to search the present working directory and its subdirectories.
- The -type f test limits the find search to regular files.
- The -exec ls -1hltr '{}' + action makes find run the ls command on its search results.
- The {} string marks where find places the file names, and the + terminator directs find to build a file list from the search results, appending as many file names as fit to each ls invocation. If the list is too long for one command line, find runs ls more than once, and each batch is sorted separately.
- The trailing semicolon (;) is read by the shell as a command separator and is not passed to find; it can be omitted.
- The ls command parses the file list and (given the options provided) displays the results one file per line (-1), in long format (-l), with human-readable file sizes (-h), sorted by modification time (-t), and listed in reverse order (-r), so the oldest files appear first.
To perform the same operation on a directory that's not the present working directory, use the same command and options, but replace the dot (.) with the full path to the directory in question; for example:
- For a directory in your project space (replace project_name with your project's name and some_other_dir with the directory you want to sort):
find /N/dcwan/projects/project_name/some_other_dir -type f -exec ls -1hltr "{}" +;
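The purge policy above is stated in terms of access time, whereas ls -t sorts by modification time. As a complementary sketch, the following lists only the regular files whose last access is more than 180 days old, sorted from oldest to newest access. It assumes GNU find (for the -printf action); list_stale_files is a hypothetical helper name, not an IU-provided command.

```shell
# list_stale_files is a hypothetical helper for illustration.
# Usage: list_stale_files [directory]   (defaults to the current directory)
list_stale_files() {
    target_dir="${1:-.}"
    # -atime +180 matches files last accessed more than 180 days ago.
    # %A@ prints the access time in seconds since the epoch (used as the
    # sort key); %AY-%Am-%Ad prints it as a date; %p prints the path.
    find "$target_dir" -type f -atime +180 -printf '%A@ %AY-%Am-%Ad %p\n' \
        | sort -n \
        | cut -d' ' -f2-
}

# Example: check the present working directory and its subdirectories.
list_stale_files .
```

Because the sort happens once over the entire result, the ordering is not affected by how many files are found.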
Transfer files
The DC-WAN2 file system is a parallel high performance file system. Files are not "transferred" to the DC-WAN2 file system; instead, the DC-WAN2 file system is mounted on computational resources, making it accessible from those resources as directory paths (for example, /N/dcwan/projects/project_name). To copy or move files on the DC-WAN2 file system, use the same standard Linux commands (such as cp and mv) that you use for copying or moving files stored in your computational system's local directories.
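As an illustration of the point above, the following sketch wraps cp in a small function. The name copy_to_dcwan is hypothetical (it is not an IU-provided command), and the destination shown in the comment is the placeholder project path used elsewhere on this page.

```shell
# copy_to_dcwan is a hypothetical helper for illustration only.
# It copies a file or directory into a destination directory with cp,
# just as you would between two local directories.
copy_to_dcwan() {
    src="$1"     # file or directory to copy
    dest="$2"    # for example, /N/dcwan/projects/project_name
    cp -r -- "$src" "$dest"/
}

# Example call (commented out because the path exists only on IU systems):
# copy_to_dcwan results /N/dcwan/projects/project_name
```

To move rather than copy, mv takes the same source and destination arguments.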
Acknowledge grant support
The Indiana University cyberinfrastructure, managed by the Research Technologies division of UITS, is supported by funding from several grants, each of which requires you to acknowledge its support in all presentations and published works stemming from research it has helped to fund. Conscientious acknowledgment of support from past grants also improves the chances of IU's research community securing grant funding in the future. For the acknowledgment statement(s) required for scholarly printed works, web pages, talks, online publications, and other presentations that make use of this and/or other grant-funded systems at IU, see Sources of funding to acknowledge in published work if you use IU's research cyberinfrastructure.
Support
For more about the Lustre file system, see the Lustre wiki.
For technical support or general information about the Slate, Slate-Project, Slate-Scratch, or Data Capacitor Wide Area Network 2 (DC-WAN2) file system, contact the UITS High Performance File Systems (HPFS) group.
To receive maintenance and downtime information, subscribe to the hpfs-maintenance-l@indiana.edu mailing list; see Subscribe to an IU List mailing list.
This is document bgqk in the Knowledge Base.
Last modified on 2023-02-27 16:40:42.