About the Slate-Scratch high performance file system for research computation at IU

Overview

Slate-Scratch is a large-capacity, high-throughput, high-bandwidth Lustre-based file system designed for the temporary storage of computational data to meet the needs of data-intensive workflows and analytics running on Indiana University's research supercomputers.

Slate-Scratch directories are created automatically for all users with accounts on IU's research supercomputers. If you have an account on an IU research supercomputer, your Slate-Scratch directory is mounted at /N/scratch/username (replace username with your IU username).

Each IU research supercomputer user is allowed to store up to 100 TB of data on Slate-Scratch. An inode quota limits the number of files and directories a single user can create to 10 million.

Space on Slate-Scratch is not intended for permanent storage. Files in scratch space will be purged if they have not been accessed for more than 30 days. Users are responsible for archiving their data. To archive scratch space data, move files to the Scholarly Data Archive (SDA); see Access the SDA at IU.

Data stored on Slate-Scratch are not backed up in any way, shape, or form by Research Technologies or any other entity in UITS.

For information about the appropriate use of Slate-Scratch, including details on the data purge policy, see Slate-Scratch high performance file system: Terms of service.

Important:

Before storing data on any of Indiana University's research computing or storage systems, make sure you understand the information in Types of sensitive institutional data appropriate for UITS Research Technologies services.

Make sure you do not include sensitive institutional data as part of a file's filename or pathname.

Work with files on Slate-Scratch

Check usage

To check your disk space and inode usage (inode usage is the number of items stored in your Slate-Scratch space), enter (replace username with your IU username):

lfs quota -h -u username /N/scratch

To check the size of a file stored on Slate-Scratch, enter (replace username with your IU username and filename with the name of your file):

ls -sh /N/scratch/username/filename

List files

You should use ls -l only when necessary, or only on directories containing small amounts of data. Running ls -l in a directory on Slate-Scratch to list its contents and associated metadata (for example, ownership, permissions, and file size information for each file) can cause performance issues for you and other users, particularly if the directory contains a large amount of data.

Due to its parallel architecture, Lustre performs file and metadata operations separately. When you run ls -l in a Slate-Scratch directory, the system contacts Lustre's Metadata Server (MDS) to get your data's location, ownership, and permissions information. However, to retrieve file size information, the system must contact multiple Object Storage Servers (OSSs), which in turn must contact multiple Object Storage Targets (OSTs) that store the data objects that make up your files. When the load on one or more OSS nodes is high, your ls -l command may hang; other users on the file system may experience latency issues, as well.

Furthermore, some IU systems have ls (without any options) aliased to ls --color=tty, which enables the use of colors to distinguish file types. With the alias, running ls initiates a full lookup to determine the color associated with each file, which (as with ls -l) requires communication with the OSSs and the OSTs. Without the alias, running ls contacts the MDS only (it does not initiate a full lookup involving the OSSs and OSTs). To avoid potential performance issues, you can override the ls --color=tty alias, preventing ls from initiating a full lookup. To do so, add the following line to your shell profile:

unalias ls
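If ls is not aliased on a given system, a bare unalias ls prints an error each time the profile is read. The following is a minimal sketch of a guarded version for a shell profile; the function name remove_ls_alias is a hypothetical helper chosen for illustration, not something provided on IU systems.

```shell
# Hypothetical helper for a shell profile: remove the ls alias only when one
# is defined, so unalias does not report an error on systems without it.
remove_ls_alias() {
    if alias ls >/dev/null 2>&1; then
        unalias ls
    fi
}

# Call it once when the profile is read.
remove_ls_alias
```

To bypass the alias for a single command instead of removing it, you can prefix the command with a backslash (\ls) or run it as command ls.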

Using ls to list information about individual files creates a lot less overhead on the file system:

  • To check for the existence of a file (for example, my_file), use:
    ls my_file
  • To see all details for a specific file (for example, my_file), use:
    ls -l my_file

Sort files by age

To determine which files located in or below the present working directory are the oldest (and at risk of being purged), you can list them by age (oldest to newest) using the find command; for example:

find . -type f -exec ls -1hltr '{}' +

In the command above:

  • The dot (.) directs find to search the present working directory and its subdirectories.
  • The -type f test limits the find search to regular files.
  • The -exec ls -1hltr '{}' + action makes find run the ls command on its search results; everything between -exec and the terminating + is treated as the command and its arguments. No trailing semicolon is needed when + is used.
  • The + terminator builds a file list from the find search results, substituting as many file names as will fit for the {} string, so ls runs once per batch of files rather than once per file. If the list is too long for a single command line, find runs ls more than once and each batch is sorted separately.
  • The ls command parses the file list and (given the options provided) displays the results one file per line (-1), in long format (-l), with human-readable file sizes (-h), sorted by modification time (-t), and listed in reverse order (-r), so the oldest files appear first. To sort by last access time instead (the purge policy is based on when files were last accessed), add the -u option.

To perform the same operation on a directory that's not the present working directory, use the same command and options, but replace the dot (.) with the full path to the directory in question; for example (replace username with your IU username and some_other_dir with the directory you want to sort):

find /N/scratch/username/some_other_dir -type f -exec ls -1hltr '{}' +
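Because files are purged based on when they were last accessed, it can also help to list only the files whose access time is older than some threshold. The following is a minimal sketch using the -atime test of find; the function name list_stale_files and its 30-day default are illustrative assumptions, not an IU-provided tool or an official statement of how the purge is evaluated.

```shell
# Hypothetical helper (not an IU-provided command): print regular files under
# a directory whose last access time is older than a given number of days.
# Usage: list_stale_files [directory] [days]
list_stale_files() {
    dir="${1:-.}"     # directory to search; defaults to the current directory
    days="${2:-30}"   # age threshold in days; 30 is an assumed default
    # find rounds file age down to whole days, so "-atime +N" matches files
    # last accessed at least N+1 days ago.
    find "$dir" -type f -atime "+$days"
}
```

For example, list_stale_files /N/scratch/username 25 (replace username with your IU username) would list files that have gone more than 25 days without being accessed, which leaves some margin before the 30-day mark.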

Transfer files

The Slate-Scratch file system is a parallel high performance file system. Files are not "transferred" to the file system; instead, the Slate-Scratch file system is mounted on IU's research supercomputers, making it accessible from those resources as a directory path (for example, /N/scratch/username). To read or write a file on the Slate-Scratch file system, use the same standard Linux commands used for reading and writing files stored in your IU research supercomputer home directory.

Note:
The Slate-Scratch file system is not designed for storing a large number of small files. If you need to store a large number of small files, use an archiving utility, such as tar (optionally combined with gzip compression), to bundle them into a small number of large files. Failure to do so can negatively impact performance of the file system and strain its file-count (inode) capacity.
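As an illustration of the bundling described in the note, the following sketch wraps tar in a small function that packs one directory into a single gzip-compressed archive. The function name bundle_dir and the directory name many_small_files used below are hypothetical; adapt them to your own data.

```shell
# Hypothetical helper: bundle a directory of small files into one
# gzip-compressed tar archive placed next to the directory.
# Usage: bundle_dir directory [archive]
bundle_dir() {
    src="${1%/}"                  # directory to bundle, trailing slash removed
    archive="${2:-$src.tar.gz}"   # output archive; defaults to <directory>.tar.gz
    # -c create, -z compress with gzip, -f write to the named archive.
    # -C switches to the parent directory so paths in the archive are relative.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}
```

For example, bundle_dir /N/scratch/username/many_small_files would create /N/scratch/username/many_small_files.tar.gz. You can list the archive's contents without extracting it using tar -tzf, and unpack it later with tar -xzf. The sketch does not delete the original files; remove them yourself only after confirming the archive is complete.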

Work with PHI

This UITS system or service meets certain requirements established in the HIPAA Security Rule, thereby enabling its use for work involving data that contain protected health information (PHI). However, using this system or service does not fulfill your legal responsibilities for protecting the privacy and security of data that contain PHI. You may use this system or service for work involving data that contain PHI only if you institute additional administrative, physical, and technical safeguards that complement those UITS already has in place.

If you have questions about securing HIPAA-regulated research data at IU, email securemyresearch@iu.edu. SecureMyResearch provides self-service resources and one-on-one consulting to help IU researchers, faculty, and staff meet cybersecurity and compliance requirements for processing, storing, and sharing regulated and unregulated research data; for more, see About SecureMyResearch. To learn more about properly ensuring the safe handling of PHI on UITS systems, see the UITS IT Training video Securing HIPAA Workflows on UITS Systems. To learn about division of responsibilities for securing PHI, see Shared responsibility model for securing PHI on UITS systems.

Get help

Slate-Scratch is managed by the UITS Research Technologies High Performance File Systems (HPFS) team. If you need help or have questions, contact HPFS using the Research Technologies contact form; from the "Choose an area to direct your question to" drop-down, select High performance storage.

This is document bgtr in the Knowledge Base.
Last modified on 2024-01-25 09:50:47.