About the Slate-Scratch high performance file system for research computation at IU
Overview
Slate-Scratch is a large-capacity, high-throughput, high-bandwidth Lustre-based file system designed for the temporary storage of computational data to meet the needs of data-intensive workflows and analytics running on Indiana University's research supercomputers.
Slate-Scratch directories are created automatically for all users with accounts on IU's research supercomputers. If you have an account on an IU research supercomputer, your Slate-Scratch directory is mounted at /N/scratch/username (replace username with your IU username).
Each IU research supercomputer user is allowed to store up to 100 TiB of data on Slate-Scratch. An inode quota limits the number of files and directories a single user can create to 10 million.
Space on Slate-Scratch is not intended for permanent storage. Files in scratch space will be purged if they have not been accessed for more than 30 days. Users are responsible for archiving their data. To archive scratch space data, move files to the Scholarly Data Archive (SDA); see Access the SDA at IU.
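As an illustration of the 30-day access rule above, the following minimal sketch lists regular files that have not been accessed for more than a chosen number of days. The function name not_accessed_within and the example path are hypothetical, and the sketch relies only on the standard find command; walking a very large directory tree generates file system load of its own, so it is best pointed at a specific project directory.

```shell
# List regular files under a directory that have not been accessed for
# more than a given number of days.
# Usage: not_accessed_within DAYS DIR
not_accessed_within() {
    # -atime +N matches files whose last access was more than N whole
    # days ago (find discards fractional days before comparing).
    find "$2" -type f -atime "+$1"
}

# Example (hypothetical path; replace username with your IU username):
#   not_accessed_within 21 /N/scratch/username/my_project
```

Choosing a threshold somewhat below 30 days leaves time to archive the listed files before they become eligible for purging.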
Data stored on Slate-Scratch are not backed up in any way, shape, or form by Research Technologies or any other entity in UITS.
For information about the appropriate use of Slate-Scratch, including details on the data purge policy, see Slate-Scratch high performance file system: Terms of service.
Before storing data on any of Indiana University's research computing or storage systems, make sure you understand the information in Types of sensitive institutional data appropriate for UITS Research Technologies services.
Make sure you do not include sensitive institutional data as part of a file's filename or pathname.
Work with files on Slate-Scratch
Check usage
To check your inode usage (the number of items stored in your Slate-Scratch space), enter (replace username with your IU username):
lfs quota -h -u username /N/scratch
To check the size of a file stored on Slate-Scratch, enter (replace username with your IU username and filename with the name of your file):
ls -sh /N/scratch/username/filename
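If your inode usage is close to the quota, it can help to see which subdirectories hold the most items. The following is a minimal sketch using only find, wc, and sort; the function name count_items is hypothetical, and file names containing newline characters would be overcounted. Because it walks every subdirectory, run it sparingly on large trees.

```shell
# Print the number of items (files and directories) under each immediate
# subdirectory of DIR, largest count first.
# Usage: count_items DIR
count_items() {
    for sub in "$1"/*/; do
        # Skip the unexpanded pattern when DIR has no subdirectories.
        [ -d "$sub" ] || continue
        # Each path printed by find is one item; the count includes the
        # subdirectory itself.
        n=$(find "$sub" | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "$sub"
    done | sort -rn
}

# Example (replace username with your IU username):
#   count_items /N/scratch/username
```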
List files
You should use ls -l only when necessary, or only on directories containing small amounts of data. Running ls -l in a directory on Slate-Scratch to list its contents and associated metadata (for example, ownership, permissions, and file size information for each file) can cause performance issues for you and other users, particularly if the directory contains a large amount of data.
Due to its parallel architecture, Lustre performs file and metadata operations separately. When you run ls -l in a Slate-Scratch directory, the system contacts Lustre's Metadata Server (MDS) to get your data's location, ownership, and permissions information. However, to retrieve file size information, the system must contact multiple Object Storage Servers (OSSs), which in turn must contact multiple Object Storage Targets (OSTs) that store the data objects that make up your files. When the load on one or more OSS nodes is high, your ls -l command may hang; other users on the file system may experience latency issues, as well.
Furthermore, some IU systems have ls (without any options) aliased to ls --color=tty, which enables the use of colors to distinguish file types. With the alias, running ls initiates a full lookup to determine the color associated with each file, which (as with ls -l) requires communication with the OSSs and the OSTs. Without the alias, running ls contacts the MDS only (it does not initiate a full lookup involving the OSSs and OSTs). To avoid potential performance issues, you can override the ls --color=tty alias, preventing ls from initiating a full lookup. To do so, add the following line to your shell profile:
unalias ls
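On a system where ls is not aliased, a bare unalias ls prints an error each time the profile is read. A guarded variant, sketched below, removes the alias only when one is defined; the function name drop_ls_alias is hypothetical, while alias and unalias are standard shell builtins.

```shell
# Remove the ls alias only if one is currently defined, so the profile
# stays quiet on systems where ls is not aliased.
drop_ls_alias() {
    if alias ls >/dev/null 2>&1; then
        unalias ls
    fi
}

drop_ls_alias

# To bypass an alias for a single invocation without removing it, prefix
# the command with the "command" builtin, for example:
#   command ls
```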
Using ls to list information about individual files creates a lot less overhead on the file system:
- To check for the existence of a file (for example, my_file), use: ls my_file
- To see all details for a specific file (for example, my_file), use: ls -l my_file
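In a job script, the same per-file approach can be applied to a known list of expected files instead of listing a whole directory. The following is a minimal sketch; the function name check_files and the example file names are hypothetical.

```shell
# Report details for each named file that exists and flag any that do not.
# Returns non-zero if at least one file is missing.
# Usage: check_files FILE...
check_files() {
    status=0
    for f in "$@"; do
        if [ -e "$f" ]; then
            # -d keeps ls from descending into a directory argument.
            ls -ld -- "$f"
        else
            printf 'not found: %s\n' "$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (hypothetical file names):
#   check_files results.h5 run.log
```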
Sort files by age
To determine which files located in or below the present working directory are the oldest (and at risk of being purged), you can list them by age (oldest to newest) using the find command; for example:
find . -type f -exec ls -1hltr '{}' +
In the command above:
- The dot (.) directs find to search the present working directory and its subdirectories.
- The -type f test limits the find search to regular files.
- The -exec ls -1hltr '{}' + action makes find run the ls command, with the given options, on its search results.
- The plus sign (+) ends the -exec action and directs find to build a file list from its search results, substituting as many file names as possible for the {} string in each invocation of ls. If the list is too long for a single command line, find runs ls more than once, and each batch is sorted separately.
- The ls command parses the file list and (given the options provided) displays the results one file per line (-1), in long format (-l), with human-readable file sizes (-h), sorted by modification time (-t), and listed in reverse order (-r), so the oldest files appear first. To sort by last access time, which is what the purge policy considers, add the -u option.
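Because files are purged based on last access rather than modification time, an access-time listing can complement the command above. The following is a minimal sketch, assuming GNU find (its -printf action is not part of POSIX); the function name oldest_by_atime and the default count of 20 are hypothetical choices. Sorting happens in a single sort process, so the order holds across the whole tree, although file names containing newline characters are not handled.

```shell
# Print the COUNT least recently accessed regular files under DIR,
# oldest access first.
# Usage: oldest_by_atime DIR [COUNT]
oldest_by_atime() {
    # %A@ is the last access time in seconds since the epoch; %p is the path.
    find "$1" -type f -printf '%A@\t%p\n' |
        sort -n |
        head -n "${2:-20}" |
        cut -f 2-
}

# Example (replace username with your IU username):
#   oldest_by_atime /N/scratch/username/some_other_dir 50
```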
To perform the same operation on a directory that's not the present working directory, use the same command and options, but replace the dot (.) with the full path to the directory in question; for example (replace username with your IU username and some_other_dir with the directory you want to sort):
find /N/scratch/username/some_other_dir -type f -exec ls -1hltr '{}' +
Transfer files
The Slate-Scratch file system is a parallel high performance file system. Files are not "transferred" to the file system; instead, the Slate-Scratch file system is mounted on IU's research supercomputers, making it accessible from those resources as a directory path (for example, /N/scratch/username). To read or write a file on the Slate-Scratch file system, use the same standard Linux commands used for reading and writing files stored in your IU research supercomputer home directory.
If your work involves a large number of small files, use a utility, such as tar or gzip, to bundle them into a small number of large files. Failure to do so can negatively impact performance of the file system and strain its file-count (inode) capacity.
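Bundling a directory of small files into one compressed archive can be sketched as follows. The function name bundle_dir and the example paths are hypothetical; the tar options used (-c, -z, -f, -C) are standard in GNU tar.

```shell
# Bundle a directory of small files into a single gzip-compressed tar archive.
# Usage: bundle_dir DIR ARCHIVE
bundle_dir() {
    parent=$(dirname -- "$1")
    name=$(basename -- "$1")
    # -C changes into the parent directory first, so the archive stores
    # paths relative to it (name/...) rather than absolute paths.
    tar -czf "$2" -C "$parent" "$name"
}

# Example (replace username with your IU username):
#   bundle_dir /N/scratch/username/small_files /N/scratch/username/small_files.tar.gz
```

Listing the archive contents with tar -tzf before deleting the original files confirms that everything was captured.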
Work with PHI
This UITS system or service meets certain requirements established in the HIPAA Security Rule thereby enabling its use for work involving data that contain protected health information (PHI). However, using this system or service does not fulfill your legal responsibilities for protecting the privacy and security of data that contain PHI. You may use this system or service for work involving data that contain PHI only if you institute additional administrative, physical, and technical safeguards that complement those UITS already has in place.
If you have questions about securing HIPAA-regulated research data at IU, email securemyresearch@iu.edu. SecureMyResearch provides self-service resources and one-on-one consulting to help IU researchers, faculty, and staff meet cybersecurity and compliance requirements for processing, storing, and sharing regulated and unregulated research data; for more, see About SecureMyResearch. To learn more about properly ensuring the safe handling of PHI on UITS systems, see the UITS IT Training video Securing HIPAA Workflows on UITS Systems. To learn about division of responsibilities for securing PHI, see Shared responsibility model for securing PHI on UITS systems.
For more, see:
- Your legal responsibilities for protecting data containing protected health information (PHI) when using UITS Research Technologies systems and services
- About protected health information (PHI) data elements in the classifications of institutional data
- Secure research data containing HIPAA-regulated PHI on high performance file systems at IU
Get help
Slate-Scratch is managed by the UITS Research Technologies High Performance File Systems (HPFS) team. If you need help or have questions, contact HPFS using the Research Technologies contact form; from the "Choose an area to direct your question to" drop-down, select .
This is document bgtr in the Knowledge Base.
Last modified on 2024-01-25 09:50:47.