ARCHIVED: On XSEDE, what is the SDSC Data Oasis?

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

Overview

Hosted by San Diego Supercomputer Center (SDSC), the Data Oasis is a high-performance digital storage service available to Extreme Science and Engineering Discovery Environment (XSEDE) researchers with allocations on Gordon (SDSC) and Comet (SDSC).

The Data Oasis system comprises two separate Lustre-based parallel file systems. One provides 1.6 PB of scratch space shared by all Gordon and Comet users; the other provides 2 PB of persistent (non-purged) project storage space for researchers with Data Oasis Project Storage allocations.

The Data Oasis is not an archival system. Data in scratch or project directories are not replicated or backed up. Users should back up critical data and store them elsewhere, and remove unneeded files and directories.
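
As a minimal housekeeping sketch along these lines (the results/ directory, archive name, and remote host backup.example.org are hypothetical placeholders for wherever you keep backups):

  # Bundle a finished data set, copy it off-site, then remove the local copies
  tar czf results-2018-01.tar.gz results/
  scp results-2018-01.tar.gz username@backup.example.org:/archive/
  rm -rf results/ results-2018-01.tar.gz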

With an overall capacity of 4 PB, throughput exceeding 100 GB/s, and low average latencies, the Data Oasis is well-suited to support research involving extremely large data sets. For more, see the SDSC Data Oasis User Guide in the XSEDE User Portal. If you have questions or need help, contact the XSEDE Help Desk.

Project storage

Data Oasis Project Storage is an allocatable resource. By default, projects are awarded 500 GB of storage capacity, which is shared among all users on the project.

To request an allocation on an XSEDE dedicated storage service, follow the process for submitting a Research allocation request through the XSEDE User Portal. On the Resource Request page, select the desired system and indicate the storage capacity (in gigabytes) your project requires. For more, see ARCHIVED: Apply for a new XSEDE allocation.

Project storage is accessible from the login and compute nodes on Gordon and Comet via the following path (replace <allocation> with the account name found by running show_accounts; replace <username> with your XSEDE username):

  /oasis/projects/nsf/<allocation>/<username>
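
For example, the following commands are a minimal sketch of locating and entering a project directory (the allocation name abc123 is a hypothetical placeholder, and this assumes your login name on Gordon or Comet matches your XSEDE username):

  # List the account names associated with your allocations
  show_accounts

  # Change into the project directory for allocation "abc123" (hypothetical)
  cd /oasis/projects/nsf/abc123/$USER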

Project storage is also mounted on the SDSC data mover nodes at:

  • oasis-dm.sdsc.edu (for Gordon users)
  • comet-dm.sdsc.edu (for Comet users)

SDSC recommends using globus-url-copy or Globus Online for large-scale data transfers, and SCP for small file transfers directly to or from Gordon or Comet.
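
As a sketch of both approaches (the username jdoe, allocation abc123, and file names are hypothetical placeholders, and the globus-url-copy invocation assumes a working GridFTP client and certificate setup):

  # Small transfer with scp through the Comet data mover node
  scp results.tar.gz jdoe@comet-dm.sdsc.edu:/oasis/projects/nsf/abc123/jdoe/

  # Large transfer with globus-url-copy (GridFTP), with verbose progress output
  # and four parallel streams
  globus-url-copy -vb -p 4 \
      file:///home/jdoe/results.tar.gz \
      gsiftp://oasis-dm.sdsc.edu/oasis/projects/nsf/abc123/jdoe/results.tar.gz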

Project storage data are retained for three months beyond the end of your project, after which you must migrate your data to another resource.

If your project exhausts its 500 GB allotment, the principal investigator (PI) may request more space by emailing the XSEDE Help Desk. The request must include a justification (not exceeding 500 words) that explains:

  • How much extra storage space is needed
  • How the extra space will be used
  • How the extra space will augment other non-SDSC storage resources available to the project

Requests are forwarded to SDSC staff for review; decisions and replies are made within five business days.

Scratch directories

XSEDE researchers with allocations on Gordon and Comet automatically receive access to shared, Lustre-based scratch space on the Data Oasis.

The scratch file system is configured so that users must write to one of two subdirectories:

  • /oasis/scratch/$USER/$PBS_JOBID
    This subdirectory is created at the start of each job and is intended for use with applications that need more shared scratch than the compute service can provide. You can access this scratch space only after your job starts, so your batch script must contain lines to copy executables and other input data from your home directory. Data in Lustre scratch are not purged immediately after your job completes, so you have time to copy data you want to retain back to your project directory, home directory, or another location.

  • /oasis/scratch/$USER/temp_project
    This subdirectory is intended for medium-term storage of data that are not in use by running jobs. Data stored here may be purged (with at least five days' notice) if the file system approaches full capacity.

Both subdirectories are served by the same set of object storage servers, and user jobs can read or write to either location with the same performance. To avoid the overhead of unnecessary data movement:

  • Read directly from temp_project instead of copying to $PBS_JOBID.
  • Write files that should be retained after your job completes directly to temp_project (see the example batch script below).
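
The following batch script fragment is a minimal sketch that ties these recommendations together, assuming a PBS/TORQUE-style scheduler (consistent with the $PBS_JOBID variable above); the executable my_app, the input file input.dat, and the resource requests are hypothetical:

  #!/bin/bash
  #PBS -N oasis_example
  #PBS -l nodes=1:ppn=16,walltime=01:00:00

  # Job-specific Lustre scratch directory, created when the job starts
  SCRATCH=/oasis/scratch/$USER/$PBS_JOBID

  # Read large input directly from temp_project rather than copying it first
  INPUT=/oasis/scratch/$USER/temp_project/input.dat

  # Copy the executable from the home directory into job scratch
  cp $HOME/my_app $SCRATCH/
  cd $SCRATCH

  # Run the job; temporary files land in job scratch, while output that should
  # be retained is written directly to temp_project
  ./my_app $INPUT > /oasis/scratch/$USER/temp_project/results.$PBS_JOBID

Writing the final output directly to temp_project avoids a second copy step after the job ends.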

This document was developed with support from National Science Foundation (NSF) grants 1053575 and 1548562. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document baxs in the Knowledge Base.
Last modified on 2018-01-18 17:51:29.