Use HTAR with your SDA account

Important:
Files containing PHI must be encrypted when they are stored (i.e., at rest) and when they are transferred between networked systems (i.e., in transit). Do not use HSI, HTAR, or Samba to transfer data containing PHI unless those data are encrypted already; HSI, HTAR, and Samba do not encrypt data during transit. To ensure that files containing PHI remain encrypted during transit, use SFTP/SCP or the IU Globus Web App. To ensure that files containing PHI are encrypted when they are stored on the SDA, encrypt them before transferring them. For more, see Recommended tools for encrypting data containing HIPAA-regulated PHI.

About HTAR

The HPSS TAR (HTAR) command-line utility lets you create and work with .tar archives in HPSS. With HTAR, you can aggregate files stored on your local filesystem into .tar archives that are written directly to HPSS, without creating intermediate archives on your local system and without using HSI (or some other HPSS data transfer tool) to place the archives in HPSS.

As HTAR creates each archive, it automatically builds a corresponding external index (.idx) file and stores it in the same HPSS directory as the archive. HTAR also can build (or rebuild) an index file for an HPSS .tar archive that does not have one, either because the archive was created using some other utility, or because the index was accidentally deleted.

Additionally, you can use HTAR to extract the entire contents of an HPSS archive to your local filesystem, or retrieve only certain specified files and/or directories.

HTAR command syntax

The general syntax for HTAR commands is as follows:

htar [action_options] -f [archive_name] [control_options] [file_list]

An action option and the -f option (which specifies the archive's filename) are always required. For [file_list], indicate which files should be archived, extracted, or processed; use a space-delimited list of file or directory names (wildcard characters are accepted). By default, HTAR copies files from your current local directory into an archive file it creates in your HPSS home directory. To target an alternate source or destination directory, specify the path relative to your local or SDA home directory.

For complete information about HTAR action and control options, see the HTAR User Guide and HTAR manual page.

HTAR at IU

At Indiana University, HTAR is available on the UITS research computing systems, allowing you to create and work with .tar archives on the Scholarly Data Archive (SDA). To use HTAR on any of IU's research computing systems, you first must add the HPSS package to your software environment. To do this, on the command line, enter:

module load hpss

To make permanent changes to your environment, edit your ~/.modules file. For more, see Use a .modules file in your home directory to save your user environment on an IU research supercomputer.

Once the hpss module is loaded, you can execute HTAR commands from the system's command line.

Alternatively, for use on your personal workstation, you may contact the UITS Research Storage team to request HTAR bundled with HSI. Bundles are available for Red Hat Enterprise Linux 5 and 6, Ubuntu Linux, macOS, and Windows (running Cygwin).

HTAR limitations

While the archive created by HTAR can be of unlimited size (within the SDA's capacity), be aware of the following limitations:

  • An individual file within the archive may not be larger than 68 GB.
  • The directory path of any file stored may not be longer than 154 characters.
  • The file name itself may not be longer than 99 characters.
  • There is a maximum of 1 million files in a single HTAR archive.

For more, see HTAR Usage - Limitations.
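Before archiving a large directory, it can help to check it against the limits listed above. The following is a minimal sketch, not part of HTAR: check_htar_limits is a hypothetical helper name, the 68 GB limit is read as 68 × 10^9 bytes (an assumption), and HTAR_MAX_FILE_BYTES and HTAR_MAX_FILES are illustrative override variables that HTAR itself does not read. It assumes file names do not contain newline characters.

```shell
# Hypothetical pre-flight check; the limits mirror the list above.
check_htar_limits() {
    dir=$1
    max_bytes=${HTAR_MAX_FILE_BYTES:-68000000000}  # 68 GB as 68 * 10^9 bytes (assumption)
    max_files=${HTAR_MAX_FILES:-1000000}           # 1 million files per archive

    if [ ! -d "$dir" ]; then
        echo "error: not a directory: $dir" >&2
        return 2
    fi

    report=$(
        cd "$dir" || exit 1
        # Files over the per-file size limit (the "c" suffix means bytes).
        find . -type f -size +"${max_bytes}c" | sed 's|^\./|too large: |'
        # Paths whose directory part or file name exceeds the length limits.
        find . -type f | sed 's|^\./||' | awk '
            {
                n = split($0, parts, "/")
                name = parts[n]
                dirpart = (n > 1) ? substr($0, 1, length($0) - length(name) - 1) : ""
                if (length(dirpart) > 154) print "path too long: " $0
                if (length(name) > 99) print "name too long: " $0
            }'
        # Total number of files that would go into one archive.
        count=$(find . -type f | wc -l | tr -d ' ')
        if [ "$count" -gt "$max_files" ]; then
            echo "too many files: $count"
        fi
    )

    # Print any violations and signal failure; stay silent on success.
    if [ -n "$report" ]; then
        printf '%s\n' "$report"
        return 1
    fi
    return 0
}
```

For example, running check_htar_limits local_dir from your home directory prints one line per violation and returns a non-zero status if any are found. Path lengths are measured relative to the directory you pass in, so the result applies to an archive created with the same relative paths.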

Use HTAR to create an archive on the SDA

The following examples demonstrate how to use HTAR on IU research computing systems to create .tar archives on the SDA.

  • To copy all files in the current local directory into an archive (e.g., my_archive.tar) that's created in your SDA home directory, on the command line, enter:
    htar -c -f my_archive.tar *

    In this command, the -c action option opens a connection to the SDA and copies all files in your current local directory (denoted by the * wildcard character) into an archive that's created in your SDA home directory. The -f option assigns the archive a name (my_archive.tar).

    Note:
    HTAR will overwrite a pre-existing archive of the same name without prompting you.
  • To copy every file stored in a local subdirectory (e.g., ~/local_dir) into an archive (e.g., my_archive.tar) that's created in a pre-existing SDA subdirectory (e.g., sda_archives), specify each path relative to the respective home directory on the HTAR command line; for example:
    htar -c -f sda_archives/my_archive.tar "local_dir"

    In this command, the -c action option opens a connection to HPSS and copies all files from the specified local directory (~/local_dir) into an archive that's created in the specified SDA subdirectory; the -f option specifies the path to the archive and its name.

    Note:

    If your HTAR command's file list includes the tilde (~) representing your local home directory (i.e., you enter "~/local_dir" instead of "local_dir"), each entry in the resulting archive's index file will be prepended with the absolute path from your local system's root directory.

    This becomes an issue whenever you extract the files, because HTAR creates a new set of nested subdirectories based on the absolute path prepended to each index entry, and then stores the extracted files in the bottom-level directory.

    For example, if user darvader archives files from the death_star subdirectory on Big Red II, but enters "~/death_star" as the local path in the HTAR command's file list, all index entries for the resulting archive will be prepended with N/u/darvader/BigRed2. Afterward, whenever HTAR extracts files from that archive, it will read the index entries, and consequently save files to ~/N/u/darvader/BigRed2/death_star on the local system instead of saving them to ~/death_star.

  • To create an HPSS archive (e.g., my_archive.tar) in an SDA directory that does not already exist, add the -P control option to automatically create any non-existing subdirectories included in the archive file's pathname:
    htar -c -f new_directory/new_subdirectory/my_archive.tar -P "local_dir"
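The tilde problem described above arises because the shell expands ~ to an absolute path before HTAR ever sees the file list. One way to guard against it is a small wrapper that converts a path under your home directory back to a relative one and runs HTAR from the home directory. This is a sketch under stated assumptions: archive_dir_relative is a hypothetical helper, and the HTAR variable is an illustrative override (not read by HTAR itself) that lets you preview the arguments with HTAR=echo. Only the -c and -f options shown above are used.

```shell
# Hypothetical wrapper; not part of HTAR.
# Set HTAR=echo to preview the arguments without contacting the SDA.
archive_dir_relative() {
    archive=$1      # archive path, relative to your SDA home directory
    local_dir=$2    # local directory, e.g. ~/local_dir or local_dir

    case $local_dir in
        "$HOME"/*)
            # Strip the home-directory prefix that the shell added for "~".
            rel=${local_dir#"$HOME"/}
            ;;
        /*)
            echo "error: $local_dir is not under $HOME" >&2
            return 1
            ;;
        *)
            rel=$local_dir
            ;;
    esac

    # Run from the home directory so the relative path resolves there and
    # the index entries carry no absolute prefix. The subshell keeps the
    # caller's working directory unchanged.
    ( cd "$HOME" && "${HTAR:-htar}" -c -f "$archive" "$rel" )
}
```

With this in place, archive_dir_relative sda_archives/my_archive.tar ~/local_dir issues the same command as the second example above, even though the local path was typed with a tilde.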

Use HTAR to create an index for an HPSS archive

For each archive created in the above examples, HTAR simultaneously creates a corresponding index file (e.g., my_archive.tar.idx) and stores it in the same HPSS directory as the archive.

You can use HTAR to recreate an index that has been accidentally deleted, or to create an index for an existing .tar archive that was created with another application.

If the index file for an archive (e.g., archive_name.tar) is missing, you will see the following error when you try to list or extract the files it contains:

"No such file: archive_name.idx"

To (re)build an index file for an HPSS .tar archive (e.g., old_archive.tar) that's missing its index, on the command line of your local system, enter:

htar -Xf old_archive.tar

In this command, the -X action option opens a connection to HPSS, reads the old_archive.tar file indicated by the -f option, builds an index file for the archive (e.g., old_archive.tar.idx), and stores it in the same directory as the archive.

Use HTAR to extract files from an SDA archive

The following examples demonstrate how to use HTAR to extract files from an archive stored on your SDA account.

Note:
HTAR extracts files into the current working directory on your local host. To extract files into a new directory, create the new directory first, and then change (cd) into the new directory before running HTAR.
  • To extract all files from an archive (e.g., my_archive.tar) stored in your SDA home directory, on your local system's command line, enter:
    htar -x -f my_archive.tar

    In this command, the -x action option opens a connection to HPSS and extracts the entire contents of the archive specified by the -f option (my_archive.tar).

  • To extract one or more specific files or directories from an archive without retrieving the entire archive, on your local system's command line, enter:
    htar -xvf test.tar file1 file4 file7

    In this command, the -x action option opens a connection to HPSS and, from the archive specified by the -f option (test.tar), extracts only the files listed (file1, file4, and file7).

    Note:

    Because HTAR leaves processing of wildcard characters to the shell, you cannot use * to select multiple filenames when retrieving files from an archive stored in HPSS; instead, list the archive's contents and then name each file you want to retrieve. To display the names of the files contained in an archive (e.g., archive_10.tar) stored in your HPSS home directory, on your local system's command line, enter:

    htar -tf archive_10.tar

    In this command, the -t action option lists the files contained in the archive specified by the -f option (archive_10.tar). Files are listed in the order in which they appear in the archive.
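Because HTAR extracts into the current working directory, the create-then-cd steps from the note at the start of this section can be wrapped in a short function. This is a minimal sketch: extract_into is a hypothetical helper name, and the HTAR variable is an illustrative override (not read by HTAR itself) for previewing the arguments with HTAR=echo. It uses only the -x and -f options shown above.

```shell
# Hypothetical wrapper; not part of HTAR.
extract_into() {
    archive=$1    # archive path, relative to your SDA home directory
    target=$2     # local directory to extract into (created if missing)
    shift 2       # any remaining arguments name specific files to extract

    mkdir -p "$target" || return 1

    # Change directory inside a subshell so the caller's working
    # directory is left unchanged after the extraction.
    ( cd "$target" && "${HTAR:-htar}" -x -f "$archive" "$@" )
}
```

For example, extract_into test.tar restored file1 file4 file7 creates the restored directory if needed and extracts only the three named files into it; omitting the file names extracts the entire archive.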

Alternative authentication methods

By default, HTAR will prompt for login information (known as the "combo" authentication method). You also can set the authentication method explicitly by defining the HPSS_AUTH_METHOD environment variable; for example:

  • In the csh or tcsh shell, enter:
    setenv HPSS_AUTH_METHOD combo
  • In the ksh or bash shell, enter:
    export HPSS_AUTH_METHOD=combo

Alternatively, if your binaries are built with the appropriate method, you can use the HPSS_AUTH_METHOD environment variable to enable authentication based on either existing Kerberos credentials (known as the "kerberos" method) or Kerberos keytabs (known as the "keytab" method):

  • Kerberos: To define the HPSS_AUTH_METHOD environment variable to enable the "kerberos" authentication method:
    • In the csh or tcsh shell, enter:
      setenv HPSS_AUTH_METHOD kerberos
    • In the ksh or bash shell, enter:
      export HPSS_AUTH_METHOD=kerberos
  • Keytab: To use the "keytab" method, you also must define the HPSS_KEYTAB_PATH environment variable (using the path to your keytab file) and the HPSS_PRINCIPAL environment variable (using the appropriate login name). For example, to define the required environment variables to enable the "keytab" authentication method:
    • In the csh or tcsh shell, enter the following (replace username with the appropriate login name and /path/to/my_keytab with the path to your keytab file):
      setenv HPSS_PRINCIPAL username
      setenv HPSS_AUTH_METHOD keytab
      setenv HPSS_KEYTAB_PATH /path/to/my_keytab
      
    • In the ksh or bash shell, enter the following (replace username with the appropriate login name and /path/to/my_keytab with the path to your keytab file):
      export HPSS_PRINCIPAL=username
      export HPSS_AUTH_METHOD=keytab
      export HPSS_KEYTAB_PATH=/path/to/my_keytab
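If you switch to the "keytab" method often, the three settings above can be grouped in a function for ksh or bash that refuses to proceed when the keytab file cannot be read. This is a sketch only: use_htar_keytab is a hypothetical helper name and is not part of HSI or HTAR; the environment variable names are the ones given above.

```shell
# Hypothetical helper; not part of HSI or HTAR.
use_htar_keytab() {
    principal=$1   # the appropriate login name
    keytab=$2      # path to your keytab file

    # Stop early if the keytab file is missing or unreadable.
    if [ ! -r "$keytab" ]; then
        echo "error: cannot read keytab file: $keytab" >&2
        return 1
    fi

    export HPSS_PRINCIPAL="$principal"
    export HPSS_AUTH_METHOD=keytab
    export HPSS_KEYTAB_PATH="$keytab"
}
```

After defining the function in your shell, use_htar_keytab username /path/to/my_keytab sets all three variables for the current session, so subsequent HTAR commands in that session use them.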
      

For more about HSI/HTAR environment variables, see the HSI Environment Variables page in the Gleicher Enterprises HSI Reference Manual.

This is document awgg in the Knowledge Base.
Last modified on 2018-11-28 12:11:51.
