Install Python packages on the research supercomputers at IU

Overview

Python packages are collections of modules (reusable code) that extend and enhance the functionality of the core Python language. Python developers contribute to the official Python Package Index (PyPI) repository, making their packages available to the Python community under open source license terms. The Python Packaging Authority (PyPA) manages the repository, and maintains a standard set of tools for building, distributing, and installing Python packages.

On Indiana University's research supercomputers, many third-party packages are already installed to supplement commonly used Python builds. For the most commonly used packages that are supported at IU, see the Supported packages section; for a complete list, load the Python module and then enter pip freeze to see which packages are installed. If you need a third-party Python package that is not already installed, you can use pip or setup.py to install the package in your home directory or in your Slate storage space.

Note:

Space on Slate is available to all IU research supercomputer users. To create a Slate account, follow the instructions in Get additional IU computing accounts.

If you know several researchers are interested in using a Python package that is not already installed, you can request to have it installed as a system-wide site package.

Install Python packages for personal use

Set up your user environment

The IU research supercomputers use module-based environment management systems that provide a convenient method for dynamically customizing your software environment.

To install Python packages, you must have Python added to your user environment. To check which modules are currently loaded, on the command line, enter:

module list

If Python is not among the list of currently loaded modules, use the module load command to add it; for example:

  • To add the default version, on the command line, enter:
    module load python
  • To add a non-default version:
    1. Check which versions are available; on the command line, enter:
      module avail python
    2. Load the preferred version; on the command line, enter (replace version_number with the preferred version number):
      module load python/version_number

If Python is listed among the currently loaded modules, but you prefer or need to use another version, you must remove the currently loaded module before loading the other version. To do this with one command, use module switch; for example, on the command line, enter (replace current_version with the version number of the currently loaded python module and new_version with the preferred version number):

module switch python/current_version python/new_version

You can save your customized user environment so that it loads every time you start a new session; for instructions, see Use modules to manage your software environment on IU research supercomputers.
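After loading or switching modules, it can help to confirm which interpreter the python command actually starts. The following is a minimal sketch using only the standard library; the variable names are illustrative and not part of any IU tooling. Save it to a file and run it with python, or paste it into an interactive session:

```python
import sys

# Full path of the interpreter that is running this code; after a
# module load or module switch, it should point into that module's tree.
interpreter_path = sys.executable

# Major.minor version string, that is, the "X.Y" used in directory
# names such as lib/pythonX.Y/site-packages.
version_xy = "{}.{}".format(sys.version_info.major, sys.version_info.minor)

print("Interpreter:", interpreter_path)
print("Version:", version_xy)
```

If the reported path or version is not the one you expected, check the output of module list again before installing any packages.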

Install a package using pip

The pip package management tool, one of the standard tools maintained by the Python Packaging Authority (PyPA), is the recommended tool for installing packages from the Python Package Index (PyPI) repository.

To install a package from the PyPI repository (for example, foo), use the pip install command with the --user flag; for example:

To install:                                     Use the command:
The latest version                              pip install foo --user
A particular version (for example, foo 1.0.3)   pip install foo==1.0.3 --user
A minimum version (for example, foo 2.0)        pip install 'foo>=2.0' --user

The --user option directs pip to install your package (for example, foo) in the user site-packages directory for the running Python (for example, ~/.local/lib/pythonX.Y/site-packages/foo, in which X.Y represents the version number of the running Python). Python automatically searches this directory for modules, so prepending this path to the PYTHONPATH environment variable is not necessary. If you omit the --user option, pip will try to install your package in the global site-packages directory (where you do not have the necessary permissions); as a result, the installation will fail.
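To see which directory the running Python treats as its user site-packages directory, you can ask the standard site module. This is a small sketch, not an official IU procedure; the variable names are illustrative:

```python
import site
import sys

# Directory that "pip install --user" targets for the running Python.
user_site = site.getusersitepackages()

# ENABLE_USER_SITE can be True, False, or None (for example, it is
# False inside a virtual environment), so normalize it to a bool.
user_site_enabled = bool(site.ENABLE_USER_SITE)

print("User site-packages:", user_site)
print("User site enabled:", user_site_enabled)
# The directory is added to sys.path only if it already exists.
print("On sys.path:", user_site in sys.path)
```

If the last line reports False before you have installed anything with --user, that is usually because the directory has not been created yet.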

Alternatively, you can use the --prefix option to install your package in your Slate storage space; for example, to install your package (package_name) in an existing subdirectory (python-pkgs) in your Slate space, enter:

pip install --prefix=/N/slate/$USER/python-pkgs package_name

If you install your package to a location other than the user site-packages directory, you will need to prepend the path to that directory to your PYTHONPATH environment variable; for example:

export PYTHONPATH=/N/slate/$USER/python-pkgs/lib/python3.x/site-packages:$PYTHONPATH
Note:
  • In the above example, 3.x represents the first two elements of the Python version number you are using. For example, if your Python version is 3.10.5, replace 3.x with 3.10.
  • This line can be added to your .bashrc file to avoid having to add it every time you log in.
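The directory name under a --prefix location depends on the Python version, as the note above describes. The following sketch builds that path for the running interpreter and shows what a prepended PYTHONPATH value looks like. The helper names and the "username" placeholder are hypothetical, and the lib/pythonX.Y/site-packages layout is an assumption that matches the example above; some installations use a different layout (for example, lib64):

```python
import os
import sys

def prefix_site_packages(prefix):
    """Return <prefix>/lib/pythonX.Y/site-packages for the running Python."""
    version_dir = "python{}.{}".format(*sys.version_info[:2])
    return os.path.join(prefix, "lib", version_dir, "site-packages")

def prepend_to_pythonpath(directory, current=""):
    """Return a PYTHONPATH value with directory placed first."""
    if current:
        return directory + os.pathsep + current
    return directory

# "username" is a placeholder; substitute your own Slate directory.
slate_prefix = "/N/slate/username/python-pkgs"
new_value = prepend_to_pythonpath(
    prefix_site_packages(slate_prefix),
    os.environ.get("PYTHONPATH", ""),
)
print("export PYTHONPATH=" + new_value)
```

The script only prints the export line; it does not change your environment. Placing the new directory first means packages installed there take precedence over same-named packages in directories listed later.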

For more about using pip, see the pip install page in the pip User Guide.

Install a package using its setup.py script

To install a Python package from a source other than the PyPI repository, you can download and unpack the source distribution yourself, and then use its setup.py script to install the package in the user site-packages directory:

  1. Set up your user environment (as described in the previous section).
  2. Use the wget command to download the distribution archive (for example, foo-1.0.3.tar.gz) from the source (for example, http://pythonfoo.org); for example:
    wget http://pythonfoo.org/foo-1.0.3.tar.gz
  3. Use tar to unpack the archive (for example, foo-1.0.3.tar.gz); for example:
    tar -xzf foo-1.0.3.tar.gz

    The distribution should unpack into a similarly named directory in the directory where you ran the command (for example, ~/foo-1.0.3 if you downloaded the archive to your home directory).

  4. Change (cd) to the new directory, and then, on the command line, enter:
    python setup.py install --user

The --user option directs setup.py to install the package (for example, foo) in the user site-packages directory for the running Python (for example, ~/.local/lib/pythonX.Y/site-packages/foo).

Python automatically searches this directory for modules, so prepending this path to the PYTHONPATH environment variable is not necessary.

If you omit the --user option, setup.py will try to install the package in the global site-packages directory (where you do not have the necessary permissions); as a result, the installation will fail.

Alternatively, you can use the --home or --prefix option to install your package in a different location (where you have the necessary permissions); for example, to install your package in a subdirectory (for example, python-pkgs):

  • Within your home directory, enter:
    python setup.py install --home=~/python-pkgs
  • In your Slate storage space, enter:
    python setup.py install --prefix=/N/slate/$USER/python-pkgs
Note:
If you install your package to a location other than the user site-packages directory, you will need to prepend the path to that directory to your PYTHONPATH environment variable. For more about PYTHONPATH, see Understand the module search order below.

For more on using setup.py to install packages, see Installing Python Modules (Alternate Installation).
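After installing a package with either method, you may want to confirm that Python can find it and see which file it would load. The sketch below uses importlib.util.find_spec from the standard library; the locate helper is a hypothetical name, and json is used only because it ships with Python (substitute the name of the package you installed):

```python
import importlib.util

def locate(module_name):
    """Return the file Python would load for module_name, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # origin is a file path for ordinary modules; it can be "built-in"
    # or None for built-in modules and namespace packages.
    return spec.origin

# json is part of the standard library, so it should always be found.
print("json:", locate("json"))
# A name that is not installed returns None instead of raising an error.
print("missing:", locate("a_module_that_is_not_installed_xyz"))
```

If the reported path is not in the directory where you installed the package, another copy earlier in the module search path is taking precedence.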

Understand the module search order

Knowing how the Python interpreter responds to import statements can help you determine why a particular module or package isn't loading, or why an unexpected version of a package is loading, even though the correct version is installed and the path to its location is listed in your PYTHONPATH environment variable.

When Python processes an import statement, it searches the paths found in sys.path, a list of directories that determines the interpreter's search path for modules. When Python launches, the sys.path variable is initialized from the following locations, in this order:

  1. The directory containing the script used to invoke the Python interpreter (if the interpreter is invoked interactively or the script is read from standard input, this first item, sys.path[0], remains an empty string, which directs Python to search for modules in the current working directory first)
  2. The directories listed in PYTHONPATH
  3. The version-specific site-packages directory for the running Python installation; for example <sys.prefix>/lib/pythonX.Y/site-packages, in which <sys.prefix> represents the path to the running Python installation and X.Y represents the version number of the running Python installation
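The ordering above can be observed directly by starting a second interpreter with PYTHONPATH set and printing its sys.path. This is a sketch under the assumption that the child interpreter is started with no options other than -c; the child_sys_path helper and the temporary directory are illustrative only:

```python
import os
import subprocess
import sys
import tempfile

def child_sys_path(extra_dir):
    """Return sys.path of a child interpreter started with PYTHONPATH=extra_dir."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout.splitlines()

# A throwaway directory stands in for a real package location.
with tempfile.TemporaryDirectory() as extra_dir:
    paths = child_sys_path(extra_dir)
    for index, entry in enumerate(paths):
        print(index, repr(entry))
    # Position of the PYTHONPATH entry, or None if it is not listed.
    pythonpath_index = paths.index(extra_dir) if extra_dir in paths else None

print("PYTHONPATH entry found at index:", pythonpath_index)
```

In the printed list, the PYTHONPATH entry should appear near the top, ahead of any site-packages directories.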

By default, Python also imports the site.py module upon initialization, which adds site-specific paths to the module search path (sys.path), including the path to your user site-packages directory within your home directory (for example, ~/.local/lib/pythonX.Y/site-packages).

As site.py adds paths to sys.path, it scans them for path configuration (.pth) files, which contain additional directories that are added to sys.path. If a directory contains multiple .pth files, site.py processes them in alphabetical order.

However, some .pth files contain embedded commands that insert directory entries at the beginning of the module search path (ahead of the standard library path). As a result, a module from one of the inserted directories will load instead of the module of the same name from the standard library directory. This can be undesired and confusing behavior unless such a replacement is intended.
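To see how a path configuration file affects sys.path, you can process one by hand with site.addsitedir, the standard library function that handles a site directory and its .pth files. The sketch below works entirely in a temporary directory; the file name demo.pth and the subdirectory name extra are made up for the illustration:

```python
import os
import site
import sys
import tempfile

with tempfile.TemporaryDirectory() as site_dir:
    # Create a subdirectory and a .pth file that names it
    # (relative entries are resolved against the site directory).
    extra_dir = os.path.join(site_dir, "extra")
    os.mkdir(extra_dir)
    with open(os.path.join(site_dir, "demo.pth"), "w") as pth_file:
        pth_file.write("extra\n")

    before = list(sys.path)
    # Appends site_dir to sys.path, then processes its .pth files.
    site.addsitedir(site_dir)
    added = [entry for entry in sys.path if entry not in before]
    print("Added to sys.path:", added)

    # Restore sys.path so the deleted temporary directories do not linger.
    sys.path[:] = before
```

Entries listed in a .pth file are added only if the directory exists, which is why the sketch creates the subdirectory first.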

Note:

If your import requests are consistently disrupted by site.py and .pth files, try invoking the Python interpreter with the -S (uppercase "S") option:

python -S

This disables the automatic import of site.py and, as a result, prevents it from manipulating sys.path. However, it also prevents site.py from adding your user site-packages directory to sys.path. To import site.py without adding your user site-packages directory to sys.path, invoke Python with the -s (lowercase "s") option:

python -s
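The effect of the two options can be checked through sys.flags, which records how the interpreter was started. The following sketch starts a child interpreter with each option and reads the matching flag; flag_value is a hypothetical helper name:

```python
import subprocess
import sys

def flag_value(option, flag_name):
    """Start a child interpreter with option and return sys.flags.<flag_name>."""
    code = "import sys; print(int(sys.flags.{}))".format(flag_name)
    result = subprocess.run(
        [sys.executable, option, "-c", code],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return int(result.stdout.strip())

# -S disables the automatic import of the site module.
print("-S sets no_site:", flag_value("-S", "no_site"))
# -s keeps site.py but skips the user site-packages directory.
print("-s sets no_user_site:", flag_value("-s", "no_user_site"))
```

Checking sys.flags.no_site and sys.flags.no_user_site inside your own script is a quick way to tell whether it was launched with one of these options.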

To see which directories Python scans when you issue import commands, on the command line, enter:

python -c "import sys; print('\n'.join(sys.path))"

Alternatively, launch Python in interactive mode, and then invoke the same commands in this order (>>> is the Python primary prompt):

>>> import sys
>>> print('\n'.join(sys.path))
Note:
The sys.path variable is an ordinary list of strings that you can edit like any other Python list. Avoid editing the first item in the list (sys.path[0]), because many packages assume it refers to the directory containing the script used to invoke the Python interpreter.
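As one way to follow that advice, the sketch below adds a directory at index 1 so that the first item stays untouched. The add_search_dir helper and the directory path are hypothetical; the path does not need to exist for the list edit itself to work:

```python
import sys

def add_search_dir(directory, position=1):
    """Insert directory into sys.path at position unless it is already listed."""
    if directory not in sys.path:
        sys.path.insert(position, directory)

# Placeholder path; replace "username" and "3.x" with your own values.
hypothetical_dir = "/N/slate/username/python-pkgs/lib/python3.x/site-packages"

first_entry = sys.path[0]
add_search_dir(hypothetical_dir)

# The original first entry is still first; the new one follows it.
print(sys.path[:3])
```

A change made this way lasts only for the current interpreter session; use PYTHONPATH for a setting that applies every time Python starts.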

Supported packages

Supported packages in the Python modules

Following are the most commonly used packages available in the Python modules. To see a full list of available packages, after loading a Python module, type:

pip list

or

pip freeze
  • astropy
  • biopython
  • cryptography
  • cutadapt
  • cycler
  • Cython
  • dask
  • deeptools
  • h5py
  • idna
  • igraph
  • jupyterlab
  • kiwisolver
  • MACS2
  • matplotlib
  • mpi4py
  • nose
  • notebook
  • numba
  • numpy
  • pandas
  • Pillow
  • plotly
  • pycosat
  • pysam
  • PySocks
  • pytz
  • scikit-learn
  • scipy
  • seaborn
  • tensorflow
  • tornado
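To check from within Python whether particular packages from this list are available in the module you loaded, you can query the installed distribution metadata. This sketch assumes Python 3.8 or later, where importlib.metadata is part of the standard library; installed_versions is a hypothetical helper, and the names passed to it are distribution names as shown by pip list:

```python
from importlib import metadata

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

# Names taken from the list above; results depend on the loaded module.
for name, version in installed_versions(["numpy", "scipy", "pandas"]).items():
    print(name, version if version is not None else "not installed")
```

A result of "not installed" means the package is not visible to the running interpreter, in which case you can install it for personal use as described earlier.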

Supported packages in the python/gpu modules

Following are the most commonly used packages available in the python/gpu modules. To see a full list of available packages, after loading a python/gpu module, type:

pip list

or

pip freeze
  • accelerate
  • bs4
  • causality
  • cupy-cuda117
  • dask
  • editdistance
  • GraphViz
  • guppy3
  • h5py
  • imbalanced_learn
  • imutils
  • jax
  • Jinja2
  • keras
  • lightgbm
  • Markov
  • matplotlib
  • mxnet-cu112*
  • nibabel
  • nltk
  • numba
  • numpy
  • pandas
  • Pillow
  • sasa
  • scikit-learn
  • sds
  • seaborn
  • tensorflow*
  • torch*
  • vaderSentiment
  • Werkzeug

* Specifically GPU-capable packages

This is document acey in the Knowledge Base.
Last modified on 2024-03-19 14:16:41.