ARCHIVED: Run TensorFlow on Big Red II at IU
On this page:
Big Red II was retired from service on December 15, 2019; for more, see ARCHIVED: About Big Red II at Indiana University (Retired).
About TensorFlow
TensorFlow is a flexible, distributable, portable, open source software library originally developed by researchers and engineers on the Google Brain team to support machine learning and deep neural networks research. Although TensorFlow was developed to create and train machine learning models, its dataflow-based programming model is flexible enough to be useful in other research domains.
TensorFlow computations are described by directed graphs composed of nodes and edges. Nodes represent individual mathematical operations, and edges represent the multidimensional data arrays (tensors) that flow between nodes. Graphs also may contain special edges that control dependencies between nodes. TensorFlow computations can be executed on a wide variety of platforms, ranging from mobile devices to clusters with multiple CPUs and GPUs. For more, see the TensorFlow website.
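The dataflow model described above can be sketched in a few lines of plain Python (this is an illustration of the concept, not TensorFlow code): each node wraps an operation, its input edges point at other nodes, and nothing executes until you ask for a result, at which point dependencies are evaluated first.

```python
# Minimal sketch (not TensorFlow itself) of a dataflow graph: nodes are
# operations, edges carry values between nodes, and a node runs only
# after all of its inputs have been evaluated.
class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # the operation this node performs
        self.inputs = inputs  # incoming edges (other nodes)

    def run(self, cache=None):
        # Evaluate dependencies first, memoizing shared sub-results so a
        # node feeding several edges is computed only once.
        cache = {} if cache is None else cache
        if id(self) not in cache:
            args = [n.run(cache) for n in self.inputs]
            cache[id(self)] = self.op(*args)
        return cache[id(self)]

# Build the graph a = 2, b = 3, c = a + b; nothing runs until c.run().
a = Node(lambda: 2)
b = Node(lambda: 3)
c = Node(lambda x, y: x + y, (a, b))
print(c.run())  # -> 5
```

Building the graph and running it are separate phases, which is what lets a real dataflow engine place different nodes on different CPUs or GPUs.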
At Indiana University, TensorFlow is installed on Big Red II. For the best performance, UITS recommends running TensorFlow computations on Big Red II's hybrid CPU/GPU nodes. Although you can run TensorFlow on CPU-only nodes, GPU acceleration dramatically improves its performance.
Options for setting up your TensorFlow environment
To run TensorFlow computations on Big Red II, you first must set up your user environment. The cleanest and most straightforward way to set up a TensorFlow environment on Big Red II is to load the tensorflow module. Alternatively, you can use Anaconda or a Singularity container to import a custom TensorFlow environment.
Load the tensorflow module
To load the tensorflow module, on the command line, enter:
module load tensorflow
The tensorflow module uses its own Python build. When loaded, the tensorflow module sets the PYTHONPATH environment variable to /N/soft/cle5/tensorflow/1.1.0/lib/python2.7/site-packages. Consequently:
- The tensorflow module will not load if the anaconda2 module or a python module is already loaded. To check your user environment for a conflicting module, on the command line, enter module list. To remove a conflicting module, use the module unload command (e.g., module unload anaconda2).
- Some Python packages you use regularly may be missing from the build TensorFlow uses. In such cases, you can install missing Python packages to your home directory. For instructions, see Install Python packages on the research supercomputers at IU.
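The reason these conflicts matter can be demonstrated with plain Python: directories listed in PYTHONPATH land near the front of the interpreter's import search path (sys.path), so whichever module set it last controls which packages are found first. The directory name below is a placeholder for illustration, not a real Big Red II path.

```python
import os
import subprocess
import sys

# Hypothetical directory standing in for a module's site-packages path.
fake_site = "/tmp/example-site-packages"

# Launch a child interpreter with PYTHONPATH set, the way an environment
# module would, and confirm the directory appears on its import path.
env = dict(os.environ, PYTHONPATH=fake_site)
out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(sys.path)"],
    env=env,
).decode()
print(fake_site in out)  # -> True
```

Because two modules that each set PYTHONPATH would silently shadow one another's packages, the module system refuses to load them together.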
Use Anaconda to create a custom environment
Anaconda is a high-performance package manager and environment manager that you can use to create a fully customized TensorFlow environment in your Big Red II home directory.
To use Anaconda on Big Red II, first load the anaconda2 module; on the command line, enter:
module load anaconda2
The anaconda2 module will not load if a python module is already loaded. To check your user environment for a conflicting Python module, on the command line, enter module list. To remove a conflicting module, use the module unload command (e.g., module unload python).
Once the anaconda2 module is loaded, you can use the following conda create command to create an environment (e.g., named mytensorflow) that has TensorFlow with GPU support, along with its dependencies (including Python), installed:
conda create -y -n mytensorflow tensorflow-gpu=1.1.0
In the above command, the -n argument specifies the name of the environment, and the (optional) -y argument makes the command run without asking for confirmation.
As it executes, the conda create command creates the new environment, downloads a specific version of the tensorflow-gpu package from the Anaconda package repository, and installs its contents. By default, the new environment is created in the ~/.conda_envs directory. You should see output similar to the following as the command executes:
bkyloren@login2:~> conda create -y -n mytensorflow tensorflow-gpu=1.1.0
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /N/u/bkyloren/BigRed2/.conda_envs/mytensorflow:

The following NEW packages will be INSTALLED:

    blas:           1.0-mkl
    cudatoolkit:    7.5-2
    cudnn:          5.1-0
    funcsigs:       1.0.2-py27_0
    libprotobuf:    3.4.0-0
    mkl:            2017.0.3-0
    mock:           2.0.0-py27_0
    numpy:          1.12.1-py27_0
    openssl:        1.0.2l-0
    pbr:            1.10.0-py27_0
    pip:            9.0.1-py27_1
    protobuf:       3.4.0-py27_0
    python:         2.7.13-0
    readline:       6.2-2
    setuptools:     36.4.0-py27_0
    six:            1.10.0-py27_0
    sqlite:         3.13.0-0
    tensorflow-gpu: 1.1.0-np112py27_0
    tk:             8.5.18-0
    werkzeug:       0.12.2-py27_0
    wheel:          0.29.0-py27_0
    zlib:           1.2.11-0

blas-1.0-mkl.t 100% |################################| Time: 0:00:00  72.97 kB/s
cudatoolkit-7. 100% |################################| Time: 0:00:03  59.97 MB/s
cudnn-5.1-0.ta 100% |################################| Time: 0:00:01  44.64 MB/s
mkl-2017.0.3-0 100% |################################| Time: 0:00:03  44.48 MB/s
openssl-1.0.2l 100% |################################| Time: 0:00:00  31.17 MB/s
readline-6.2-2 100% |################################| Time: 0:00:00  14.45 MB/s
sqlite-3.13.0- 100% |################################| Time: 0:00:00  36.93 MB/s
tk-8.5.18-0.ta 100% |################################| Time: 0:00:00  16.75 MB/s
zlib-1.2.11-0. 100% |################################| Time: 0:00:00   8.98 MB/s
libprotobuf-3. 100% |################################| Time: 0:00:00  41.53 MB/s
python-2.7.13- 100% |################################| Time: 0:00:00  41.92 MB/s
funcsigs-1.0.2 100% |################################| Time: 0:00:00 651.96 kB/s
numpy-1.12.1-p 100% |################################| Time: 0:00:00  38.31 MB/s
setuptools-36. 100% |################################| Time: 0:00:00   7.35 MB/s
six-1.10.0-py2 100% |################################| Time: 0:00:00 416.31 kB/s
werkzeug-0.12. 100% |################################| Time: 0:00:00  19.54 MB/s
wheel-0.29.0-p 100% |################################| Time: 0:00:00   8.25 MB/s
protobuf-3.4.0 100% |################################| Time: 0:00:00  23.16 MB/s
pbr-1.10.0-py2 100% |################################| Time: 0:00:00  10.64 MB/s
mock-2.0.0-py2 100% |################################| Time: 0:00:00   4.43 MB/s
tensorflow-gpu 100% |################################| Time: 0:00:01  44.30 MB/s
Your terminal may remain idle for 10 to 15 minutes while the necessary packages are installed. When the installation is complete, you will be returned to the Big Red II command prompt.
To use your environment to run a TensorFlow script on Big Red II's hybrid CPU/GPU nodes:
- Use qsub to submit an interactive job to Big Red II's gpu queue. For example, the following command submits a request for an interactive job that runs for one hour on all 16 cores of one CPU/GPU node:
  qsub -I -l nodes=1:ppn=16 -l walltime=01:00:00 -l gres=ccm -q gpu
  When your job is ready, you will be placed on an aprun service node.
- From the aprun node's command line, load the ccm module, and then use the ccmlogin command to get placed on a compute node:
  bkyloren@aprun1:~> module load ccm
  bkyloren@aprun1:~> ccmlogin
  bkyloren@nid00180:~>
- From the compute node's command line, load the anaconda2 module, and then use the source activate command to activate your TensorFlow environment (e.g., mytensorflow); for example:
  bkyloren@nid00180:~> module load anaconda2
  Please unload any existing python modules first. (module unload python)
  anaconda2 version 4.2.0 loaded.
  bkyloren@nid00180:~> source activate mytensorflow
- Upon activation, the environment name (e.g., mytensorflow) will be prepended to the compute node's command prompt, and you can launch your TensorFlow script (e.g., tensorflow-app.py) from there; for example:
  (mytensorflow) bkyloren@nid00180:~> tensorflow-app.py
To deactivate your environment, on the command line, enter:
source deactivate
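The interactive steps above can also be collected into a TORQUE batch script, in the same style as the Singularity batch example later in this document. The following is a sketch, not a tested script: the script name and the TensorFlow script name (tensorflow-app.py) are placeholders, and it assumes the same queue, module, and environment names used above, with ccmrun standing in for the interactive ccmlogin step.

```shell
#!/bin/bash
# Hypothetical batch version of the interactive Anaconda workflow above.
#PBS -l nodes=1:ppn=16
#PBS -l walltime=01:00:00
#PBS -l gres=ccm
#PBS -q gpu

module load ccm
module load anaconda2
source activate mytensorflow

# Run from the directory the job was submitted from; ccmrun places the
# command on a compute node, as ccmlogin does interactively.
cd $PBS_O_WORKDIR
ccmrun python tensorflow-app.py
```

Submit it with qsub as you would any other job script.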
For more about Anaconda and conda commands, see the Anaconda Distribution and Conda Documentation pages on the Anaconda website.
Use a Singularity container
A Singularity container is an encapsulation of an application and its dependencies (i.e., the libraries, packages, and data files it needs for execution), saved as a single image file; for more, see Use Apptainer on Quartz or Big Red 200 at IU.
A sample Singularity container image for running TensorFlow with GPU support in a virtualized CentOS environment is available on Big Red II at:
/N/soft/cle5/singularity/images/tensorflow-centos7.img
- To run the container as an interactive job:
  - Request an interactive job in Big Red II's debug_gpu queue; on the command line, enter:
    qsub -I -l walltime=00:30:00 -l nodes=1:ppn=16 -l gres=ccm -q debug_gpu
  - When your job starts, load the ccm module; on the aprun command line, enter:
    module load ccm
  - Log into the compute node; on the aprun command line, enter:
    ccmlogin
  - When placed on the compute node (e.g., nid00170), load the craype-accel-nvidia35 and singularity modules; on the compute node command line, enter:
    module load craype-accel-nvidia35 singularity
  - Change to your Data Capacitor II scratch directory; on the compute node command line, enter (replace <username> with your IU username):
    cd /N/dc2/scratch/<username>
  - Spawn a shell in the tensorflow-centos7.img container; on the compute node command line, enter:
    singularity shell /N/soft/cle5/singularity/images/tensorflow-centos7.img
- To run the container as a batch job:
  - Create a working directory in your Data Capacitor II scratch space (e.g., /N/dc2/scratch/<username>/singularity_work; replace <username> with your IU username).
  - Prepare a TORQUE job script (e.g., my_job_script.pbs) similar to the following example:
    #!/bin/bash
    # file to submit non-interactive jobs to bigred2
    #PBS -l nodes=1:ppn=16
    #PBS -l gres=ccm
    #PBS -q debug_gpu
    #PBS -l walltime=00:30:00
    module load ccm
    module load singularity
    cd /N/dc2/scratch/<username>/singularity_work
    ccmrun singularity exec /N/soft/cle5/singularity/images/tensorflow-centos7.img python my_tensorflow.py
    The execution line in the example script above invokes ccmrun to launch Singularity, which starts the tensorflow-centos7.img container and runs a Python script (my_tensorflow.py) in the container.
  - To submit the job, on the command line, enter:
    qsub my_job_script.pbs
Run TensorFlow
Regardless of the method you use, once you have your TensorFlow environment set up properly, you can access TensorFlow from the Python interpreter.
The following example demonstrates how to build a one-node "Hello, TensorFlow!" computational graph, launch it in a session, and evaluate the tensor object. Start at the Python primary prompt:
- Import TensorFlow:
  >>> import tensorflow as tf
  This gives Python access to all TensorFlow classes, methods, and symbols.
  To verify which version of TensorFlow you are running, enter:
  >>> tf.__version__
- Create the constant tensor hello by using the tf.constant operation to store the value Hello, TensorFlow!:
  >>> hello = tf.constant('Hello, TensorFlow!')
- Create a Session object using the tf.Session class:
  >>> sess = tf.Session()
  Note: When you create your Session object, you may see one or more warning messages stating the following:
  The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
  You may safely ignore these messages.
- Invoke the run method to evaluate the tensor hello, and print the result:
  >>> print(sess.run(hello))
  Hello, TensorFlow!
Get help
If you need help, or have questions about using TensorFlow, Anaconda, or Singularity on Big Red II, contact the UITS Research Applications and Deep Learning team.
Research computing support at IU is provided by the Research Technologies division of UITS. To ask a question or get help regarding Research Technologies services, including IU's research supercomputers and research storage systems, and the scientific, statistical, and mathematical applications available on those systems, contact UITS Research Technologies. For service-specific support contact information, see Research computing support at IU.
This is document aoqh in the Knowledge Base.
Last modified on 2023-04-21 16:57:09.