At IU, how do I run TensorFlow?

TensorFlow is a flexible, distributable, portable, open-source software library for creating and training machine learning models. TensorFlow computations are described by directed graphs composed of nodes, which represent individual mathematical operations, and edges, which represent the multidimensional data arrays (tensors) that flow between nodes. Graphs can also contain special edges that enforce control dependencies between nodes. Computations expressed using TensorFlow's dataflow-based programming model can be executed on a wide variety of platforms, ranging from mobile devices to clusters with multiple CPUs and GPUs. TensorFlow was originally developed by researchers and engineers on the Google Brain team to support research on machine learning and deep neural networks, but the system is flexible enough to be useful in other research domains. For more, see the TensorFlow website.
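To make the dataflow idea concrete, here is a toy sketch in plain Python (this is not TensorFlow's API; the Node class and run function are hypothetical names invented for illustration): each node holds an operation, each edge carries a value from an upstream node, and evaluating the output node pulls values through the graph.

```python
# Toy illustration of a dataflow graph: nodes are operations,
# edges carry values ("tensors") between them. Not TensorFlow itself.

class Node:
    def __init__(self, op, *inputs):
        self.op = op          # function computing this node's value
        self.inputs = inputs  # upstream nodes (incoming edges)

def run(node):
    """Evaluate a node by first evaluating all of its input edges."""
    args = [run(n) for n in node.inputs]
    return node.op(*args)

# Build a small graph computing (2 + 3) * 4
two   = Node(lambda: 2.0)
three = Node(lambda: 3.0)
add   = Node(lambda a, b: a + b, two, three)
four  = Node(lambda: 4.0)
mul   = Node(lambda a, b: a * b, add, four)

print(run(mul))  # 20.0
```

TensorFlow works on the same principle, but its nodes run compiled kernels on CPUs and GPUs, and the graph can be partitioned across devices and machines.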

At Indiana University, TensorFlow is available for use on Big Red II. UITS recommends running TensorFlow on Big Red II's GPU-accelerated nodes. Although TensorFlow will function in CPU-only mode, GPU acceleration dramatically improves its performance.

Options for setting up your TensorFlow environment

To set up a TensorFlow environment on Big Red II, you have the following options.

Loading the tensorflow module

The cleanest and most straightforward way to set up TensorFlow for use on Big Red II is to load the tensorflow module; on the command line, enter:

  module load tensorflow/1.1.0
Note:
  • The tensorflow module conflicts with Python and Anaconda, so if either of those modules is currently loaded, you must use the module unload command to remove it before loading the tensorflow module.
  • The tensorflow module uses its own Python build, which is different from the other Python builds available on Big Red II, and some packages you are accustomed to using may be missing. In such cases, you can install the missing packages to your home directory; for instructions, see How do I install Python packages on the research computing systems at IU?

For more about using Modules to configure your user environment, see On the research computing systems at IU, how do I use Modules to manage my software environment?
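Putting the notes above together, a typical login session might look like the following sketch (the module unload lines are needed only if those modules are currently loaded; the final python -c line is just a quick check that the import works):

```shell
# Remove conflicting modules if they are loaded
module unload python
module unload anaconda2

# Load the TensorFlow module
module load tensorflow/1.1.0

# Sanity check: import TensorFlow and print its version
python -c 'import tensorflow as tf; print(tf.__version__)'
```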

Using Anaconda to create a custom environment

Anaconda is a high-performance package manager and environment manager that lets you install your own Python packages, as well as your own Python versions, to create fully customized Python environments.

On Big Red II, you can use Anaconda to install your own version of TensorFlow along with all of its dependencies (including Python).

To load the default Anaconda module on Big Red II, on the command line, enter:

  module load anaconda2
Note:
The anaconda2 module conflicts with Python, so if a Python module is currently loaded, you must use the module unload command to remove it before loading the anaconda2 module.

The conda command is the primary tool for managing Anaconda environments and packages. To create a new environment (e.g., mytensorflow) and install TensorFlow with GPU support (along with its dependencies), on the command line, enter:

  conda create -y -n mytensorflow tensorflow-gpu=1.1.0

The -n argument is used to specify the name of the environment; the (optional) -y argument makes the command run without asking for confirmation. By default, the new environment will be created in the ~/.conda_envs directory.

The conda create command creates the new environment, downloads the specified package from the Anaconda package repository, and installs its contents. You should see something similar to the following as the command executes:

  bkyloren@login2:~> conda create -y -n mytensorflow tensorflow-gpu=1.1.0
  Fetching package metadata .........
  Solving package specifications: .
  
  Package plan for installation in environment
  /N/u/bkyloren/BigRed2/.conda_envs/mytensorflow:

  The following NEW packages will be INSTALLED:
  
      cudatoolkit:    7.5-2
      cudnn:          5.1-0
      funcsigs:       1.0.2-py27_0
      libprotobuf:    3.2.0-0
      mkl:            2017.0.3-0
      mock:           2.0.0-py27_0
      numpy:          1.12.1-py27_0
      openssl:        1.0.2l-0
      pbr:            1.10.0-py27_0
      pip:            9.0.1-py27_1
      protobuf:       3.2.0-py27_0
      python:         2.7.13-0
      readline:       6.2-2
      setuptools:     27.2.0-py27_0
      six:            1.10.0-py27_0
      sqlite:         3.13.0-0
      tensorflow-gpu: 1.1.0-np112py27_0
      tk:             8.5.18-0
      werkzeug:       0.12.2-py27_0
      wheel:          0.29.0-py27_0
      zlib:           1.2.8-3
  
  cudatoolkit-7. 100% |################################| Time: 0:00:03  59.97 MB/s
  cudnn-5.1-0.ta 100% |################################| Time: 0:00:01  44.64 MB/s
  mkl-2017.0.3-0 100% |################################| Time: 0:00:03  44.48 MB/s
  openssl-1.0.2l 100% |################################| Time: 0:00:00  31.17 MB/s
  readline-6.2-2 100% |################################| Time: 0:00:00  14.45 MB/s
  sqlite-3.13.0- 100% |################################| Time: 0:00:00  36.93 MB/s
  tk-8.5.18-0.ta 100% |################################| Time: 0:00:00  16.75 MB/s
  zlib-1.2.8-3.t 100% |################################| Time: 0:00:00   8.98 MB/s
  libprotobuf-3. 100% |################################| Time: 0:00:00  41.53 MB/s
  python-2.7.13- 100% |################################| Time: 0:00:00  41.92 MB/s
  funcsigs-1.0.2 100% |################################| Time: 0:00:00 651.96 kB/s
  numpy-1.12.1-p 100% |################################| Time: 0:00:00  38.31 MB/s
  setuptools-27. 100% |################################| Time: 0:00:00   7.35 MB/s
  six-1.10.0-py2 100% |################################| Time: 0:00:00 416.31 kB/s
  werkzeug-0.12. 100% |################################| Time: 0:00:00  19.54 MB/s
  wheel-0.29.0-p 100% |################################| Time: 0:00:00   8.25 MB/s
  protobuf-3.2.0 100% |################################| Time: 0:00:00  23.16 MB/s
  pbr-1.10.0-py2 100% |################################| Time: 0:00:00  10.64 MB/s
  mock-2.0.0-py2 100% |################################| Time: 0:00:00   4.43 MB/s
  tensorflow-gpu 100% |################################| Time: 0:00:01  44.30 MB/s

Your terminal may remain idle for 10 to 15 minutes while the necessary packages are installed. When the installation is complete, you will be returned to the Big Red II command prompt.

To activate your environment (e.g., mytensorflow), on the command line, enter:

  source activate mytensorflow

The environment name will be prepended to the system command prompt; for example:

  (mytensorflow) bkyloren@login2:~>

To deactivate your environment, on the command line, enter:

  source deactivate

For more about Anaconda and conda, see the Anaconda Distribution and Conda Documentation pages on the Anaconda website.
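The Anaconda workflow described above, condensed into one session sketch (the environment name mytensorflow is just an example; the python -c line verifies the install and should print 1.1.0):

```shell
module unload python                                   # avoid the module conflict noted above
module load anaconda2                                  # load the default Anaconda module
conda create -y -n mytensorflow tensorflow-gpu=1.1.0   # create the environment
source activate mytensorflow                           # activate it
python -c 'import tensorflow as tf; print(tf.__version__)'   # verify the install
source deactivate                                      # deactivate when finished
```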

Using a Singularity container

A Singularity container is an encapsulation of an application and its dependencies (i.e., the libraries, packages, and data files it needs for execution), saved as a single image file; for more, see How do I use Singularity on IU's research computing systems?

A sample Singularity container image for running TensorFlow with GPU support in a virtualized CentOS environment is available on Big Red II at:

  /N/soft/cle5/singularity/images/tensorflow-centos7.img
  • To run the container as an interactive job:
    1. Request an interactive job in Big Red II's debug_gpu queue; on the command line, enter:
        qsub -I -l walltime=00:30:00 -l nodes=1:ppn=16 -l gres=ccm -q debug_gpu
      
    2. When your job starts, load the ccm module; on the aprun command line, enter:
        module load ccm
      
    3. Log into the compute node; on the aprun command line, enter:
        ccmlogin
      
    4. When placed on the compute node (e.g., nid00170), load the craype-accel-nvidia35 and singularity modules; on the compute node command line, enter:
        module load craype-accel-nvidia35 singularity
      
    5. Change to your Data Capacitor II scratch directory; on the compute node command line, enter (replace <username> with your IU username):
        cd /N/dc2/scratch/<username>
      
    6. Spawn a shell in the tensorflow-centos7.img container; on the compute node command line, enter:
        singularity shell /N/soft/cle5/singularity/images/tensorflow-centos7.img
      
  • To run the container as a batch job:
    1. Create a working directory in your Data Capacitor II scratch space (e.g., /N/dc2/scratch/<username>/singularity_work; replace <username> with your IU username).
    2. Prepare a TORQUE job script (e.g., my_job_script.pbs) similar to the following example:
        #!/bin/bash
        # Batch (non-interactive) job script for Big Red II
      
        #PBS -l nodes=1:ppn=16
        #PBS -l gres=ccm
        #PBS -q debug_gpu
        #PBS -l walltime=00:30:00
        
        module load ccm
        module load singularity
        cd /N/dc2/scratch/<username>/singularity_work
        ccmrun singularity exec /N/soft/cle5/singularity/images/tensorflow-centos7.img python my_tensorflow.py
      

      The execution line in the example script above invokes ccmrun to launch Singularity, which runs the Python script (my_tensorflow.py) inside the tensorflow-centos7.img container.

    3. To submit the job, on the command line, enter:
        qsub my_job_script.pbs
      

Running TensorFlow

Regardless of the method you use, once you have your TensorFlow environment set up properly, you can access TensorFlow from the Python interpreter.

The following example demonstrates how to build a one-node "Hello, TensorFlow!" computational graph, launch it in a session, and evaluate the resulting tensor. Start at the Python prompt (>>>):

  1. Import TensorFlow:
      >>> import tensorflow as tf
    

    This gives Python access to all TensorFlow classes, methods, and symbols.

    To verify which version of TensorFlow you are running, enter:

      >>> tf.__version__
    
  2. Create the constant tensor hello by using the tf.constant operation to store the value Hello, TensorFlow!:
      >>> hello = tf.constant('Hello, TensorFlow!')
    
  3. Create a Session object using the tf.Session class:
      >>> sess = tf.Session()
    
    Note:

    When you create your Session object, you may see one or more error messages stating the following:

    The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.

    You may safely ignore these messages.

  4. Invoke the run method to evaluate the tensor hello, and print the result:
      >>> print(sess.run(hello))
      Hello, TensorFlow!
    

Getting help

If you need help, or have questions about using TensorFlow, Anaconda, or Singularity on Big Red II, contact the UITS Scientific Applications and Performance Tuning (SciAPT) team.

Support for IU research computing systems, software, and services is provided by various UITS Research Technologies units. For help, see Research computing support at IU.

This is document aoqh in the Knowledge Base.
Last modified on 2018-02-13 12:06:38.
