TensorFlow at TACC
Last update: May 13, 2019

Scientists across domains are actively exploring and adopting deep learning as a cutting-edge methodology to make research breakthroughs. At TACC, our mission is to enable discoveries that advance science and society through the application of advanced computing technologies. Thus, we are embracing this new type of application on our high-end computing platforms.

TACC supports the Keras+TensorFlow+Horovod stack. This framework exposes high-level interfaces for deep learning architecture specification, model training, tuning, and validation. Deep learning practitioners and domain scientists who are exploring the deep learning methodology should consider this framework for their research.

This document details how to install TensorFlow under both Python2 and Python3, and how to download and run benchmarks in both single- and multi-node modes. Because TensorFlow 1.13 binds to CUDA/10.0 and TACC has made a system-wide update to Intel/18, please pay close attention to the instructions when installing TensorFlow, Keras, and Horovod.

Installations at TACC

TensorFlow is installed on TACC's Stampede2 and Maverick2 resources.

  • Parallel Training with Keras, TensorFlow, and Horovod is available on both Stampede2 and Maverick2.
  • TensorFlow v1.6.0 to v1.13.1 are available on Stampede2.

Before you begin, note that all of the following examples are run on compute, not login, nodes. Running programs or doing computations on the login nodes may result in account suspension.
Use TACC's idev utility to grab a single compute node when conducting any TensorFlow activities.

TensorFlow on Maverick2

Due to variations in TensorFlow and Python versions, and their compatibility with the Intel compilers and CUDA libraries, the installation instructions are quite specific. Pay careful attention to them.

The following table lists available TensorFlow versions and their dependencies on Python version and Intel module version.

TensorFlow Version    Python Version   Environment Setting

TensorFlow 1.6-1.11   Python2          $ module load intel/17.0.4
                                       $ module load python/2.7.13
                                       $ module load cuda/9.0
                                       $ module load cudnn/7.0

TensorFlow 1.6-1.11   Python3          $ module load intel/17.0.4
                                       $ module load python3/3.6.3
                                       $ module load cuda/9.0
                                       $ module load cudnn/7.0

TensorFlow 1.12       Python2          $ module load intel/17.0.4
                                       $ module load python/2.7.13
                                       $ module load cuda/9.0
                                       $ module load cudnn/7.4.2
                                       $ export HDF5_USE_FILE_LOCKING=FALSE

TensorFlow 1.12       Python3          $ module load intel/17.0.4
                                       $ module load python3/3.6.3
                                       $ module load cuda/9.0
                                       $ module load cudnn/7.4.2
                                       $ export HDF5_USE_FILE_LOCKING=FALSE

Maverick2 Installation

Maverick2 does not support CUDA/10.0, which is required for TensorFlow 1.13, so the latest version of TensorFlow available on Maverick2 is 1.12.2.

Note also that, with the system-wide update to Intel/18, the default Python3 version is now 3.7, and the only TensorFlow version compatible with Python 3.7 is 1.13. Thus, on Maverick2, we can only support up to TensorFlow 1.12.2, built against Intel/17.0.4.

You can install TensorFlow with either of two Python versions. Select one: 2.7 or 3.6.

  • Install TensorFlow with Python 2.7

      c123-456$ module load intel/17.0.4
      c123-456$ module load python/2.7.13
      c123-456$ module load cuda/9.0
      c123-456$ module load cudnn/7.4.2
      c123-456$ export HDF5_USE_FILE_LOCKING=FALSE
      c123-456$ pip install --user tensorflow-gpu==1.12.2
      c123-456$ pip install --user keras h5py==2.8.0
  • or with Python 3.6

      c123-456$ module load intel/17.0.4
      c123-456$ module load python3/3.6.3
      c123-456$ module load cuda/9.0
      c123-456$ module load cudnn/7.4.2
      c123-456$ export HDF5_USE_FILE_LOCKING=FALSE
      c123-456$ pip3 install --user tensorflow-gpu==1.12.2
      c123-456$ pip3 install --user keras h5py==2.8.0
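After either install, it can be worth confirming that the packages are visible to the interpreter before moving on to Horovod. The following is a minimal sketch for the Python 3 case, using only the standard library. The helper name find_missing is hypothetical, and the package list simply mirrors the pip commands above (the import name for tensorflow-gpu is tensorflow).

```python
import importlib.util


def find_missing(packages):
    """Return the names in `packages` that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names for the packages installed with pip3 above.
    missing = find_missing(["tensorflow", "keras", "h5py"])
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All packages found.")
```

This check only locates the packages; it does not import them, so it will not detect a mismatch between TensorFlow and the loaded CUDA or cuDNN modules.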

Installing Horovod

We suggest installing Horovod version 0.15.2. If you wish to install Horovod 0.16.1, please submit a support ticket with the subject "Request for Horovod 0.16" and TACC staff will provide special instructions.

The NVIDIA Collective Communications Library (NCCL) provides the communication layer for distributed deep learning. Download this library into your $WORK directory. In the examples below we use /path/to/nccl_2.2.13-1+cuda9.0 as the path to this library.

  • Install Horovod 0.15.2 with Python2

      c123-456$ HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_NCCL_HOME=/path/to/nccl_2.2.13-1+cuda9.0 \
          HOROVOD_CUDA_HOME=/opt/apps/cuda/9.0 pip install --user horovod==0.15.2
  • Install Horovod 0.15.2 with Python3

      c123-456$ module load python3/3.6.3
      c123-456$ HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_NCCL_HOME=/path/to/nccl_2.2.13-1+cuda9.0 \
           HOROVOD_CUDA_HOME=/opt/apps/cuda/9.0 pip3 install --user horovod==0.15.2

Single-Node Mode

  • Download the TensorFlow benchmarks to your $WORK directory, then check out the branch that matches your TensorFlow version.

      c123-456$ cdw; git clone https://github.com/tensorflow/benchmarks.git
      c123-456$ cd benchmarks
      c123-456$ git checkout cnn_tf_v1.12_compatible
  • Benchmark the performance with a synthetic dataset on 1 GPU

      c123-456$ cd scripts/tf_cnn_benchmarks
      c123-456$ module load cuda/9.0 cudnn/7.4.2
      c123-456$ python tf_cnn_benchmarks.py --num_gpus=1 \
          --model resnet50 --batch_size 32 --num_batches 200
  • Benchmark the performance with a synthetic dataset on 4 GPUs

      c123-456$ cd scripts/tf_cnn_benchmarks
      c123-456$ module load cuda/9.0 cudnn/7.4.2
      c123-456$ ibrun -np 4 \
      python tf_cnn_benchmarks.py --variable_update=horovod --num_gpus=1 \
          --model resnet50 --batch_size 32 --num_batches 200 --allow_growth=True
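The branch checked out above, cnn_tf_v1.12_compatible, appears to encode the major and minor TensorFlow version. As a sketch under that assumption, the hypothetical helper below derives a branch name from a version string. The naming pattern is inferred from the single branch name shown here, so confirm that the branch exists (for example with git branch -r) before relying on it.

```python
def benchmark_branch(tf_version):
    """Map a TensorFlow version string to a tf_cnn_benchmarks branch name.

    Assumes branches follow the pattern cnn_tf_v<major>.<minor>_compatible.
    """
    parts = tf_version.split(".")
    if len(parts) < 2:
        raise ValueError("expected a version like '1.12.2', got %r" % tf_version)
    major, minor = parts[0], parts[1]
    return "cnn_tf_v{}.{}_compatible".format(major, minor)


print(benchmark_branch("1.12.2"))  # cnn_tf_v1.12_compatible
```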

Multi-Node Mode

  • Download the TensorFlow benchmarks to your $WORK directory. Check out the branch that matches your TensorFlow version.

      c123-456$ cdw; git clone https://github.com/tensorflow/benchmarks.git
      c123-456$ cd benchmarks
      c123-456$ git checkout branch_name
  • Benchmark the performance with a synthetic dataset on two nodes using 8 GPUs

      c123-456$ cd scripts/tf_cnn_benchmarks
      c123-456$ module load intel/17.0.4 python/2.7.13 cuda/9.0 cudnn/7.4.2
      c123-456$ ibrun -np 8 python tf_cnn_benchmarks.py --variable_update=horovod \
          --num_gpus=1 --model resnet50 --batch_size 32 --num_batches 200
  • Or you can run a batch job using the supplied script:

      $ cd $WORK/benchmarks/scripts/tf_cnn_benchmarks
      $ cp /home1/apps/tensorflow/test/run-2nodes.slurm .
      $ sbatch run-2nodes.slurm
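When comparing single-node and multi-node results, it helps to keep the effective batch size in mind. The sketch below works through the arithmetic for the two-node command above, assuming each of the 8 processes drives one GPU (--num_gpus=1) and that --batch_size applies per process. The function names are hypothetical, and warm-up batches are not counted.

```python
def global_batch_size(per_process_batch, num_processes):
    """Images consumed per synchronous training step across all processes."""
    return per_process_batch * num_processes


def total_images(per_process_batch, num_processes, num_batches):
    """Images consumed over the whole timed run."""
    return global_batch_size(per_process_batch, num_processes) * num_batches


# Values taken from the command above: ibrun -np 8, --batch_size 32, --num_batches 200
print(global_batch_size(32, 8))   # 256
print(total_images(32, 8, 200))   # 51200
```

Under the same assumptions, the 4-GPU single-node run uses a global batch of 128, so throughput figures from the two runs are best compared in images per second rather than in steps per second.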

TensorFlow on Stampede2

05/10/19: Updated TensorFlow instructions for Stampede2 are coming soon.

FAQ

Q: Some Python packages are missing when I use TensorFlow. What should I do?

A: Deep learning frameworks usually depend on many other packages (see, for example, the Caffe package dependency list). On TACC resources, you can install these packages in user space by running:

login1$ pip install --user package-name

Q: How can I run my Keras+Horovod program in parallel?

A: Start with a single process on one node:

ibrun -np 1 python app.py

Monitor GPU usage in another terminal with:

watch -n 5 nvidia-smi

Make sure the process allocates only one GPU. Then run multiple processes on one node:

ibrun -np 4 python app.py

If the program crashes, check the standard output file. The crash may be caused by all processes landing on the same GPU. If this is the case, create a run.sh with the following lines in it:

#!/bin/bash
export RANK=$(($PMI_RANK%4))
python app.py

Then run:

ibrun -np 16 bash run.sh
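The wrapper script reduces the global rank modulo 4 so that each process on a node gets a distinct index from 0 to 3. How the application consumes that index is not specified here. One possibility, sketched below, is to restrict each process to a single GPU through the CUDA_VISIBLE_DEVICES environment variable before TensorFlow is imported. The function names are hypothetical, and the sketch assumes 4 GPUs per node and ranks placed consecutively on each node.

```python
import os


def local_gpu_index(global_rank, gpus_per_node=4):
    """Map a global MPI rank to a GPU index on its node."""
    return global_rank % gpus_per_node


def pin_process_to_gpu(environ=os.environ, gpus_per_node=4):
    """Restrict this process to one GPU via CUDA_VISIBLE_DEVICES.

    Reads the global rank from PMI_RANK (the variable used in run.sh)
    and falls back to rank 0 if it is not set.
    """
    rank = int(environ.get("PMI_RANK", "0"))
    gpu = local_gpu_index(rank, gpus_per_node)
    environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

A program would call pin_process_to_gpu() at the very top, before importing TensorFlow, so that the setting takes effect when CUDA is initialized. Horovod also exposes the per-node rank directly as hvd.local_rank(), which may be preferable when the application already initializes Horovod.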