Optimize Your Code for the Intel Xeon Phi

May 16, 2014 (Friday)
8:30 a.m. to 5 p.m. (CDT)
Texas Advanced Computing Center
J.J. Pickle Research Campus, ROC 1.900
10100 Burnet Rd.
Austin, TX 78758

This is an in-person class. There will be no webcast.

This class is intended for intermediate to advanced users of Stampede. Attendees are expected to be able to program using MPI and OpenMP.

The Innovative Technology component of the recently deployed XSEDE Stampede supercomputer at TACC provides access to 8 petaflops of computing power in the form of the new Intel Xeon Phi coprocessor, also known as MIC. While the MIC is x86-based, hosts its own Linux OS, and is capable of running most user codes with little porting effort, its architecture differs in significant ways from that of current x86 CPUs, and optimal performance requires an understanding of the possible execution models and the basic details of the architecture. This workshop is designed to introduce Stampede users to the MIC architecture in a practical manner. Lectures and hands-on exercises will acquaint users with the MIC platform and explore the different execution modes, as well as parallelization and optimization, through example testing and compiler reports. Users are also welcome to bring their own codes to compile for the MIC.

The workshop will be divided into four sections: introduction to the MIC architecture; native execution and optimization; offload execution; and symmetric execution. In each section, users will spend half the time on guided hands-on exercises.



Introduction (1.5 hours) 8:30-10:00

  • Xeon Phi Architecture
  • Programming models
    • Native Execution (MPI / Threads / MPI+Threads)
    • MPI on host and Phi
    • MPI on host, offload to Phi
      • Targeted
      • Automatic (MKL)
    • Offload to host from the Phi


  • Log in and explore BusyBox

BREAK 10:00 - 10:30


Native Execution (1.5 hours) 10:30 - 12:00

  • Native Execution
    • Why run native?
    • How to build a native application?
    • How to run a native application?
    • Best practices for running native
    • Optimization
      • Cache + ALU/SIMD details
      • Vectorization
      • Parallelization
      • Alignment
      • Compiler reports


  • Interactive exercise using compiler reports
  • Interactive exercise to show logical-to-physical processor mapping

LUNCH 12:00 - 1:00


Offload Execution (2 hours) 1:00 - 3:00

  • Offload to Phi
    • What is offloading?
    • Directives
    • Automatic offloading with MKL
    • Compiler assisted offloading
    • Offloading inside a parallel region


  • Interactive exercise with simple offload and data transfer

BREAK 3:00 - 3:30


Symmetric Execution (1.5 hours) 3:30 - 5:00

  • MPI execution
    • Symmetric execution
      • Workload distribution
      • ibrun.sym
      • Correct pinning of MPI tasks on host and coprocessor
      • Interactive exercise showing symmetric at work
    • MPI + offload
    • Pinning tasks to host and MIC


  • Exercise with symmetric execution and pinning
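A symmetric run might be launched with a job-script fragment along these lines. This is a sketch: `MIC_PPN` and `MIC_OMP_NUM_THREADS` are Stampede-specific settings read by `ibrun.sym`, so check the Stampede user guide for the current names and recommended values.

```
#!/bin/bash
#SBATCH -N 2 -n 32 -p normal -t 00:30:00

# Host side: MPI tasks per node come from the -n/-N settings above.
export OMP_NUM_THREADS=1

# Coprocessor side: MPI tasks and threads per MIC card.
export MIC_PPN=4                # MPI tasks per coprocessor
export MIC_OMP_NUM_THREADS=30   # OpenMP threads per MIC task

# ibrun.sym launches the host binary and its MIC build on both sides
# and handles the workload distribution between them.
ibrun.sym ./a.out
```

Getting the task counts and pinning right on both sides is exactly what the exercise above explores: the host and coprocessor differ greatly in core count and per-core performance, so the distribution must be tuned per application.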


Registration Closed


Jason Allison
Advanced Scientific Computing
Senior Program Coordinator