Optimize Your Code for the Intel Xeon Phi

May 16, 2014 (Friday)
8:30 a.m. to 5 p.m. (CDT)
Texas Advanced Computing Center
J.J. Pickle Research Campus, ROC 1.900
10100 Burnet Rd.
Austin, TX 78758

This is an in-person class. There will be no webcast.

This class is intended for intermediate to advanced users of Stampede. Attendees are expected to be able to program using MPI and OpenMP.

The Innovative Technology component of the recently deployed XSEDE Stampede supercomputer at TACC provides access to 8 PetaFlops of computing power in the form of the new Intel Xeon Phi Coprocessor, also known as MIC. While the MIC is x86-based, hosts its own Linux OS, and is capable of running most user codes with little porting effort, the MIC architecture has significant features that differ from those of present x86 CPUs, and optimal performance requires an understanding of the possible execution models and basic details of the architecture. This workshop is designed to introduce Stampede users to the MIC architecture in a practical manner. Multiple lectures and hands-on exercises will be used to acquaint users with the MIC platform and to explore the different execution modes, as well as parallelization and optimization, through example testing and reports. Users are also welcome to bring their own codes to compile for MIC.

The workshop will be divided into four sections: introduction to the MIC architecture; native execution and optimization; offload execution; and symmetric execution. In each section, users will spend half the time on guided hands-on exercises.

AGENDA

PART I

Introduction (1.5 hours) 8:30-10:00

  • Xeon Phi Architecture
  • Programming models
    • Native Execution (MPI / Threads / MPI+Threads)
    • MPI on host and Phi
    • MPI on host, offload to Phi
      • Targeted
      • Automatic (MKL)
    • Offload to host from the Phi

LAB

  • Log in and explore BusyBox

BREAK 10:00 - 10:30

PART II

Native Execution (1.5 hours) 10:30 - 12:00

  • Native Execution
    • Why run native?
    • How to build a native application?
    • How to run a native application?
    • Best practices for running native
      • KMP_AFFINITY
    • Optimization
      • Cache + ALU/SIMD details
      • Vectorization
      • Parallelization
      • Alignment
      • Compiler reports

LAB

  • Interactive exercise using compiler reports
  • Interactive exercise to show logical-to-physical processor mapping

LUNCH 12:00 - 1:00

PART III

Offload Execution (2 hours) 1:00 - 3:00

  • Offload to Phi
    • What is offloading?
    • Directives
    • Automatic offloading with MKL
    • Compiler-assisted offloading
    • Offloading inside a parallel region

LAB

  • Interactive exercise with simple offload and data transfer

BREAK 3:00 - 3:30

PART IV

Symmetric Execution (1.5 hours) 3:30 - 5:00

  • MPI execution
    • Symmetric execution
      • Workload distribution
      • ibrun.sym
      • Correct pinning of MPI tasks on host and coprocessor
      • Interactive exercise showing symmetric execution at work
    • MPI + offload
    • Pinning tasks to host and MIC

LAB

  • Exercise with symmetric execution and pinning

 

Registration Closed