Lonestar5 is decommissioned as of July 8, 2021.

See the new Lonestar6 User Guide

Lonestar5 User Guide
Last update: July 08, 2021

Updates & Notices

  • All users: Lonestar5 is no longer in production and users may not run jobs. The Lonestar5 login nodes will remain available for access to the /home, /scratch, and /work file systems until Lonestar6 is in production. Once the Lonestar6 login nodes are available, the Lonestar5 /scratch file system will be temporarily remounted in a new location, /scratch_ls5. (07/08/21)
  • All users: begin migrating all your data from the Lonestar5 file systems. The "/work2" file system is available. Submit a support ticket for questions on alternate compute resources and continue monitoring this User News item. (05/17/21)
  • Lonestar5 is down due to a hardware issue. Our staff is working on the problem and will have the system back in production as soon as possible. Please monitor this User News item for system status updates. (05/03/21)
  • All users: read Managing I/O on TACC Resources. TACC Staff have put forth new file system and job submission guidelines. (01/09/20)
  • See the Visualization section for updated VisIt and Paraview instructions. (08/15/19)
  • Subscribe to Lonestar5 News for updates on the system.

Lonestar5 Architecture

The Lonestar5 (LS5) system is designed for academic researchers in Austin and across Texas. It will continue to serve as the primary high performance computing resource in the University of Texas Research Cyberinfrastructure (UTRC) initiative, sponsored by The University of Texas System, as well as partner institutions Texas Tech and Texas A&M. Lonestar5 is a Cray XC40 customized by the TACC staff to provide a unique software environment designed to meet the needs of our diverse user community.

Figure 1. Lonestar5 Cabinets. Lonestar5 provides 1252 twenty-four core general compute nodes (for a total of 30,000 processor cores), 16 GPU nodes, and 10 large memory nodes. The system is configured with over 80 TB of memory and 5PB of disk storage, and has a peak performance of 1.2PF.

System Configuration

All Lonestar5 nodes run SuSE 11 and are managed with batch services through native Slurm 15.08. Global storage areas are supported by an NFS file system ($HOME) and two Lustre parallel distributed file systems ($WORK and $SCRATCH). Inter-node communication is through an Aries network with Dragonfly topology. The TACC Ranch tape archival system is also available from Lonestar5.

The 1252 compute nodes are housed in 7 water-cooled cabinets, with three chassis per cabinet. Each chassis contains 16 blades, and each blade consists of 4 dual-socket nodes and an Aries interconnect chip. Each node has two Intel E5-2690 v3 12-core (Haswell) processors and 64 GB of DDR4 memory. Twenty-four of the compute nodes are reserved for development and are accessible interactively for up to two hours. The 16 GPU nodes are distributed across the cabinets. Each GPU node has a single-socket E5-2680 v2 (Ivy Bridge) 10-core processor and 64 GB of DDR3 memory.

The Aries network provides dynamic routing, enabling optimal use of the overall system bandwidth under load. See the interconnect description below for additional information.

Login nodes

  • Dual Socket
  • Xeon CPU E5-2650 v3 (Haswell): 10 cores per socket (20 cores/node), 2.30GHz
  • 128 GB DDR4-2133
  • Hyperthreading Disabled

Compute Nodes

  • Dual Socket
  • Xeon E5-2690 v3 (Haswell) : 12 cores per socket (24 cores/node), 2.6 GHz
  • 64 GB DDR4-2133 (8 x 8GB dual rank x8 DIMMS)
  • No local disk
  • Hyperthreading Enabled - 48 threads (logical CPUs) per node

Compute nodes lack a local disk, but users have access to a 32 GB /tmp RAM disk to accelerate I/O operations. Note that any space used in /tmp decreases the total amount of memory available on the node: if 8 GB of data are written to /tmp, the maximum memory available for applications and the OS on the node will be 64 GB - 8 GB = 56 GB.
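As a rough illustration (the file name is an arbitrary placeholder), you can watch the RAM disk fill and free up as you stage data through it:

```shell
# Write a small scratch file to the /tmp RAM disk, check usage, then clean up.
# Every byte stored here comes out of the node's 64 GB of memory.
dd if=/dev/zero of=/tmp/demo.dat bs=1M count=8 2>/dev/null   # 8 MB demo file
df -h /tmp                                                   # RAM-disk usage
rm /tmp/demo.dat                                             # free the memory
```

Removing the file when you are done returns its memory to the node.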


Interconnect

Lonestar5 uses an Aries Dragonfly interconnect. This high-performance network has three levels. The rank-1 level has all-to-all connectivity in the backplane of each 16-blade chassis (64 nodes) and provides per-packet adaptive routing. The rank-2 level is made of sets of 6 chassis backplanes connected by passive copper cables, forming two-cabinet groups (384 nodes). The rank-3 level connects the seven cabinets in an all-to-all fashion with active optical links. Point-to-point network bandwidth is expected to reach 70 Gbit/s.

Figure 2. Lonestar5 Network. Four nodes within a blade (dark green boxes) are connected to an Aries router (larger dark blue box). Nodes within a chassis (16 blades shown arranged in a row) are connected (rank-1 routes) by the chassis backplane (light green line). Each blade of a six-chassis group, 3 chassis from each of 2 racks (indicated by perforated lines), is connected (rank-2 routes, blue lines) through 5 ports of the router. Inter-group connections (rank-3 routes) are formed through fiber (orange lines) to each router. There are 7 racks of normal-queue compute nodes (racks 3, 4, and 7 not shown).

GPU Nodes

  • Single Socket
  • Xeon E5-2680 v2 (Ivy Bridge) : 10 cores, 2.8 GHz, 115W
  • 64 GB DDR3-1866 (4 x 16GB DIMMS)
  • Nvidia K40 GPU 12 GB GDDR5 (4.2 TF SP, 1.4TF DP)
  • Hyperthreading Enabled - 20 threads (logical CPUs) per node

Large Memory Nodes

Lonestar5 provides two types of large memory nodes:

  • Eight Haswell nodes available to all users through the largemem512GB queue:

    • 512 GB RAM
    • dual socket Xeon E5-2698 v3
    • 16 cores per socket (32 cores/node), 2.3 GHz
    • Hyperthreading Enabled - 64 threads (logical CPUs) per node
  • Two Ivy Bridge nodes available to approved users through the largemem1TB queue:

    • 1TB RAM
    • quad socket Xeon E7-4860 v2
    • 12 cores per socket (48 cores/node), 2.26 GHz
    • Hyperthreading Enabled - 96 threads (logical CPUs) per node

The large memory nodes are accessible from the LS5 login nodes and mount the same three shared file systems (/home, /work, and /scratch) as the rest of LS5. In many ways, however, the large memory nodes form a separate, largely independent cluster with its own InfiniBand connectivity, Slurm job scheduler, queues, and software stack. The "TACC-largemem" module controls access to the large memory nodes. See the Using the Large Memory Nodes section for more information.

Accessing Lonestar5

The standard way to access Lonestar5 (ls5.tacc.utexas.edu) and other TACC resources from your local machine is to use an SSH (Secure Shell) client; see the Wikipedia SSH page for more background. SSH clients must support the SSH-2 protocol. Mac users may use the built-in Terminal application. Windows users may choose from the many light-weight, free SSH clients available for download.

Users must connect to Lonestar5 using the Secure Shell "ssh" command to ensure a secure login session. Use the secure shell commands "scp" and "sftp", and/or standard "rsync" command to transfer files. Initiate an ssh connection to a Lonestar5 login node from your local system:

localhost$ ssh taccuserid@ls5.tacc.utexas.edu

Login passwords can be changed in the TACC User Portal (TUP). Select "Change Password" under the "HOME" tab after you login. If you've forgotten your password, go to the TUP home page and select the "? Forgot Password" button in the Sign In area. To report a problem please run the ssh command with the "-vvv" option and include the verbose information when submitting a help ticket.

Do not run the optional ssh-keygen command to set up Public-key authentication. This command sets up a passphrase that will interfere with the specially configured .ssh directory that makes it possible for you to execute jobs on the compute nodes. If you have already done this, remove the ".ssh" directory (and the files under it) from your home directory. Log out and log back in to regenerate the keys.


Good Citizenship

You share Lonestar5 with many other users, sometimes hundreds, and what you do on the system affects them. All users must follow a set of good practices that limit activities that may impact the system for other users. Exercise good citizenship to ensure that your activity does not adversely impact the system and the research community with whom you share it.

TACC staff has developed the following guidelines for good citizenship on Lonestar5. Please familiarize yourself especially with the first two mandates. The next sections discuss best practices for limiting and minimizing I/O activity and file transfers. Finally, we provide tips for constructing job scripts that help minimize wait times in the queues.

Do Not Run Jobs on the Login Nodes

Lonestar5's few login nodes are shared among all users. Dozens (sometimes hundreds) of users may be logged on at one time accessing the file systems. Think of the login nodes as a prep area where users may edit and manage files, compile code, issue transfers, and submit and track batch jobs. The login nodes provide an interface to the "back-end" compute nodes.

The compute nodes are where actual computations occur and where research is done. Hundreds of jobs may be running on all compute nodes, with hundreds more queued up to run. All batch jobs and executables, as well as development and debugging sessions, must be run on the compute nodes. To access compute nodes on TACC resources, one must either submit a job to a batch queue or initiate an interactive session using the idev utility.

A single user running computationally expensive or disk-intensive tasks will negatively impact performance for other users. Running jobs on the login nodes is one of the fastest routes to account suspension. Instead, run on the compute nodes via an interactive session (idev) or by submitting a batch job.

Do not run jobs or perform intensive computational activity on the login nodes or the shared file systems.
Your account may be suspended and you will lose access to the queues if your jobs are impacting other users.

Dos & Don'ts on the Login Nodes

  • Do not run research applications on the login nodes; this includes frameworks like MATLAB and R, as well as computationally or I/O intensive Python scripts. If you need interactive access, use the idev utility or Slurm's srun to schedule one or more compute nodes.

    DO THIS: Start an interactive session on a compute node and run Matlab.

      login1$ idev
      nid00181$ matlab

    DO NOT DO THIS: Run Matlab or other software packages on a login node

    login1$ matlab
  • Do not launch too many simultaneous processes; while it's fine to compile on a login node, a command like "make -j 16" (which compiles on 16 cores) may impact other users.

    DO THIS: build and submit a batch job. All batch jobs run on the compute nodes.

      login1$ make mytarget
      login1$ sbatch myjobscript

    DO NOT DO THIS: Invoke multiple build sessions.

    login1$ make -j 12

    DO NOT DO THIS: Run an executable on a login node.

      login1$ ./myprogram
  • That script you wrote to poll job status should probably do so once every few minutes rather than several times a second.
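A gentle polling loop might look like the following sketch; the squeue check and the five-minute interval are the point, while JOBID is a placeholder for a real Slurm job ID:

```shell
#!/bin/bash
# Poll a job's status every five minutes -- not several times a second.
poll_interval=300    # seconds between squeue calls

job_is_active() {
    # True while the job still appears in the queue listing
    squeue -j "$1" 2>/dev/null | grep -q "$1"
}

while job_is_active "$JOBID"; do
    sleep "$poll_interval"
done
```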

Do Not Stress the Shared File Systems

The TACC Global Shared File System, Stockyard, is mounted on most TACC HPC resources as the /work ($WORK) directory. This file system is accessible to all TACC users and therefore experiences a lot of I/O activity (reading and writing to disk, opening and closing files) as users run jobs and read and generate data, including intermediate and checkpoint files. As TACC adds more users, the stress on the $WORK file system has increased to the extent that TACC staff now recommends new job submission guidelines to reduce stress and I/O on Stockyard.

TACC staff now recommends that you run your jobs out of the $SCRATCH file system instead of the global $WORK file system.

To run your jobs out of $SCRATCH:

  • Copy or move all job input files to $SCRATCH
  • Make sure your job script directs all output to $SCRATCH
  • Once your job is finished, move your output files to $WORK to avoid any data purges.

Compute nodes should not reference $WORK except to stage data in and out before and after jobs.
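Putting these steps together, a job script might be structured like this sketch; the queue, file names, and executable (my_simulation) are placeholders:

```shell
#!/bin/bash
#SBATCH -p normal -N 1 -n 24 -t 01:00:00

cd $SCRATCH                        # run the job out of $SCRATCH
cp $WORK/inputs/config.dat .       # stage input from $WORK before the run
ibrun ./my_simulation config.dat   # all job I/O lands on $SCRATCH
cp results.dat $WORK/outputs/      # stage output back to $WORK afterward
```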

Consider that $HOME and $WORK are for storage and keeping track of important items. Actual job activity, reading and writing to disk, should be offloaded to your resource's $SCRATCH file system (see the File System Usage Recommendations table below). You can start a job from anywhere, but the actual work of the job should occur only on the $SCRATCH partition. Save original items to $HOME or $WORK so that you can copy them over to $SCRATCH if you need to regenerate results.

More File System Tips

  • Don't run jobs in your $HOME directory. The $HOME file system is for routine file management, not parallel jobs.

  • Watch all your file system quotas. If you're near your quota in $WORK and your job is repeatedly trying (and failing) to write to $WORK, you will stress that file system. If you're near your quota in $HOME, jobs run on any file system may fail, because all jobs write some data to the hidden $HOME/.slurm directory.

  • Avoid storing many small files in a single directory, and avoid workflows that require many small files. A few hundred files in a single directory is probably fine; tens of thousands is almost certainly too many. If you must use many small files, group them in separate directories of manageable size.

  • TACC resources, with a few exceptions, mount three file systems: /home, /work and /scratch. Please follow each file system's recommended usage.

File System | Best Storage Practices | Best Activities
----------- | ---------------------- | ---------------
$HOME | cron jobs, small scripts, environment settings | compiling, editing
$WORK | software installations, original datasets that can't be reproduced, job scripts and templates | staging datasets
$SCRATCH | temporary storage: I/O files, job files, temporary datasets | all job I/O activity; see TACC's Scratch File System Purge Policy

Limit Input/Output (I/O) Activity

In addition to the file system tips above, it's important that your jobs limit all I/O activity. This section focuses on ways to avoid causing problems on each resource's shared file systems.

  • Limit I/O intensive sessions (lots of reads and writes to disk, rapidly opening or closing many files)

  • Avoid opening and closing files repeatedly in tight loops. Every open/close operation on the file system requires interaction with the MetaData Service (MDS). The MDS acts as a gatekeeper for access to files on Lustre's parallel file system. Overloading the MDS will affect other users on the system. If possible, open files once at the beginning of your program/workflow, then close them at the end.

  • Don't get greedy. If you know or suspect your workflow is I/O intensive, don't submit a pile of simultaneous jobs. Writing restart/snapshot files can stress the file system; avoid doing so too frequently. Also, use the hdf5 or netcdf libraries to generate a single restart file in parallel, rather than generating files from each process separately.
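In shell terms, the open-once advice looks like the following; the loop and file name are illustrative:

```shell
# Inefficient: ">> results.txt" reopens and closes the file on every
# iteration, generating one metadata interaction per line written.
for i in $(seq 1 1000); do
    echo "step $i" >> results.txt
done

# Better: redirect the whole loop so the file is opened once and
# closed once, no matter how many lines are written.
for i in $(seq 1 1000); do
    echo "step $i"
done > results.txt
```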

If you know your jobs will require significant I/O, please submit a support ticket and an HPC consultant will work with you. See also Managing I/O on TACC Resources for additional information.

File Transfer Guidelines

In order to not stress both internal and external networks, be mindful of the following guidelines:

  • When creating or transferring large files to Stockyard ($WORK) or the $SCRATCH file systems, be sure to stripe the receiving directories appropriately. See Striping Large Files in the Stampede2 User Guide for more information.

  • Avoid too many simultaneous file transfers. You share the network bandwidth with other users; don't use more than your fair share. Two or three concurrent scp sessions is probably fine. Twenty is probably not.

  • Avoid recursive file transfers, especially those involving many small files. Create a tar archive before transfers. This is especially true when transferring files to or from Ranch.
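For example, to move a directory tree of many small files, bundle it into one archive first (the directory and file names are placeholders):

```shell
# Create a throwaway directory of small files to illustrate (placeholder data)
mkdir -p mydata && touch mydata/file{1..100}.dat

# Bundle the whole tree into a single compressed archive before transferring
tar czf mydata.tar.gz mydata/

# Transfer one file instead of a hundred, e.g.:
#   scp mydata.tar.gz username@ranch.tacc.utexas.edu:
# and unpack on the far side with: tar xzf mydata.tar.gz
```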

Job Submission Tips

  • Request only the resources you need. Make sure your job scripts request only the resources needed for that job. Don't ask for more time or more nodes than you really need. The scheduler will have an easier time finding a slot for a job requesting 2 nodes for 2 hours than for a job requesting 4 nodes for 24 hours. This means shorter queue wait times for you and everybody else.

  • Test your submission scripts. Start small: make sure everything works on 2 nodes before you try 20. Work out submission bugs and kinks with 5 minute jobs that won't wait long in the queue and involve short, simple substitutes for your real workload: simple test problems; hello world codes; one-liners like ibrun hostname; or an ldd on your executable.

  • Respect memory limits and other system constraints. If your application needs more memory than is available, your job will fail, and may leave nodes in unusable states. Use TACC's Remora tool to monitor your application's needs.

Computing Environment

Lonestar5's default login shell is Bash. The csh and zsh shells are also available. Submit a support ticket to change your default login shell; the chsh command is not supported.

Lonestar5 does not support ".profile_user", ".cshrc_user", or ".login_user" startup files. Put your aliases and other customizations directly in the standard startup files. See the default templates in your account for further instructions and examples. Unless you have specialized needs, it is generally best to leave the bash ".profile" file alone and place all customizations in the ".bashrc" file.


Modules

TACC continually updates application packages, compilers, communications libraries, tools, and math libraries. To facilitate this task and to provide a uniform mechanism for accessing different versions of software, TACC employs Lmod for environment management.

At login, module commands set up a basic environment for the default compilers, tools, and libraries: for example, the $PATH, $MANPATH, and $LD_LIBRARY_PATH environment variables, directory locations (e.g., $WORK, $HOME), and aliases (e.g., cdw, cdh). There is therefore no need for you to set or update them when system and application software is updated.

Users that require third-party applications, special libraries, and tools for their projects can quickly tailor their environment with only the applications and tools they need. Using modules to define a specific application environment allows you to keep your environment free from the clutter of all the application environments you don't need.

The environment for executing each major TACC application can be set with a module command. The specifics are defined in a modulefile file, which sets, unsets, appends to, or prepends to environment variables (e.g.,$PATH, $LD_LIBRARY_PATH) for the specific application. Each modulefile also sets functions or aliases for use with the application. You only need to invoke a single command to configure the application/programming environment properly. The general format of this command is:

login1$ module load modulename

To look at the available modules, you can execute the following command:

login1$ module avail

Once you know which module you want, load it with the load option. You can also get more information about a specific software version (in this case PETSc) with the spider option:

login1$ module spider petsc/3.10

Use the spider option to find versions of a particular package. For example, to find all the hdf5 modules, type:

login1$ module spider hdf5

To look at a synopsis about using an application in the module's environment (in this case, fftw2), or to see a list of currently loaded modules, execute the following commands:

login1$ module help fftw2
login1$ module list

Managing your Files

Lonestar5 supports multiple file transfer programs such as scp, sftp, and rsync. During production, transfer speeds between Lonestar5 and other resources vary with I/O and network traffic.

File Systems & Quotas

Lonestar5 mounts three file systems that are shared across all nodes: home, work, and scratch. The system also defines the corresponding account-level environment variables $HOME, $WORK, and $SCRATCH. Consult the table below for quota and purge policies on these file systems.

Several aliases are provided for users to move easily between file systems:

  • Use the "cdh" or "cd" commands to change to $HOME
  • Use "cdw" to change to $WORK
  • Use the "cds" command to change to $SCRATCH

The $WORK file system mounted on Lonestar5 is the Global Shared File System hosted on the Stockyard system. It is the same file system that is available on Stampede2, Maverick2, Wrangler, and other TACC resources. The $STOCKYARD environment variable points to a directory on the file system that is associated with your account; this variable has the same definition on all TACC systems. The $WORK environment variable on Lonestar5 points to the lonestar subdirectory, a convenient location for activity on Lonestar5; the value of the $WORK environment variable will vary from system to system. Your quota and reported usage on this file system is the sum of all files stored on Stockyard regardless of their actual location on the work file system.
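For the fictitious user bjones from Figure 3, the relationship between the two variables can be sketched as follows; the account directory /work/01234/bjones is an illustrative placeholder, and on Lonestar5 both variables are set for you at login:

```shell
# Illustrative values only -- do not set these yourself on TACC systems.
STOCKYARD=/work/01234/bjones   # same value on every TACC system
WORK=$STOCKYARD/lonestar       # Lonestar5-specific subdirectory
echo "$WORK"                   # prints /work/01234/bjones/lonestar
```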

Stockyard Work file system

Figure 3. Stockyard File System. Account-level directories on the work file system (Global Shared File System hosted on Stockyard). Example for fictitious user bjones. All directories usable from all systems. Sub-directories (e.g. frontera, maverick2) exist only when you have allocations on the associated system.

Scratch storage is provided by DataDirect Networks, and has a raw unformatted capacity of over 5PB. Lonestar5's $SCRATCH file system is composed of:

  • 168 Object Storage Targets
  • 1 MetaData Target
  • 5.472 PB raw storage

TACC's Corral storage system is available as a mount point on Lonestar5's compute and login nodes.

File System | Quota | Key Features
----------- | ----- | ------------
$HOME | 5 GB | Not intended for parallel or high-intensity file operations. Backed up regularly. Not purged.
$WORK | 1 TB, 3,000,000 files across all TACC systems, regardless of where on the file system the files reside | Not intended for high-intensity file operations or jobs involving very large files. On the Global Shared File System mounted on most TACC systems; see the Stockyard system description for more information. Defaults: 1 stripe, 1 MB stripe size. Not backed up. Not purged.
$SCRATCH | no quota | Not backed up. Files are subject to purge if access time is more than 10 days old.

Scratch File System Purge Policy

The $SCRATCH file system, as its name indicates, is a temporary storage space. Files that have not been accessed* in ten days are subject to purge. Deliberately modifying file access time (using any method, tool, or program) for the purpose of circumventing purge policies is prohibited.

*The operating system updates a file's access time when that file is modified on a login or compute node or any time that file is read. Reading or executing a file/script will update the access time. Use the "ls -ul" command to view access times.

Sharing Files

Users often wish to collaborate with fellow project members by sharing files and data with each other. Project managers or delegates can create shared workspaces, areas that are private and accessible only to other project members, using UNIX group permissions and commands. Shared workspaces may be created as read-only or read-write, functioning as data repositories and providing a common work area to all project members. Please see Sharing Project Files on TACC Systems for step-by-step instructions.


scp

Use the Secure Shell scp utility to transfer data between any Linux system and the Lonestar5 login nodes. Copy a file from your local system to Lonestar5 with a command of the form:

localhost$ scp filename username@ls5.tacc.utexas.edu:/path/to/destination

Consult the man pages for more information on scp:

login1$ man scp


rsync

The rsync command is another way to keep your data up to date. In contrast to scp, rsync transfers only the changed parts of a file rather than the entire file, so this selective method of data transfer can be much more efficient than scp. The following example demonstrates using rsync to transfer a file named "myfile.c" from its current location on Lonestar5 to Stampede2's $WORK directory.

login1$ rsync myfile.c username@stampede2.tacc.utexas.edu:/path/to/work/directory

An entire directory can be transferred from source to destination by using rsync as well. For directory transfers the options "-avtr" will transfer the files recursively ("-r" option) along with the modification times ("-t" option) and in the archive mode ("-a" option) to preserve symbolic links, devices, attributes, permissions, ownerships, etc. The "-v" option (verbose) increases the amount of information displayed during any transfer. The following example demonstrates the usage of the "-avtr" options for transferring a directory named "gauss" from a local machine to a directory named "data" in the $WORK file system on Lonestar5.

localhost$ rsync -avtr ./gauss username@ls5.tacc.utexas.edu:/path/to/work/data

When executing multiple instantiations of scp or rsync, please limit your transfers to no more than 2-3 processes at a time.

For more rsync options and command details, consult the man page or help option:

login1$ man rsync
login1$ rsync -h

Running Jobs on Lonestar5

This section provides an overview of how compute jobs are charged to allocations, describes the Simple Linux Utility for Resource Management (Slurm) batch environment and Lonestar5 queue structure, and lists basic Slurm job control and monitoring commands and their options.

Job Accounting

Like all TACC systems, Lonestar5's accounting system is based on node-hours: one unadjusted Service Unit (SU) represents a single compute node used for one hour (a node-hour). For any given job, the total cost in SUs is the number of nodes used multiplied by the job's wall clock hours and by the queue's charge rate, which may include charges or discounts for specialized queues, e.g. Frontera's flex queue, Stampede2's development queue, and Longhorn's v100 queue. The queue charge rates are determined by the supply and demand for that particular queue or type of node and are subject to change.

Lonestar5 SUs billed = (# nodes) x (job duration in wall clock hours) x (charge rate per node-hour)
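As a worked example with hypothetical numbers, a job using 4 nodes for 2.5 wall clock hours in a queue with charge rate 1 is billed 4 x 2.5 x 1 = 10 SUs:

```shell
# Hypothetical job: 4 nodes, 2.5 hours wall clock, charge rate 1 (normal queue)
nodes=4; hours=2.5; rate=1
awk -v n="$nodes" -v h="$hours" -v r="$rate" 'BEGIN { print n * h * r }'   # prints 10
```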

The Slurm scheduler tracks and charges for usage to a granularity of a few seconds of wall clock time. The system charges only for the resources you actually use, not those you request. If your job finishes early and exits properly, Slurm will release the nodes back into the pool of available nodes. Your job will only be charged for as long as you are using the nodes.

TACC does not implement node-sharing on any compute resource. Each Lonestar5 node can be assigned to only one user at a time; hence a complete node is dedicated to a user's job and accrues wall-clock time for all the node's cores whether or not all cores are used.

Tip: Your queue wait times will be less if you request only the time you need: the scheduler will have a much easier time finding a slot for the 2 hours you really need than say, for the 12 hours requested in your job script.

Principal Investigators can monitor allocation usage via the TACC User Portal under "Allocations->Projects and Allocations". Be aware that the figures shown on the portal may lag behind the most recent usage. Projects and allocation balances are also displayed upon command-line login.

To display a summary of your TACC project balances and disk quotas at any time, execute:

login1$ /etc/tacc/taccinfo        # Generally more current than balances displayed on the portals.

Interactive vs Batch Jobs

Once logged into Lonestar5, users are automatically placed on one of two "front-end" login nodes. To determine what type of node you're on, issue the "hostname" command. Lonestar5's login nodes are labeled login[1-2].ls5.tacc.utexas.edu; the compute nodes are labeled something like nid00181.

The two login nodes provide an interface to the "back-end" compute nodes. Think of the login nodes as a prep area, where users may edit, compile, perform file management, issue transfers, submit batch jobs etc.

The compute nodes are where actual computations occur and where research is done. All batch jobs and executables, as well as development and debugging sessions, are run on the compute nodes.

To run jobs and access compute nodes on TACC resources, one must either submit a job to a batch queue or initiate an interactive session using the idev utility. You can also access a compute node via ssh if you are already running a job on that node.

Slurm Scheduler

Schedulers such as LoadLeveler, SGE and Slurm differ in their user interface as well as the implementation of the batch environment. Common to all, however, is the availability of tools and commands to perform the most important operations in batch processing: job submission, job monitoring, and job control (cancel, resource request modification, etc.). The scheduler on Lonestar5 is Slurm. Lonestar4 used SGE, but if you've used any of the newer systems at TACC, you will be familiar with Slurm.

Batch jobs are programs scheduled for execution on the compute nodes, to be run without human interaction. A job script (also called "batch script") contains all the commands necessary to run the program: the path to the executable, program parameters, number of nodes and tasks needed, maximum execution time, and any environment variables needed. Batch jobs are submitted to a queue and then managed by a scheduler. The scheduler manages all pending jobs in the queue and allocates exclusive access to the compute nodes for a particular job. The scheduler also provides an interface allowing the user to submit, cancel, and modify jobs.

Jobs are not handled on a first-come, first-served basis; the Slurm scheduler fits jobs in as the requested resources (nodes, runtime) become available. Do not request more resources than you need, or your job will wait longer in the queue.

Production Queues

The Lonestar5 production queues, Standard Memory and Large Memory, and their characteristics (wall-clock and processor limits; charge factor; and purpose) are listed in Table 3a and Table 3b below. Queues that don't appear in the table (such as systest) are non-production queues for system and HPC group testing and special support.

Table 3a. Lonestar5 Standard Memory Queues

Queue | Max Runtime | Max Nodes (Cores) per Job | Max Jobs in Queue | Queue Multiplier | Purpose
----- | ----------- | ------------------------- | ----------------- | ---------------- | -------
normal | 48 hrs | 171 nodes (4104 cores) | 50 | 1 | normal production
large (by request*) | 24 hrs | 342 nodes (8208 cores) | 1 | 1 | large runs
development | 2 hrs | 11 nodes (264 cores) | 1 | 1 | development nodes
gpu | 24 hrs | 4 nodes (40 cores) | 4 | 1 | GPU nodes
vis | 8 hrs | 4 nodes (40 cores) | 4 | 1 | GPU nodes + VNC service

*For access to the large queue, please submit a ticket to the TACC User Portal. Include in your request reasonable evidence of your readiness to run at scale on Lonestar5. In most cases this should include strong or weak scaling results summarizing experiments you have run on Lonestar5 up to the limits of the normal queue.

An important note on scheduling: hyper-threading is currently enabled on Lonestar5. While there are 24 cores on each non-GPU standard memory node, the operating system and scheduler will report a total of 48 CPUs (hardware threads).

Table 3b. Lonestar5 Large Memory Queues

These queues are available through the TACC-largemem module. See the "Using the Large Memory Nodes" section below for more information.

Queue | Max Runtime | Max Nodes (Cores) per Job | Max Jobs in Queue | Queue Multiplier | Purpose
----- | ----------- | ------------------------- | ----------------- | ---------------- | -------
largemem512GB | 48 hrs | 2 nodes (64 cores), 32 cores/node | 3 | 3 | large memory (512GB)
largemem1TB (by request*) | 48 hrs | 1 node (48 cores), 48 cores/node | 2 | 5 | large memory (1TB)

*For access to the largemem1TB queue, please submit a ticket to the TACC User Portal that includes reasonable evidence of your need for this queue.

An important note on scheduling: hyper-threading is currently enabled on Lonestar5. While there are 32 cores on each 512GB Haswell node, the operating system and scheduler will report a total of 64 CPUs (hardware threads). Similarly, there are 48 cores on each 1TB node, but the operating system and scheduler will report a total of 96 CPUs.
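Given these hardware-thread counts, MPI jobs typically request one task per physical core rather than one per hardware thread. For example, on the standard 24-core nodes, a two-node job would request:

```shell
#SBATCH -N 2    # two nodes
#SBATCH -n 48   # 48 tasks total = 24 per node, one per physical core
```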

Submit a batch job with sbatch

Use Slurm's sbatch command to submit a job. Specify the resources needed for your job (e.g., number of nodes/tasks needed, job run time) in a Slurm job script. See "/share/doc/slurm" for example Slurm job submission scripts.

login1$ sbatch myjobscript

where "myjobscript" is the name of a UNIX format text file containing job script commands. This file can contain both shell commands and special statements that include #SBATCH options and resource specifications; shell commands other than the initial parser line (e.g. #!/bin/bash) must follow all #SBATCH Slurm directives. Some of the most common options are described in Table 4 below and in the example job scripts. Details are available online in man pages (e.g., execute "man sbatch" on Lonestar5).

Options can be passed to sbatch on the command line or specified in the job script file; we recommend, however, that you avoid the "--export" flag, because there are subtle ways in which it can interfere with the automatic propagation of your environment. As a general rule, it is safer and easier to store commonly used #SBATCH directives in a reusable submission script rather than retyping the options at every batch request; this also makes it easier to maintain a consistent batch environment across runs. All batch submissions MUST specify a time limit, number of nodes, and total tasks. Jobs that do not use the -t (time), -N (nodes) and -n (total tasks) options will be rejected.

Batch scripts contain two types of statements, in that order: scheduler directives and shell commands. Scheduler directive lines begin with #SBATCH and are followed by sbatch options. Slurm stops interpreting #SBATCH directives after the first appearance of a shell command (blank lines and comment lines are okay). The UNIX shell commands are interpreted by the shell specified on the first line after the #! sentinel; otherwise the Bash shell (/bin/bash) is used. By default, a job begins execution in the directory of submission with the local (submission) environment.

If you don't want stderr and stdout directed to the same file, use both "-e" and "-o" options to designate separate output files. By default, stderr and stdout are sent to a file named "slurm-%j.out", where "%j" is replaced by the job ID; and with only an "-o" option, both stderr and stdout are directed to the same designated output file.
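For example, to split the two streams into separate per-job files (filenames here are illustrative):

```shell
#SBATCH -o myjob.o%j    # standard output -> myjob.o<jobID>
#SBATCH -e myjob.e%j    # standard error  -> myjob.e<jobID>
```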

The job script below requests an MPI job with 48 cores spread over 2 nodes and 1.5 hours of run time in the development queue:

#SBATCH -J myMPI            # job name
#SBATCH -o myMPI.o%j        # output and error file name (%j expands to jobID)
#SBATCH -N 2                # number of nodes requested
#SBATCH -n 48               # total number of mpi tasks requested
#SBATCH -p development      # queue (partition) -- normal, development, etc.
#SBATCH -t 01:30:00         # run time (hh:mm:ss) - 1.5 hours

#SBATCH --mail-user=username@tacc.utexas.edu
#SBATCH --mail-type=begin   # email me when the job starts
#SBATCH --mail-type=end     # email me when the job finishes

# run the executable named a.out
ibrun ./a.out               

Sample Slurm Batch Scripts

Five sample batch scripts are below. Additional example Slurm scripts can be found in /share/doc/slurm.

Table 4. Common sbatch Options

Option            Argument        Function
-p                queue_name      Submit to the queue (partition) designated by queue_name.
-J                job_name        Job name.
-n                total_tasks     Total number of tasks. Used alone, the job acquires enough nodes to launch total_tasks tasks at 48 tasks per node. Always pair -n with -N (below); this allows fewer than 48 tasks per node to be specified (e.g., for hybrid codes).
-N                nodes           Number of nodes. Used with -n, the job acquires nodes nodes and launches total_tasks/nodes tasks on each node.
--ntasks-per-xxx  N/A             The ntasks-per-core/socket/node options are not available on Lonestar5. The -N and -n options provide all the functionality needed for specifying a task layout on the nodes.
-t                hh:mm:ss        Wall-clock time limit for the job. Required.
--mail-user=      email_address   Email address to use for notifications.
--mail-type=      begin|end|fail|all  When user notifications are to be sent (one option per directive).
-o                output_file     Direct job standard output to output_file (without the -e option, error output also goes to this file).
-e                error_file      Direct job error output to error_file.
-d                afterok:jobid   Dependency: this job starts only after the specified job (jobid) finishes successfully. NOTE: this option works only on the command line; Slurm will not process it as a directive within a job script.
-A                projectnumber   Charge the job to the specified project/allocation number. This option is only necessary for logins associated with multiple projects.

Interactive sessions via idev

Request an interactive session on a compute node using TACC's idev utility; this is especially useful for development and debugging.

TACC's idev provides interactive access to a node and captures the resulting batch environment, which is automatically inherited by any additional terminal sessions that ssh to the node. idev is simple to use, supplying default resource options so you can avoid the options that the Slurm "srun" command requires for interactive access.

In the sample session below, a user requests interactive access to a single node (default) for 15 minutes (default is 30) in the development queue (idev's default) in order to debug the myprog application. idev returns a compute node login prompt:

WINDOW1 login2$ idev -m 15
WINDOW1 ...  
WINDOW1 --> Sleeping for 7 seconds...OK  
WINDOW1 ...  
WINDOW1 --> Creating interactive terminal session (login) on master node nid00181.  
WINDOW1 ...  
WINDOW1 nid00181$ vim myprog.c
WINDOW1 nid00181$ make myprog

Now the user may open another window to run the newly-compiled application, while continuing to debug in the original terminal session:

WINDOW2 login2$ ssh -Y nid00181
WINDOW2 ...  
WINDOW2 nid00181$ ibrun ./myprog
WINDOW2 ...output  
WINDOW2 nid00181$

Use the "-h" switch to see more options:

login2$ idev -h

Interactive sessions via ssh

Users may also ssh to a compute node from a login node, but only when that user's batch job is running, or the user has an active interactive session, on that node. Once the batch job or interactive session ends, the user will no longer have access to that node.

In the following example session user slindsey submits a batch job (sbatch), queries which compute nodes the job is running on (squeue), then ssh's to one of the job's compute nodes. The displayed node list, nid0[1312-1333], nid01335 is truncated for brevity. Notice that the user attempts to ssh to compute node nid01334 which is NOT in the node list. Since that node is not assigned to this user for this job, the connection is refused. When the job terminates the connection will also be closed even if the user is still logged into the node.

User submits the job described in the "myjobscript" file:

login1$ sbatch myjobscript
Submitted batch job 5462435 

User polls the queue, waiting till the job runs (state "R") and nodes are assigned.

login1$ squeue -u slindsey

5462435 normal    mympi slindsey R  0:39 16    nid0[1312-1333], nid01335

If the user attempts to log on to an unassigned node, then the connection is denied.

login1$ ssh nid01334

Access denied: user slindsey (uid=804387) has no active jobs.
Connection closed by

User logs in to an assigned compute node and does work. Once the job has finished running, the connection is automatically closed.

login1$ ssh nid01333
nid01333 [665]~> do science; attach debuggers etc.
Connection to nid01333 closed by remote host.
Connection to nid01333 closed.

Parameter Sweeps and High Throughput Jobs

Parameter sweeps, where the same executable is run with multiple input data sets, as well as other high throughput scenarios, can be combined into a single job using TACC's launcher and pylauncher utilities.

The Launcher is a simple shell-based utility for bundling large numbers of independent, single process runs into one multi-node batch submission. This allows users to run more simultaneous serial jobs than is permitted directly through the serial queue, and improves turnaround time. See the sample Slurm launcher script above.

Please see TACC's Launcher documentation or the module help for more information.

login1$ module help launcher
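A minimal launcher-style job might look like the sketch below. The variable names and entry point are assumptions based on common launcher versions; confirm the exact interface with "module help launcher" and the examples in /share/doc/slurm.

```shell
#!/bin/bash
#SBATCH -J sweep            # job name
#SBATCH -N 2                # two nodes
#SBATCH -n 48               # up to 48 single-core tasks at a time
#SBATCH -p normal
#SBATCH -t 02:00:00

module load launcher

# commands.txt holds one independent serial command per line (hypothetical file)
export LAUNCHER_JOB_FILE=commands.txt   # variable name is an assumption
$LAUNCHER_DIR/paramrun                  # entry-point name is an assumption
```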

Affinity and Memory Locality

Lonestar5 has Hyper-Threading (HT) enabled. When HT is enabled, the OS addresses two virtual threads per core. These logical threads share core resources and thus may not improve performance for all workloads. On HT systems it is critical to pay attention to thread binding in multithreaded processes. We are interested in feedback regarding the use of 24 vs. 48 threads/tasks per node, and are willing to provide assistance setting up these tests.

HPC workloads often benefit from pinning processes to hardware instead of allowing the operating system to migrate them at will. This is particularly important on multicore and heterogeneous systems, where process (and thread) migration can lead to suboptimal memory access and resource-sharing patterns, and thus significant performance degradation. TACC provides an affinity script called tacc_affinity to enforce strict local memory allocation and process pinning to the socket. For most HPC workloads, using tacc_affinity will ensure that processes do not migrate and that memory accesses stay local. To use tacc_affinity with your MPI executable, use this command:

nid00181$ ibrun tacc_affinity a.out

or place the command in a job script:

ibrun tacc_affinity a.out

This applies an affinity appropriate for the tasks_per_socket option (or a sensible default affinity if tasks_per_socket is not used) and a memory policy that forces memory allocations onto the local socket. Try ibrun with and without tacc_affinity to determine whether your application runs better with the TACC affinity settings.

However, there may be instances in which tacc_affinity is not flexible enough to meet the user's requirements. This section describes techniques to control process affinity and memory locality that can be used to improve execution performance on Lonestar5 and other HPC resources. In this section an MPI task is synonymous with a process. For a pure MPI-based job (i.e., no threading), it is strongly recommended to use 24 cores per node.

Do not use multiple methods to set affinity simultaneously as this can lead to unpredictable results.

Using numactl

numactl is a Linux command that allows explicit control of process affinity and memory policy. Since each MPI task is launched as a separate process, numactl can be used to specify the affinity and memory policy for each task. There are two ways to exercise NUMA control when launching a batch executable:

nid00181$ ibrun numactl options ./a.out
nid00181$ ibrun my_affinity ./a.out

The first command sets the same options for each task. Because the ranks for the execution of each a.out are not known to numactl it is not possible to use this command-line to tailor options for each individual task. The second command launches an executable script, my_affinity, that sets affinity for each task. The script will have access to the number of tasks per node and the rank of each task, and so it is possible to set individual affinity options for each task using this method. In general any execution using more than one task should employ the second method to set affinity so that tasks can be properly pinned to the hardware.
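A minimal sketch of such a my_affinity script is below, alternating tasks across the node's two sockets. The rank and tasks-per-node variable names are assumptions (they depend on the MPI launcher); the installed tacc_affinity script is the authoritative template.

```shell
#!/bin/bash
# my_affinity (sketch): pin each MPI task to one socket, alternating by rank.
rank=${PMI_RANK:-0}            # this task's MPI rank (assumed variable name)
tpn=${TACC_TASKS_PER_NODE:-24} # tasks per node (assumed variable name)

local_rank=$(( rank % tpn ))   # rank within this node
socket=$(( local_rank % 2 ))   # even ranks -> socket 0, odd ranks -> socket 1

# Bind the task's CPUs and memory to that socket, then run the real program
exec numactl -N "$socket" -l "$@"
```

Invoked as "ibrun my_affinity ./a.out", each task computes its own socket binding before the application starts.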

In threaded applications, the same numactl command may be used, but its scope applies globally to all threads, because every forked process or thread inherits the affinity and memory policy of the parent. This behavior can be modified from within a program: the basic calls are "sched_getaffinity" and "sched_setaffinity" for binding tasks and threads, and the libnuma API for memory policy. Note that on the login nodes the core numbers for masking are assigned round-robin to the sockets (cores 0, 2, 4, … are on socket 0 and cores 1, 3, 5, … are on socket 1), while on the compute nodes they are assigned contiguously (cores 0-11 are on socket 0 and 12-23 are on socket 1).

The TACC-provided affinity script, tacc_affinity, enforces strict local memory allocation to the socket (forcing eviction of the previous user's I/O buffers) and distributes tasks evenly across sockets. Use tacc_affinity as a template if a custom affinity script is needed for your jobs.

Table 5. Common numactl Options

Option        Arguments  Description
-N            0,1        Socket affinity: execute the process only on this socket (or these comma-separated sockets).
-C            [0-23]     Core affinity: execute the process only on this core (or these comma-separated cores).
-l            none       Memory policy: allocate only on the socket where the process runs; fall back to the other socket if full.
-i            0,1        Memory policy: strictly allocate round-robin on these sockets (comma-separated list); no fallback, abort if no more allocation space is available.
-m            0,1        Memory policy: strictly allocate on this socket (or these comma-separated sockets); no fallback, abort if no more allocation space is available.
--preferred=  0,1        Memory policy: allocate on this socket; fall back to the other if full.

Additional details on numactl are given in its man page and help information:

login1$ man numactl
login1$ numactl --help

Using Intel's KMP_AFFINITY

To alleviate the complexity of setting affinity on architectures that support multiple hardware threads per core, Intel provides a means of controlling thread pinning via the $KMP_AFFINITY environment variable.

login1$ export KMP_AFFINITY=[<modifier>,...]type

Table 6. KMP_AFFINITY types

Option Description
none Does not pin threads.
compact Pack threads close to each other.
scatter Round-robin threads to cores.

KMP_AFFINITY type modifiers include:

  • norespect or respect (OS thread placement)
  • noverbose or verbose
  • nowarnings or warnings
  • granularity=[fine|core] where
    • fine - pinned to HW thread
    • core - able to jump between HW threads within the core
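For example, a common setting for a threaded code on a 24-core Lonestar5 node binds one thread per physical core and prints the resulting map at startup (the thread count here is illustrative):

```shell
# 24 threads, one per physical core: granularity=core lets each thread
# float between a core's two hardware threads, compact assigns threads to
# consecutive cores, and verbose prints the binding at program start.
export OMP_NUM_THREADS=24
export KMP_AFFINITY="granularity=core,compact,verbose"
```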

Managing and Monitoring Jobs

After job submission, users may monitor the status of their jobs in several ways. While a job waits, the system continuously monitors the number of nodes becoming available and applies fair-share and backfill algorithms to determine a fair, expedient schedule that keeps the machine running at optimum capacity. The latest queue information can be displayed in several ways using the showq and squeue commands.

Job Monitoring with showq

TACC's "showq" job monitoring command-line utility displays jobs in the batch system in a manner similar to PBS' utility of the same name. showq summarizes running, idle, and pending jobs, also showing any advanced reservations scheduled within the next week. See Table 7 for more showq options.

Note that the number of cores reported is always 48 x (number of nodes), regardless of how many cores are actually requested per node. The exceptions are the vis and gpu queues, which report 20 x (number of nodes).

login1$ showq
ACTIVE JOBS--------------------
24820     plascomcm  oliver        Running 7680    20:38:50  Mon Dec 14 09:29:45
24827     noramp270_ mscott        Running 960      4:38:50  Mon Dec 14 09:29:45
24828     t13_job    mscott        Running 576      4:38:50  Mon Dec 14 09:29:45
24830     z_06_SL16  slindsey      Running 12000   20:38:50  Mon Dec 14 09:29:45
24856     z_07_SL16  slindsey      Running 8160    21:09:48  Mon Dec 14 10:00:43
24857     z_08_SL16  slindsey      Running 8160    21:20:55  Mon Dec 14 10:11:50
24863     idv88476   viennej       Running 768      3:39:56  Mon Dec 14 11:30:51
24864     terawrite  djames        Running 2400    23:59:42  Mon Dec 14 12:50:37

    8 active jobs

Total Jobs: 8     Active Jobs: 8     Idle Jobs: 0     Blocked Jobs: 0

Use the "-U" option with showq to display information for a single user:

login1$ showq -U slindsey


ACTIVE JOBS--------------------
28940     myjob1     slindsey      Running 480      6:44:22  Thu Jan  7 08:13:45
28941     myjob2     slindsey      Running 192      6:44:41  Thu Jan  7 08:14:04

Total Jobs: 2     Active Jobs: 2     Idle Jobs: 0     Blocked Jobs: 0

Table 7. showq Options

Option       Description
--help       display help message and exit
-l | --long  display verbose (long) listing
-u | --user  display jobs for the current user only
-U username  display jobs for username only

Job Monitoring with squeue

Both the showq -U and squeue -u username commands display similar information:

login1$ squeue -u slindsey

28941      normal   myjob1 slindsey  R    3:16:37      4 nid0[1112-1115]
28940      normal   myjob2 slindsey  R    3:16:56     10 nid0[1102-1111]

The showq command displays the cores and time requested, while the squeue command displays the partition (queue) and the state (ST) of each job, along with the node list once nodes are allocated. Jobs that have not yet started appear in the Pending (PD) state with a reason such as "Resources" (waiting for nodes to free up). Table 8 details common squeue options and Table 9 describes the command's output fields.

Table 8. Common squeue Options

Option         Result
-i interval    Repeatedly report at the given interval (in seconds).
-j job_list    Display information for the specified job(s).
-p part_list   Display information for the specified partitions (queues).
-t state_list  Show jobs in the specified state(s); see the squeue man page for state abbreviations: "all" or a list from {PD,R,S,CG,CD,CF,CA,F,TO,PR,NF}.

Table 9. Common squeue Output Fields

Field   Description
JOBID   job id assigned to the job
USER    user that owns the job
STATE   current job status, including, but not limited to:
        CA (cancelled)
        CD (completed)
        F (failed)
        PD (pending)
        R (running)

Using the squeue command with the --start and -j options can provide an estimate of when a particular job will be scheduled:

login1$ squeue --start -j 1676354
1676354   normal   hellow   user3   PD   2013-08-21T13:42:03   256   (Resources)

Even more extensive job information can be found using the "scontrol" command. The output shows quite a bit about the job: job dependencies, submission time, number of nodes, location of the job script, the working directory, etc. See the man page for more details.

Job Deletion with scancel

The scancel command is used to remove pending and running jobs from the queue. Include a space-separated list of job IDs that you want to cancel on the command-line:

login1$ scancel job_id1 job_id2 ...
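scancel can also filter by user. For example, to cancel all of your own jobs at once:

```shell
login1$ scancel -u slindsey
```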

Use "showq -u" or "squeue -u username" to see your jobs.

Example job scripts are available online in /share/doc/slurm . They include details for launching large jobs, running multiple executables with different MPI stacks, executing hybrid applications, and other operations.

About Pending Jobs

Viewing queue status may reveal jobs in a pending (PD) state. Jobs submitted to Slurm may enter, and remain in, a pending state for many reasons, such as:

  • A queue (partition) may be temporarily offline
  • The resources (number of nodes) requested exceed those available
  • Queues are being drained in anticipation of system maintenance
  • The system is running other high-priority jobs

The Reason Codes summarized below identify the reason a job is awaiting execution. If a job is pending for multiple reasons, only one of those reasons is displayed. For a full list, view the squeue man page.

Table 10. Job Pending Reason Codes

Reason Code  Description
Dependency This job is waiting for a dependent job to complete.
NodeDown A node required by the job is down.
PartitionDown The partition (queue) required by this job is in a DOWN state and temporarily accepting no jobs, for instance because of maintenance. Note that this message may be displayed for a time even after the system is back up.
Priority One or more higher priority jobs exist for this partition or advanced reservation. Other jobs in the queue have higher priority than yours.
ReqNodeNotAvail No nodes can be found satisfying your limits, for instance because maintenance is scheduled and the job cannot finish before the maintenance begins.
Reservation The job is waiting for its advanced reservation to become available.
Resources The job is waiting for resources (nodes) to become available and will run when Slurm finds enough free nodes.
SystemFailure Failure of the Slurm system, a file system, the network, etc.

Slurm Environment Variables

In addition to the environment variables that can be inherited by the job from the interactive login environment, Slurm provides environment variables for most of the values used in the #SBATCH directives. These are listed at the end of the sbatch man page. The environment variables SLURM_JOB_ID, SLURM_JOB_NAME, SLURM_SUBMIT_DIR and SLURM_NTASKS_PER_NODE may be useful for documenting run information in job scripts and output. Table 11 below lists some important Slurm-provided environment variables.

Note that environment variables cannot be used in an #SBATCH directive within a job script. For example, the following directive will NOT work as expected:

#SBATCH -o myMPI.o$SLURM_JOB_ID    # WRONG: the variable is not expanded
Instead, use the following directive:

#SBATCH -o myMPI.o%j

where "%j" expands to the jobID.

Table 11. Slurm Environment Variables

Environment Variable Description
SLURM_JOB_ID batch job id assigned by Slurm upon submission
SLURM_JOB_NAME user-assigned job name
SLURM_NNODES number of nodes
SLURM_NODELIST list of nodes
SLURM_NTASKS total number of tasks
SLURM_QUEUE queue (partition)
SLURM_SUBMIT_DIR directory of submission
SLURM_TASKS_PER_NODE number of tasks per node
SLURM_TACC_ACCOUNT TACC project/allocation charged

Job Dependencies

Some workflows may have job dependencies, for example a user may wish to perform post-processing on the output of another job, or a very large job may have to be broken up into smaller pieces so as not to exceed maximum queue runtime. In such cases you may use Slurm's command-line "--dependency=" options. Slurm will not process this option within a job script.

The following command submits a job script that will run only upon successful completion of another previously submitted job:

login1$ sbatch --dependency=afterok:jobid job_script_name
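Since sbatch's "--parsable" flag prints only the new job's ID, such a chain can be scripted; the script names below are illustrative:

```shell
# Submit the first job, capture its ID, then queue a dependent job that
# starts only if the first job completes successfully.
simid=$(sbatch --parsable simulate.slurm)
sbatch --dependency=afterok:"$simid" postprocess.slurm
```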

Monitor queue status with sinfo

The "sinfo" command gives a wealth of information about the status of the queues, but without arguments it may give you more information than you want. Use the print options in the snippet below with sinfo for a more readable listing that summarizes each queue on a single line:

login1$ sinfo -o "%20P %5a %.10l %16F"

The column labeled "NODES(A/I/O/T)" of this summary listing displays the number of nodes in the Allocated, Idle, and Other states, along with the Total node count for the partition. See "man sinfo" for more information.

Building your Applications

This section discusses the steps necessary to compile and/or re-build your applications.

Compiling your Applications

The default programming environment is based on the Intel compiler and the Cray Message Passing Toolkit, sometimes referred to as Cray MPICH. For compiling MPI codes, the familiar commands "mpicc", "mpicxx", "mpif90" and "mpif77" are available. Also, the compilers "icc", "icpc", and "ifort" are directly accessible.

To access the most recent versions of GCC, load one of the gcc modules.

              Serial              MPI Parallel
Language      Intel    GNU GCC    Intel    GNU GCC
C             icc      gcc        mpicc    mpicc
C++           icpc     g++        mpicxx   mpicxx
Fortran       ifort    gfortran   mpif90   mpif90

Compiling OpenMP Applications

For pure OpenMP jobs, or other jobs that use threads but not MPI, specify a single node using the "-N" option and a single task using the "-n" option. Then, set the $OMP_NUM_THREADS environment variable to the desired number of threads. As usual with threaded jobs you may also wish to set an affinity using the $KMP_AFFINITY variable. See Using Intel's KMP_AFFINITY section.
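A sketch of the corresponding job script fragment (the executable name is hypothetical):

```shell
#SBATCH -N 1                 # one node
#SBATCH -n 1                 # one task; OpenMP creates the threads

export OMP_NUM_THREADS=24    # one thread per physical core
./my_openmp_app              # hypothetical threaded executable
```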

Linking your Applications

Some of the more useful load flags/options for the host environment are listed below. For a more comprehensive list, consult the ld man page.

  • Use the "-l" loader option to link in a library at load time. This links in either the shared library "libname.so" (default) or the static library "libname.a", provided the library can be found in the loader's default search path or in the directories listed in the $LD_LIBRARY_PATH environment variable.

    login1$ ifort prog.f90 -lname
  • To explicitly include a library directory, use the "-L" option:

    login1$ ifort prog.f -L/mydirectory/lib -lname

    In the above example, the user's libname.a library is not in the default search path, so the "-L" option is specified to point to the directory containing libname.a (only the library name is supplied in the "-l" argument; remove the "lib" prefix and the ".a" suffix.)

Many of the modules for applications and libraries, such as the hdf5 library module, provide environment variables for compiling and linking commands. Execute the "module help module_name" command for a description, listing, and use cases of the assigned environment variables. The following example illustrates their use for the hdf5 library:

login1$ icc -I$TACC_HDF5_INC hdf5_test.c -o hdf5_test \
    -Wl,-rpath,$TACC_HDF5_LIB -L$TACC_HDF5_LIB -lhdf5 -lz

Here, the module-supplied environment variables $TACC_HDF5_LIB and $TACC_HDF5_INC contain the hdf5 library and header directory paths, respectively. The loader option "-Wl,-rpath" embeds the $TACC_HDF5_LIB directory in the binary executable. This allows the run-time dynamic loader to determine the location of shared libraries directly from the executable instead of from $LD_LIBRARY_PATH or the ld.so cache of bindings between shared libraries and directory paths, and so avoids having to set $LD_LIBRARY_PATH (manually or through a module command) before running the executable. (This simple load sequence will work for some of the sequential MKL functions; see the MKL Library section for using various packages within MKL.)

You can view the full path of the dynamic libraries inserted into your binary with the ldd command. The example below shows a partial listing for the h5stat binary:

login1$ ldd h5stat
libhdf5.so.10 => /opt/apps/intel16/hdf5/1.8.16/x86_64/lib/libhdf5.so.10 (0x00002b594a8b0000)
libsz.so.2 => /opt/apps/intel16/hdf5/1.8.16/x86_64/lib/libsz.so.2 (0x00002b594b0b9000)
libz.so.1 => /lib64/libz.so.1 (0x00002b594b2d6000)

Intel Math Kernel Library (MKL)

The Intel Math Kernel Library (MKL) is a collection of highly optimized functions implementing some of the most important mathematical kernels used in computational science, including standardized interfaces to:

  • BLAS (Basic Linear Algebra Subroutines), a collection of low-level matrix and vector operations like matrix-matrix multiplication
  • LAPACK (Linear Algebra PACKage), which includes higher-level linear algebra algorithms like Gaussian Elimination
  • FFT (Fast Fourier Transform), including interfaces based on FFTW (Fastest Fourier Transform in the West)
  • ScaLAPACK (Scalable LAPACK), BLACS (Basic Linear Algebra Communication Subprograms), Cluster FFT, and other functionality that provide block-based distributed memory (multi-node) versions of selected LAPACK, BLAS, and FFT algorithms;
  • Vector Mathematics (VM) functions that implement highly optimized and vectorized versions of special functions like sine and square root.

MKL with Intel C, C++, and Fortran Compilers

There is no MKL module for the Intel compilers because you don't need one: the Intel compilers have built-in support for MKL. Unless you have specialized needs, there is no reason to specify include paths and libraries explicitly. Instead, using MKL with the Intel modules requires nothing more than compiling and linking with the "-mkl" option; e.g.:

$ icc   -mkl mycode.c
$ ifort -mkl mycode.f90

The "-mkl" switch is an abbreviated form of "-mkl=parallel", which links your code to the threaded version of MKL. To link to the unthreaded version, use "-mkl=sequential". A third option, "-mkl=cluster", which also links to the unthreaded libraries, is necessary and appropriate only when using ScaLAPACK or other distributed memory packages. For additional information, including advanced linking options, see the MKL documentation and Intel MKL Link Line Advisor.

MKL with GNU C, C++, and Fortran Compilers

When using a GNU compiler, load the MKL module before compiling or running your code, then specify explicitly the MKL libraries, library paths, and include paths your application needs. Consult the Intel MKL Link Line Advisor for details. A typical compile/link process on a TACC system will look like this:

$ module load gcc
$ module load mkl                         # available/needed only for GNU compilers
$ gcc -fopenmp -I$MKLROOT/include         \
         -Wl,-L${MKLROOT}/lib/intel64     \
         -lmkl_intel_lp64 -lmkl_core      \
         -lmkl_gnu_thread -lpthread       \
         -lm -ldl mycode.c

For your convenience the mkl module file also provides alternative TACC-defined variables like $TACC_MKL_INCLUDE (equivalent to $MKLROOT/include). Execute "module help mkl" for more information.

Using MKL as BLAS/LAPACK with Third-Party Software

When your third-party software requires BLAS or LAPACK, you can use MKL to supply this functionality. Replace generic instructions that include link options like "-lblas" or "-llapack" with the simpler MKL approach described above. There is no need to download and install alternatives like OpenBLAS.

Using MKL as BLAS/LAPACK with TACC's MATLAB, Python, and R Modules

TACC's MATLAB, Python, and R modules all use threaded (parallel) MKL as their underlying BLAS/LAPACK library. This means that even serial codes written in MATLAB, Python, or R may benefit from MKL's thread-based parallelism. This requires no action on your part other than specifying an appropriate max thread count for MKL; see the section below for more information.

Controlling Threading in MKL

Any code that calls MKL functions can potentially benefit from MKL's thread-based parallelism; this is true even if your code is not otherwise a parallel application. If you are linking to the threaded MKL (using "-mkl", "-mkl=parallel", or the equivalent explicit link line), you need only specify an appropriate value for the max number of threads available to MKL. You can do this with either of the two environment variables MKL_NUM_THREADS or OMP_NUM_THREADS. The environment variable MKL_NUM_THREADS specifies the max number of threads available to each instance of MKL, and has no effect on non-MKL code. If MKL_NUM_THREADS is undefined, MKL uses OMP_NUM_THREADS to determine the max number of threads available to MKL functions. In either case, MKL will attempt to choose an optimal thread count less than or equal to the specified value. Note that OMP_NUM_THREADS defaults to 1 on TACC systems; if you use the default value you will get no thread-based parallelism from MKL.

If you are running a single serial, unthreaded application (or an unthreaded MPI code involving a single MPI task per node) it is usually best to give MKL as much flexibility as possible by setting the max thread count to the total number of hardware threads on the node (48 on the typical Haswell LS5 compute node). Of course things are more complicated if you are running more than one process on a node: e.g. multiple serial processes, threaded applications, hybrid MPI-threaded applications, or pure MPI codes running more than one MPI rank per node. See http://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications and related Intel resources for examples of how to manage threading when calling MKL from multiple processes.
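For the single serial-process case above, the settings reduce to two exports (the application name is hypothetical):

```shell
# Let MKL use the node's 48 hardware threads while keeping any non-MKL
# OpenMP regions single-threaded; MKL_NUM_THREADS affects only MKL calls.
export MKL_NUM_THREADS=48
export OMP_NUM_THREADS=1
# then launch as usual, e.g.: ./my_serial_mkl_app
```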

Software on Lonestar5

Use TACC's Software Search tool or the "module spider" command to discover available software packages.

Users are welcome to install packages in their Home or Work directories. No super-user privileges are needed, simply use the "--prefix" option when configuring then making the package.
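A typical user-space install might look like the following sketch (the package name, version, and install path are hypothetical; the configure command is echoed rather than executed here):

```shell
# Sketch: installing a package under $WORK without super-user privileges.
# The same pattern works under $HOME. "mypackage-1.0" is a placeholder.
PREFIX="$WORK/apps/mypackage-1.0"
echo "./configure --prefix=$PREFIX && make && make install"
```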

Users must provide their own license for commercial packages.


Visualization

Lonestar5 supports both interactive and batch visualization on any compute node using the Software-Defined Visualization stack (sdvis.org). Traditional hardware-accelerated rendering is available on the 16 GPU nodes through the vis queue, where each node is configured with one NVIDIA K40s GPU.

Remote Desktop Access

Remote desktop access to Lonestar5 is provided through a Virtual Network Computing (VNC) connection to one or more nodes. Users must first connect to a Lonestar login node (see Accessing Lonestar5) and from there submit a job that:

  • allocates a set of Lonestar nodes
  • starts a vncserver process on the first allocated node
  • identifies via an output message a vncserver access port to connect to

Once the vncserver process is running, the user establishes a secure SSH tunnel to the specified vncserver access port and starts a VNC viewer application on their local system, which presents the virtual desktop.

Note: If this is your first time connecting to LS5, you must run vncpasswd to create a password for your VNC servers. This should NOT be your login password! This mechanism only deters unauthorized connections; it is not fully secure, as only the first eight characters of the password are saved. All VNC connections are tunneled through SSH for extra security, as described below.

Follow the steps below to start an interactive session.

  1. Start a Remote Desktop

    TACC has provided a VNC Slurm job script (/share/doc/slurm/job.vnc) that requests one node in the vis queue for four hours, creating a VNC session. Submit this job with the sbatch command:

    login1$ sbatch /share/doc/slurm/job.vnc

    You may modify or overwrite script defaults with sbatch command-line options:

    • "-t hours:minutes:seconds" - modify the job runtime
    • "-A projectnumber" - specify the project/allocation to be charged
    • "-N nodes" - specify number of nodes needed
    • "-n processes" - specify the number of processes per node
    • "-p partition" - specify alternate queue (default queue is normal)

    All arguments after the job script name are sent to the vncserver command. For example, to set the desktop resolution to 1440x900, use:

    login1$ sbatch /share/doc/slurm/job.vnc -geometry 1440x900

    The job.vnc script starts a vncserver process and writes an output file, vncserver.out, in the job submission directory containing the connection port for the vncviewer. Watch for the "To connect via VNC client" message at the end of the output file, or watch the output stream in a separate window with the commands:

    login1$ touch vncserver.out ; tail -f vncserver.out

    The spartan window manager twm is the default VNC desktop. The lightweight window manager, icewm, is recommended for remote performance. To use icewm, open the "~/.vnc/xstartup" file (created after your first VNC session) and replace "twm" with "starticewm".

  2. Create an SSH Tunnel to Lonestar5

    TACC requires users to create a secure SSH tunnel from their local system to one of the two LS5 login nodes: login1.ls5.tacc.utexas.edu or login2.ls5.tacc.utexas.edu. On Unix or Linux systems, execute one of the following commands once the vncserver port has been opened on an LS5 login node:

    localhost$ ssh -f -N -L xxxx:ls5.tacc.utexas.edu:yyyy username@login1.ls5.tacc.utexas.edu


    localhost$ ssh -f -N -L xxxx:ls5.tacc.utexas.edu:yyyy username@login2.ls5.tacc.utexas.edu


    • "yyyy" is the port number given by the vncserver batch job
    • "xxxx" is a port on the remote system. Generally, the port number specified on one of the LS5 login nodes, yyyy, is a good choice to use on your local system as well
    • "-f" instructs SSH to go to the background once the connection is established
    • "-N" instructs SSH not to execute a remote command, only forward ports
    • "-L" forwards the port

    On Windows systems, find the menu in the SSH client where tunnels can be specified, and enter the local and remote ports as required, then ssh to LS5.
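Putting the pieces together, a filled-in tunnel command might look like this sketch (5902 is a made-up example port and "username" is a placeholder; substitute the port reported in vncserver.out and your own account name. The command is echoed rather than executed here):

```shell
# Sketch: constructing the SSH tunnel command from the reported port.
LOCAL_PORT=5902    # port on your local system ("xxxx" above)
REMOTE_PORT=5902   # port reported by the vncserver job ("yyyy" above)
echo ssh -f -N -L ${LOCAL_PORT}:ls5.tacc.utexas.edu:${REMOTE_PORT} username@login1.ls5.tacc.utexas.edu
```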

  3. Connecting vncviewer

    Once the SSH tunnel has been established, use a VNC client to connect to the local port you created, which will then be tunneled to your VNC server on Lonestar. Connect to "localhost::xxxx", where "xxxx" is the local port you used for your tunnel; the double colon tells the client that "xxxx" is a port number rather than a display number (some VNC clients also accept "localhost:xxxx").

    TACC staff recommends the following VNC clients:

    • TigerVNC VNC Client, a platform independent application
    • TightVNC for Windows and Linux
    • Chicken of the VNC for Mac

    Before the desktop is presented, the user will be prompted for their VNC server password (the password created before your first session using vncpasswd as explained above). Depending on your local system this prompt's location may or may not be obvious. If you don't see it immediately take a good look around your desktop. The virtual desktop should appear at this point and includes one or two initial xterm windows (which may be overlapping). One, which is white-on-black, manages the lifetime of the VNC server process. Killing this window (by typing "exit" or "ctrl-D" at the prompt, or selecting the "X" in the upper corner) will cause the vncserver process to terminate and the original batch job to end. Because of this, we recommend that this window not be used for other purposes; it is just too easy to accidentally kill it and terminate the session. Move it off to one side out of the way.

    The other xterm window is black-on-white, and can be used to start both serial programs running on the node hosting the vncserver process, or parallel jobs running across the set of cores associated with the original batch job. Additional xterm windows can be created using the window-manager left-button menu.

Running Applications on the VNC Desktop

From an interactive desktop, applications can be run from icons or from xterm command prompts. Two special cases arise: running parallel applications, and running applications that use OpenGL.

Running Parallel Applications from the Desktop

Parallel applications are run on the desktop using the same ibrun wrapper described above (see Running). The command:

c442-001$ ibrun ibrunoptions application applicationoptions

will run application on the associated nodes, as modified by the ibrun options.

Running OpenGL/X Applications On The Desktop

Lonestar5 uses the OpenSWR OpenGL library to perform efficient rendering. At present, the compute nodes on Lonestar5 do not support native X instances. All windowing environments should use a VNC desktop launched via the job script in /share/doc/slurm/job.vnc or using the TACC Vis portal.

swr: To access the accelerated OpenSWR OpenGL library, it is necessary to use the swr module to point to the swr OpenGL implementation and configure the number of threads to allocate to rendering.

c442-001$ module load swr
c442-001$ swr options application application-args

Parallel VisIt on Lonestar5

VisIt was compiled with the Intel compiler and the cray_mpich stack.

After connecting to a VNC server on Lonestar5, as described above, load the VisIt module at the beginning of your interactive session before launching the VisIt application:

c442-001$ module load swr visit
c442-001$ swr visit

VisIt first loads a dataset and presents a dialog allowing for selecting either a serial or parallel engine. Select the parallel engine. Note that this dialog will also present options for the number of processes to start and the number of nodes to use; these options are actually ignored in favor of the options specified when the VNC server job was started.

Preparing data for Parallel Visit

In order to take advantage of parallel processing, VisIt input data must be partitioned and distributed across the cooperating processes. This requires that the input data be explicitly partitioned into independent subsets at the time it is read into VisIt. VisIt supports SILO data, which incorporates a parallel, partitioned representation. Otherwise, VisIt supports a metadata file (with a .visit extension) that lists multiple data files of any supported format that are to be associated into a single logical dataset. In addition, VisIt supports a "brick of values" format, also using the .visit metadata file, which enables single files containing data defined on rectilinear grids to be partitioned and imported in parallel. Note that VisIt does not support the VTK parallel XML formats (.pvti, .pvtu, .pvtr, .pvtp, and .pvts). For more information on importing data into VisIt, see Getting Data Into VisIt; though that documentation refers to VisIt version 2.0, it appears to be the most current available.

Parallel ParaView on Lonestar5

After connecting to a VNC server on Lonestar5, as described above, do the following:

  1. Set up your environment with the necessary modules. Load the swr, qt5, ospray, and paraview modules in this order:

    c442-001$ module load swr qt5 ospray paraview
  2. Launch ParaView:

     c442-001$ swr -p 1 paraview [paraview client options]
    1. Click the "Connect" button, or select File -> Connect

    2. Select the "auto" configuration, then press "Connect". In the Paraview Output Messages window you'll see what appears to be an "lmod" error; it can be ignored. You'll then see the parallel servers being spawned and the connection established.

Large Memory Nodes

Assessing the Need

The large memory nodes are for jobs that require more than the 64GB of RAM available on the standard memory nodes. Demand for these nodes is high and wait times can be long. Please assess your memory needs carefully before submitting jobs to the large memory queues: in many cases there are ways to resolve memory issues that do not require moving to large memory nodes. For MPI applications, it is often enough to run with fewer MPI tasks per node. For example, instead of "-N 4 -n 96" (96 tasks spread across 4 nodes, or 24 tasks per node), one can increase the memory available to each task by running "-N 8 -n 96" (96 tasks spread across 8 nodes, or 12 tasks per node) or even "-N 96 -n 96" (1 task per node). TACC's Remora tool allows you to examine your memory needs easily. For more information about remora:

login1$ module load remora; module help remora
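The arithmetic behind the 96-task example above can be sketched as follows (assuming the standard 64GB nodes; integer division, so the per-task figures are approximate):

```shell
# Sketch: per-task memory for a 96-task MPI job at each node count
# mentioned above, on 64GB standard memory nodes.
for nodes in 4 8 96; do
  per_node=$((96 / nodes))
  printf '%s\n' "-N $nodes -n 96: $per_node tasks/node, ~$((64 / per_node))GB per task"
done
```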

Accessing the Large Memory Queues

The TACC-largemem module provides access to the large memory queues. It is the large memory equivalent of the (default) "TACC" module that manages access to the standard memory queues. To configure your environment for the large memory queues, execute:

login1$ module load TACC-largemem

to automatically swap out the TACC module. Loading the TACC-largemem module has two effects:

  1. It configures Slurm so it points to the two large memory queues (and only those queues)
  2. It swaps out any Aries-based MPI module and automatically loads the impi-largemem module, an MPI stack that is compatible with the Infiniband network connecting the large memory nodes.

To reconfigure your environment for the standard memory queues, execute

 login1$ module load TACC

to automatically swap out the TACC-largemem module and replace it with the default TACC module. You can of course achieve the same outcome in other ways (e.g. executing "module reset" to return to system defaults). Do not, however, simply unload the TACC-largemem module; doing so would have the effect of leaving you without access to any Slurm commands.

MPI Applications on the Large Memory Queues

To run an MPI application in either of the large memory queues you will need to rebuild the application using the impi-largemem module. This module provides an MPI stack compatible with the Infiniband network on the large memory queues. The "cray_mpich" module targets the Aries network and will not work in the large memory queues.

Building for the 512GB Haswell Nodes

The login nodes and 512GB compute nodes share the same Haswell architecture. Be sure to load the impi-largemem module when building MPI applications for the large memory nodes. Beyond this requirement, there are no other special considerations when using the login nodes to build software for the nodes in the largemem512GB queue.

Building for the 1TB Ivy Bridge Nodes

The Ivy Bridge 1TB compute nodes have a different processor architecture than the Haswell login nodes. This can affect the way you use the Intel compiler and Haswell login nodes to build software targeting the 1TB Ivy Bridge nodes. Among the plausible approaches:

  • Single Binary Optimized for Both Architectures (recommended): Specify "-xAVX -axCORE-AVX2" with the Intel compiler to build a multi-architecture binary that will detect the processor architecture at runtime and select processor-specific optimized code. In a typical build system, add these flags to the CFLAGS, CXXFLAGS, FFLAGS, and LDFLAGS variables. The resulting binary may be up to 2x larger than a single-target executable, and compilation will take more time. But this approach is otherwise an excellent choice for codes you want to run in both large memory queues.

  • Generic Binary (supported): If you compile with no architecture flags (e.g. "icc mycode.c"), the compiler defaults to "-msse2", producing a binary that will run on essentially any modern Intel processor. The executable, however, will not exploit new hardware features or the best processor-specific optimizations.
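For an autoconf-style build, the recommended multi-architecture flags might be passed through the conventional environment variables before running configure, as in this sketch (the CFLAGS/CXXFLAGS/FFLAGS/LDFLAGS names are the usual conventions; adjust to your build system):

```shell
# Sketch: export the multi-architecture flags so a configure/make build
# picks them up for C, C++, Fortran, and link steps.
ARCHFLAGS="-xAVX -axCORE-AVX2"
export CFLAGS="$ARCHFLAGS" CXXFLAGS="$ARCHFLAGS" FFLAGS="$ARCHFLAGS" LDFLAGS="$ARCHFLAGS"
echo "CFLAGS=$CFLAGS"
```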

When building MPI applications, be sure to load the "impi-largemem" module.

Running Jobs in the Large Memory Queues

Submitting a batch job to a large memory queue is similar to doing so on the standard queues. Interactive sessions using idev are also possible, but queue wait times may make interactive sessions impractical. To submit a batch job or request an interactive session on a large memory queue:

  1. First load the TACC-largemem module so that Slurm points to the large memory queues. You must load this module before executing your sbatch or idev command.

  2. For MPI jobs, check to make sure you have the impi-largemem module loaded. Loading the TACC-largemem module automatically loads the impi-largemem module. You can also load the module explicitly before calling sbatch or idev, in your job script itself, or from within your interactive idev session.

  3. Specify the appropriate queue in one of three ways:

    • in a batch script:

      #SBATCH -p largemem512GB
    • as a command-line option to the sbatch command:

      login1$ sbatch -p largemem512GB mybatchscript
    • or as an idev option

      login1$ idev -p largemem512GB
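Combining these steps, a minimal batch script for the 512GB queue might look like this sketch (the job name, node and task counts, run time, allocation, and application binary are all placeholders):

```shell
#!/bin/bash
#SBATCH -J bigmem_job          # job name (placeholder)
#SBATCH -p largemem512GB       # the 512GB Haswell queue
#SBATCH -N 1                   # one large memory node
#SBATCH -n 24                  # 24 tasks on that node
#SBATCH -t 02:00:00            # run time (placeholder)
#SBATCH -A projectnumber       # your allocation (placeholder)

module load impi-largemem      # MPI stack for the large memory Infiniband network
ibrun ./my_mpi_app             # hypothetical binary built against impi-largemem
```

Remember to load the TACC-largemem module before submitting this script with sbatch, as described in step 1.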

TACC and Cray Environments

The Lonestar5 environment actually includes two distinct modes: a TACC Environment and a Cray Environment. This user guide describes the TACC Environment, designed to make available to you a user experience similar to other TACC resources and a software stack built and maintained by the TACC staff. The Cray Environment is designed to make available to you a user experience similar to other Cray systems and the standard software stack provided by the vendor.

Help Desk

TACC Consulting operates from 8am to 5pm CST, Monday through Friday, except for holidays. You can submit a help desk ticket at any time via the TACC User Portal with "Lonestar5" in the Resource field. Help the consulting staff help you by following these best practices when submitting tickets.

  • Do your homework before submitting a help desk ticket. What does the user guide and other documentation say? Search the internet for key phrases in your error logs; that's probably what the consultants answering your ticket are going to do. What have you changed since the last time your job succeeded?

  • Describe your issue as precisely and completely as you can: what you did, what happened, verbatim error messages, other meaningful output. When appropriate, include the information a consultant would need to find your artifacts and understand your workflow: e.g. the directory containing your build and/or job script; the modules you were using; relevant job numbers; and recent changes in your workflow that could affect or explain the behavior you're observing.

  • Subscribe to Lonestar5 User News. This is the best way to keep abreast of maintenance schedules, system outages, and other general interest items.

  • Have realistic expectations. Consultants can address system issues and answer questions about Lonestar5. But they can't teach parallel programming in a ticket, and may know nothing about the package you downloaded. They may offer general advice that will help you build, debug, optimize, or modify your code, but you shouldn't expect them to do these things for you.

  • Be patient. It may take a business day for a consultant to get back to you, especially if your issue is complex. It might take an exchange or two before you and the consultant are on the same page. If the admins disable your account, it's not punitive. When the file system is in danger of crashing, or a login node hangs, they don't have time to notify you before taking action.