Maverick User Guide
Maverick, an HP/NVIDIA Interactive Visualization and Data Analytics System, is TACC's latest addition to its suite of advanced computing systems. It combines capacities for interactive advanced visualization and large-scale data analytics with traditional high performance computing. Recent exponential increases in the size and quantity of digital datasets necessitate new systems such as Maverick, capable of fast data movement and advanced statistical analysis. Maverick debuts the new NVIDIA K40 GPU for remote visualization and GPU computing to the national community.
- 132 nodes with two Intel Xeon E5-2680 v2 Ivy Bridge sockets with 10 CPU cores per socket
- 132 NVIDIA Tesla K40 GPUs
- TACC-developed remote vis software: ScoreVIS, DisplayCluster, GLuRay
- Visualization software stack: ParaView, VisIt, EnSight, Amira
- 132 1/4TB memory nodes
- connected to 20PB file system
- Mellanox FDR InfiniBand interconnect
- comprehensive software includes: MATLAB, Parallel R
Maverick is intended primarily for interactive visualization and data analysis jobs to allow for interactive query of large-scale data sets. Normal batch queues will enable users to run simulations up to 6 hours for interactive jobs and 24 hours for GPGPU and HPC jobs. Jobs requiring longer run times or more cores than allowed by the normal queues will be run in a special queue after approval by TACC staff. Users will be able to run jobs using 132 of the NVIDIA Tesla K40s for both interactive graphics and for GPGPU jobs (at a lower priority).
The new Maverick HP/NVIDIA Interactive Visualization and Data Analytics System is configured with 132 HP ProLiant SL250s Gen8 compute nodes and 132 NVIDIA Tesla K40 GPU accelerators. In addition, with 256 GB of memory and 500 GB of storage per node, users have access to an aggregate of 33.7 TB of memory and 66 TB of local storage. Compute nodes have access to a 20 PB Lustre Parallel file system, Stockyard. An FDR InfiniBand switch fabric interconnects the nodes facilitating high-speed internode communication and I/O traffic.
- Operating System - CentOS 6.4
- CPU - Intel Xeon E5-2680 v2 Ivy Bridge, 2.80 GHz, 20 CPUs/node, 12.8 GB memory / core
Maverick has several different file systems with distinct storage characteristics. There are predefined directories in these file systems for you to store your data. Since these file systems are shared with others, they are managed by quota limits. There is no purge policy on Maverick.
Two local file systems are available: an NFS $HOME and $WORK, a Lustre filesystem on the TACC backbone Stockyard. The $HOME directory has a 10GB quota. All file systems also impose an inode limit, which affects the number of files allowed. The $WORK filesystem on Maverick is shared with Stampede, though a user's $WORK directory path on Stampede will differ from that on Maverick. For example, a user's $WORK directory on Stampede will have a path similar to "/work/01158/janeuser", and on Maverick similar to: "
Maverick is accessed either using the secure-shell ssh program (for batch-mode access, which can also be used to initiate interactive VNC access) or via the TACC Visualization Portal (formerly Longhorn Visualization Portal). Unix-based systems, including Linux and Mac OS X, include an ssh client; free clients are also available for other platforms, and a popular choice for Windows is PuTTY.
localhost$ ssh username@maverick.tacc.utexas.edu
where username is replaced with the Maverick user name assigned to you during the allocation process.
Maverick does NOT have a local parallel filesystem or additional nodes to run GridFTP services as on Ranch, Stampede, or Lonestar. Maverick shares the TACC backbone Stockyard's large
$WORK parallel file system (1TB quota) with Stampede. Since users' Stampede and Maverick
$WORK filesystems are NOT in the same location on Stockyard, users may transfer files with
globus-url-copy to Maverick's
$WORK filesystem by using Stampede's GridFTP endpoint (
gridftp.stampede.tacc.xsede.org) with your Maverick
$WORK directory path. For a full list of XSEDE endpoints please see XSEDE's Data Transfers & Management GridFTP Endpoints table.
The following Stampede session demonstrates Globus'
globus-url-copy to copy "
mybigfile" from PSC's Bridges to the user's Maverick
login1$ module load CTSSV4
login1$ myproxy-logon
Enter MyProxy pass phrase:
A credential has been received for user slindsey in /home1/01158/slindsey/.globus/userproxy.pem.
login1$ globus-url-copy -stripe -tcp-bs 8388608 \
    gsiftp://gridftp.bridges.psc.edu/scratcha/joeuser/mylargefile \
    gsiftp://gridftp.stampede.tacc.xsede.org:2811/scratch/joeuser/mylargefile
Also, Maverick's small $HOME file system is not available to Stampede's gridftp servers. Users will need to employ the scp mechanisms to copy files to their Maverick $HOME directory.
Maverick employs the environment modules system to manage a user's environment. To see all the software that is available across all compilers and MPI stacks, issue:
login1$ module spider
To see which software packages are available with your currently loaded compiler and MPI stack:
login1$ module avail
You may also consult the TACC Software page for a listing of all available software for all TACC resources.
In general, application development on Maverick is identical to that on Stampede, including the availability and usage of compilers, the parallel development libraries (e.g. MPI and OpenMP), tuning and debugging.
Additional visualization-oriented libraries available on Maverick are made accessible through the modules system. Library and include-file search path environment variables are modified when modules are loaded. For detailed information on the effect of loading a module, use:
login1$ module help modulename
Jobs are run on Maverick using one of two methods: batch jobs submitted from the Maverick login node, maverick.tacc.utexas.edu, or interactive jobs run from a remotely accessed VNC desktop running on an allocated Maverick compute node.
|Queue Name||Purpose||Max Runtime||Max Nodes/Procs||Max Jobs in Queue||Node Pool|
| ||Visualization||4 hrs||32 nodes (640 cores)||20||all compute nodes|
| ||GPU||12 hrs||32 nodes (640 cores)||20||all compute nodes except c225 rack|
| ||special request||-||-||-||all compute nodes|
Batch jobs are run on Maverick via the SLURM job scheduler. Please consult the Stampede User Guide's SLURM Batch Environment section for detailed information on the SLURM interface to batch job control.
The number of SUs billed depends on the total number of nodes used:
SUs billed = # nodes * 20 cores/node * wallclock time
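As a rough worked example of the formula above, the following sketch computes the charge for a job. The helper name `su_billed` is hypothetical, and the sketch assumes that wallclock time is measured in hours (fractions allowed).

```shell
# su_billed NODES HOURS
# Prints the SUs billed for a job, following the formula in this guide:
#   SUs billed = # nodes * 20 cores/node * wallclock time
# Assumption: wallclock time is given in hours.
su_billed() {
    awk -v nodes="$1" -v hours="$2" \
        'BEGIN { printf "%.1f\n", nodes * 20 * hours }'
}

# Example: a 4-node job that runs for 90 minutes.
su_billed 4 1.5
```

Note that the charge depends on the number of nodes allocated, not on how many of the 20 cores per node the job actually uses.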
The TACC Visualization Portal is available at https://vis.tacc.utexas.edu. It provides a very simple mechanism to run interactive sessions on Maverick. It presents two choices: to create a VNC desktop (essentially wrapping the above in a much simplified manner, though at cost of some flexibility), and the ability to run RStudio server and iPython Notebook sessions. Please see the TACC Visualization page for more information.
Pulldowns on this page enable a user to choose either a Maverick VNC desktop, an RStudio Server, or an iPython Notebook session. When VNC is selected, the user is presented with pulldowns for setting the various parameters of a VNC session, including the wayness, number of nodes, and desktop dimensions. The portal will then submit a VNC job to the Maverick vis queue. When the job starts, a VNC viewer will be established in the portal; alternatively, the Jobs tab will present a URL and port number that can be used to connect an external VNC viewer. Note that the portal provides access to only some of the options available through the sbatch interface, so creating a VNC session directly through the sbatch interface will be necessary in some cases.
The TACC Visualization Portal jobs page also shows the current usage of Maverick; it is an easy way to find the status of jobs. All jobs submitted to Maverick, whether via sbatch or via the Portal, running or in various wait queues, will appear in the status information shown.
- Remote Desktop Access
- Running Applications on the VNC Desktop
- Running Parallel Applications from the Desktop
- Running OpenGL/X Applications On The Desktop
While batch visualization can be performed on any Maverick node, a set of nodes have been configured for hardware-accelerated rendering. The
vis queue contains a subset of 132 compute nodes configured with one NVIDIA K40 GPU each.
Remote desktop access to Maverick is formed through a VNC connection to one or more visualization nodes.
You must have an account on Maverick in order to start a VNC session. University of Texas faculty, staff and affiliates may request an account by submitting a help desk ticket through the TACC User Portal. XSEDE users may submit a help desk ticket via the XSEDE User Portal (XUP) Help Desk.
Users must first connect to a Maverick login node (see System Access) and submit a special interactive batch job that:
- allocates a set of Maverick visualization nodes
- starts a vncserver process on the first allocated node
- sets up a tunnel through the login node to the vncserver access port
Once the vncserver process is running on the visualization node and a tunnel through the login node is created, an output message identifies the access port for connecting a VNC viewer. A VNC viewer application is run on the user's remote system and presents the desktop to the user.
If this is your first time connecting to Maverick, you must run
vncpasswd to create a password for your VNC servers. This should NOT be your XSEDE login or Maverick password! This mechanism only deters unauthorized connections; it is not fully secure, as only the first eight characters of the password are saved. All VNC connections are tunnelled through SSH for extra security, as described below.
Follow the steps below to start an interactive session.
Start a Remote Desktop
TACC has provided a VNC job script (/share/doc/slurm/job.vnc) that requests one node in the vis queue for four hours, creating a VNC session.
login1$ sbatch /share/doc/slurm/job.vnc
You may modify or overwrite script defaults with sbatch command-line options:
- "-t hours:minutes:seconds" modifies the job runtime
- "-A projectnumber" specifies the project to be charged
- "-N nodes" sets the number of nodes needed
- "-p partition" specifies an alternate partition (queue)
See more sbatch options in Stampede User Guide Table 7.3.
All arguments after the job script name are sent to the vncserver command. For example, to set the desktop resolution to 1440x900, use:
login1$ sbatch /share/doc/slurm/job.vnc -geometry 1440x900
The job.vnc script starts a vncserver process and writes to the output file, vncserver.out in the job submission directory, with the connect port for the vncviewer. Watch for the "To connect via VNC client" message at the end of the output file, or watch the output stream in a separate window with the commands:
login1$ touch vncserver.out ; tail -f vncserver.out
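The submission and waiting steps above can be sketched as two small shell helpers. Both function names are hypothetical, as are the example values (two hours, one node, 1440x900). The first helper only prints the sbatch command line so you can inspect it before running it. The second polls the output file instead of using tail -f.

```shell
# vnc_sbatch_cmd RUNTIME NODES PROJECT GEOMETRY
# Prints (does not run) an sbatch command that overrides the job.vnc
# defaults. sbatch options go before the script name; everything after
# the script name is passed on to vncserver.
vnc_sbatch_cmd() {
    printf 'sbatch -t %s -N %s -A %s /share/doc/slurm/job.vnc -geometry %s\n' \
        "$1" "$2" "$3" "$4"
}

# wait_for_vnc FILE [TRIES]
# Checks FILE once per second, up to TRIES times (default 600), until the
# "To connect via VNC client" message appears, then prints the file from
# that message to the end.
wait_for_vnc() {
    file=$1
    tries=${2:-600}
    while [ "$tries" -gt 0 ]; do
        if grep -q "To connect via VNC client" "$file" 2>/dev/null; then
            sed -n '/To connect via VNC client/,$p' "$file"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "timed out waiting for $file" >&2
    return 1
}

# Example: show the command for a two-hour, one-node session.
vnc_sbatch_cmd 02:00:00 1 projectnumber 1440x900
```

After submitting the printed command, run "wait_for_vnc vncserver.out" from the job submission directory to block until the connect port is reported.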
The lightweight window manager,
xfce, is the default VNC desktop and is recommended for remote performance. Gnome is available; to use gnome, open the "
~/.vnc/xstartup" file (created after your first VNC session) and replace "startxfce4" with "gnome-session". Note that gnome may lag over slow internet connections.
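The edit described above can also be scripted. The following is a minimal sketch, assuming your xstartup file contains the literal text "startxfce4"; the function name `use_gnome_desktop` is hypothetical. It keeps a backup copy with a .bak suffix so the change is easy to undo.

```shell
# use_gnome_desktop [XSTARTUP]
# Replaces "startxfce4" with "gnome-session" in the VNC startup file
# (default: ~/.vnc/xstartup), saving the original as XSTARTUP.bak.
use_gnome_desktop() {
    xstartup=${1:-$HOME/.vnc/xstartup}
    if [ ! -f "$xstartup" ]; then
        echo "no such file: $xstartup (start one VNC session first)" >&2
        return 1
    fi
    cp "$xstartup" "$xstartup.bak"
    # Write back into the existing file so its permissions are preserved.
    sed 's/startxfce4/gnome-session/' "$xstartup.bak" > "$xstartup"
}

# Usage: use_gnome_desktop
# To switch back: cp ~/.vnc/xstartup.bak ~/.vnc/xstartup
```

The change takes effect the next time a VNC session is started.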
Create an SSH Tunnel to Maverick
TACC requires users to create an SSH tunnel from the local system to the Maverick login node (maverick.tacc.utexas.edu) to assure that the connection is secure. On a Unix or Linux system, execute the following command once the port has been opened on the Maverick login node:
localhost$ ssh -f -N -L xxxx:maverick.tacc.utexas.edu:yyyy username@maverick.tacc.utexas.edu
- "yyyy" is the port number given by the vncserver batch job
- "xxxx" is a port on the local system. Generally, the port number specified on the Maverick login node, yyyy, is a good choice to use on your local system as well
- "-f" puts the ssh command into the background after connecting
- "-N" instructs SSH to only forward ports, not to execute a remote command
- "-L" forwards the port
Once the SSH tunnel has been established, use a VNC client to connect to the local port you created, which will then be tunneled to your VNC server on Maverick. Connect to localhost:xxxx, where xxxx is the local port you used for your tunnel. In the examples above, we would connect the VNC client to localhost::xxxx. (Some VNC clients accept localhost:xxxx).
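To make the port placeholders concrete, the sketch below prints the tunnel command for a given port and user. The function name `vnc_tunnel_cmd`, the port 5902, and the user janeuser are all illustrative assumptions, as is the login form username@maverick.tacc.utexas.edu. Substitute the port reported in your own vncserver.out.

```shell
# vnc_tunnel_cmd REMOTE_PORT USERNAME [LOCAL_PORT]
# Prints (does not run) the ssh command that forwards LOCAL_PORT on your
# machine to REMOTE_PORT on the Maverick login node. LOCAL_PORT defaults
# to REMOTE_PORT, as the guide suggests.
vnc_tunnel_cmd() {
    remote_port=$1
    user=$2
    local_port=${3:-$remote_port}
    printf 'ssh -f -N -L %s:maverick.tacc.utexas.edu:%s %s@maverick.tacc.utexas.edu\n' \
        "$local_port" "$remote_port" "$user"
}

# Example: the batch job reported port 5902 (hypothetical value).
vnc_tunnel_cmd 5902 janeuser
```

With this tunnel in place, the VNC client would connect to localhost::5902 (or localhost:5902, depending on the client).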
We recommend the TigerVNC VNC Client, a platform independent client/server application.
Once the desktop has been established, two initial xterm windows are presented (which may be overlapping). One, which is white-on-black, manages the lifetime of the VNC server process. Killing this window (typically by typing "
exit" or "
ctrl-D" at the prompt) will cause the vncserver to terminate and the original batch job to end. Because of this, we recommend that this window not be used for other purposes; it is just too easy to accidentally kill it and terminate the session.
The other xterm window is black-on-white, and can be used to start both serial programs running on the node hosting the vncserver process, or parallel jobs running across the set of cores associated with the original batch job. Additional xterm windows can be created using the window-manager left-button menu.
Parallel applications are run on the desktop using the
ibrun: Enables parallel MPI jobs to be started from the VNC desktop.
ibrun uses information from the user's environment to start MPI jobs across the user's set of Maverick compute nodes. This information is determined by the initial SLURM job submission, and includes the location of the hostfile created by SLURM (found in the
$PE_HOSTFILE environment variable).
c442-001$ ibrun [ibrun options] application [application options]
Running OpenGL/X applications on Maverick visualization nodes requires that the native X server be running on each participating visualization node. Like other TACC visualization servers, on Maverick the X servers are started automatically on each node.
Once native X servers are running, several scripts are provided to enable rendering in different scenarios.
vglrun: Because VNC does not support OpenGL applications, VirtualGL is used to intercept OpenGL/X commands issued by application code and re-direct it to a local native X display for rendering; rendered results are then automatically read back and sent to VNC as pixel buffers. To run an OpenGL/X application from a VNC desktop command prompt:
c442-001$ vglrun [vglrun options] application [application-args]
tacc_xrun: Some visualization applications present a client/server architecture, in which every process of a parallel server renders to local graphics resources, then returns rendered pixels to a separate, possibly remote client process for display. By wrapping server processes in the tacc_xrun wrapper, the $DISPLAY environment variable is manipulated to share the rendering load across the two GPUs available on each node. For example,
c442-001$ ibrun tacc_xrun application application-args
will cause the tasks to utilize each node, but will not render to any VNC desktop windows.
tacc_vglrun: Other visualization applications incorporate the final display function in the root process of the parallel application. This case is much like the one described above except for the root node, which must use vglrun to return rendered pixels to the VNC desktop. For example,
c442-001$ ibrun tacc_vglrun application application-args
will cause the tasks to utilize the GPU for rendering, but will transfer the root process' graphics results to the VNC desktop.
Maverick provides a set of visualization-specific modules listed below:
VisIt is a free interactive parallel visualization and graphical analysis tool for viewing scientific data on Unix and PC platforms. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images for presentations. VisIt contains a rich set of visualization features so that you can view your data in a variety of ways. It can be used to visualize scalar and vector fields defined on two- and three-dimensional (2D and 3D) structured and unstructured meshes. VisIt was designed to handle very large data set sizes in the terascale range and yet can also handle small data sets in the kilobyte range.
VisIt was compiled under the Intel compiler and the mvapich2 and MPI stacks.
After connecting to a VNC server on Maverick, as described above, load the VisIt module at the beginning of your interactive session before launching the VisIt application:
c221-102$ module load visit
c221-102$ vglrun visit
VisIt first loads a dataset and presents a dialog allowing for selecting either a serial or parallel engine. Select the parallel engine. Note that this dialog will also present options for the number of processes to start and the number of nodes to use; these options are actually ignored in favor of the options specified when the VNC server job was started.
In order to take advantage of parallel processing, VisIt input data must be partitioned and distributed across the cooperating processes. This requires that the input data be explicitly partitioned into independent subsets at the time it is input to VisIt. VisIt supports SILO data, which incorporates a parallel, partitioned representation. Otherwise, VisIt supports a metadata file (with a
.visit extension) that lists multiple data files of any supported format that are to be associated into a single logical dataset. In addition, VisIt supports a "brick of values" format, also using the
.visit metadata file, which enables single files containing data defined on rectilinear grids to be partitioned and imported in parallel. Note that VisIt does not support VTK parallel XML formats (
.pvts). For more information on importing data into VisIt, see Getting Data Into VisIt; though this documentation refers to VisIt version 2.0, it appears to be the most current available.
After connecting to a VNC server on Maverick, as described above, do the following:
- Set the $NO_HOSTSORT environment variable to 1
login1% setenv NO_HOSTSORT 1 (csh/tcsh)
login1$ export NO_HOSTSORT=1 (bash)
- Set up your environment with the necessary modules:
If the user is intending to use the Python interface to ParaView via any of the following methods:
- the Python scripting tool available through the ParaView GUI
- loading the paraview.simple module into python
then load the python, qt and paraview modules in this order:
c221-102$ module load python qt paraview
else just load the qt and paraview modules in this order:
c221-102$ module load qt paraview
Note that the qt module is always required and must be loaded prior to the paraview module.
- Launch ParaView:
c221-102$ vglrun paraview [paraview client options]
- Click the "Connect" button, or select File -> Connect
If this is the first time you've used ParaView in parallel (or failed to save your connection configuration in your prior runs):
- Select "Add Server"
- Enter a "Name" e.g. "ibrun"
- Click "Configure"
- For "Startup Type" in the configuration dialog, select "Command" and enter the command:
c221-102$ ibrun tacc_xrun pvserver [paraview server options]
and click "Save"
- Select the name of your server configuration, and click "Connect"
You will see the parallel servers being spawned and the connection established in the ParaView Output Messages window.
IDL (Interactive Data Language) is a popular interpreted language for data processing and analysis. IDL includes:
- A rich library of high-performance, multi-threaded routines to analyze your data
- The ability to add your own specialized routines to the library by writing procedures more quickly than other languages
- Simple syntax, dynamic data typing, and array-oriented operations
- Built-in functionality suitable for many data trends, with tools for 2- and 3-dimensional gridding and interpolation, routines for curve and surface fitting, and the ability to perform multi-threaded computations
To run IDL interactively in a VNC session, connect to a VNC server on Maverick as described above, then do the following:
- load the
c203-112$ module load vis idl
- launch IDL
or launch the IDL virtual machine:
c203-112$ idl -vm
If you are running IDL in scripted form, without interaction, simply submit a SLURM job that loads IDL and runs your script.
If you need to run IDL interactively from an xterm from your local machine outside of a VNC session, you will need to run an interactive SLURM job in the vis queue to allocate a Maverick compute node. To do this, use the SLURM command
srun to allocate an interactive shell. This command uses the same arguments as
srun -A Acct -n num_cores_requested -p queue -t time --pty /bin/bash -l
login1$ srun -A My-acct -n 20 -p vis -t 2:0:0 --pty /bin/bash -l
will charge SUs to My-acct and request one node (20 cores) in the
vis queue for two hours and run a bash login shell.
Note that any graphics windows opened from this command prompt may be significantly slower than when run through a VNC session.
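Because the srun example above requests cores rather than nodes, it can help to derive the core count from the number of nodes you want. The helper below is a sketch only: the name `srun_cmd` is hypothetical, and it assumes the scheduler packs 20 tasks onto each node, as the one-node example implies. It prints the command rather than running it.

```shell
# srun_cmd ACCOUNT NODES QUEUE TIME
# Prints (does not run) an srun command requesting NODES * 20 cores.
# Assumption: 20 cores per node, with tasks packed one per core.
srun_cmd() {
    cores=$(( $2 * 20 ))
    printf 'srun -A %s -n %s -p %s -t %s --pty /bin/bash -l\n' \
        "$1" "$cores" "$3" "$4"
}

# Example: two nodes in the vis queue for two hours.
srun_cmd My-acct 2 vis 2:0:0
```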
NVIDIA's CUDA compiler and libraries are accessed by loading the CUDA module:
login1$ module load cuda
Use the nvcc compiler on the login node to compile code, and run executables on nodes with GPUs; there are no GPUs on the login nodes. Maverick's K40 GPUs are compute capability 3.5 devices. When compiling your code, make sure to specify this level of capability with:
nvcc -arch=compute_35 -code=sm_35 ...
GPU nodes are accessible through the
gpu queue for production work and the
devel-gpu queue for development work. Production job scripts should include the "
module load cuda" command before executing cuda code; likewise, load the cuda module before or after acquiring an interactive, development gpu node with the "
The NVIDIA CUDA debugger is cuda-gdb. Applications must be debugged through a VNC session or an interactive srun session. Please see the relevant srun and VNC sections for more details.
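For production runs, the guidance in this section (use the gpu queue and load the cuda module in the job script) can be combined into a SLURM batch script. The sketch below writes such a script to a file. The function name `write_gpu_job`, the job name, the output file name, and the one-hour, one-node request are all illustrative assumptions rather than TACC defaults.

```shell
# write_gpu_job FILE EXECUTABLE PROJECT
# Writes a minimal SLURM job script to FILE that runs ./EXECUTABLE on one
# node of the gpu queue and charges PROJECT.
write_gpu_job() {
    cat > "$1" <<EOF
#!/bin/bash
#SBATCH -J gpu_job          # job name (hypothetical)
#SBATCH -o gpu_job.%j.out   # output file; %j expands to the job ID
#SBATCH -p gpu              # production GPU queue
#SBATCH -N 1                # number of nodes
#SBATCH -n 1                # total number of tasks
#SBATCH -t 01:00:00         # run time (hh:mm:ss)
#SBATCH -A $3               # project to charge

module load cuda
./$2
EOF
}

# Usage: write_gpu_job gpu.slurm my_cuda_app projectnumber
# Then submit with: sbatch gpu.slurm
```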
The NVIDIA Compute Visual Profiler,
computeprof, can be used to profile both CUDA and OpenCL programs that have been developed in NVIDIA CUDA/OpenCL programming environment. Since the profiler is X based, it must be run either within a VNC session or by ssh-ing into an allocated compute node with X-forwarding enabled. The profiler command and library paths are included in the
$PATH and $LD_LIBRARY_PATH variables by the CUDA module. The
computeprof executable and libraries can be found in the following respective directories:
For further information on the CUDA compiler, programming, the API, and debugger, please see:
The OpenCL heterogeneous computing language is supported on all Maverick computing platforms. The Intel OpenCL environment will support the Xeon processors and Xeon Phi coprocessors, and the NVIDIA OpenCL environment supports the Tesla accelerators.
The Intel OpenCL stack is not yet installed on Maverick. A user news announcement will be sent once it is installed.
The NVIDIA OpenCL environment supports the v1.1 API and is accessible through the cuda module:
login1$ module load cuda
For programming with NVIDIA OpenCL, please see the OpenCL specification at: https://www.khronos.org/registry/OpenCL/specs/opencl-1.1.pdf.
Use the g++ compiler to compile NVIDIA-based OpenCL. The include files are located in the
$TACC_CUDA_DIR/include subdirectory. The OpenCL library is installed in the
/usr/lib64 directory, which is on the default library path. Use this path and g++ options to compile OpenCL code:
login1$ export OCL=$TACC_CUDA_DIR
login1$ g++ -I $OCL/include prog.cpp -lOpenCL
Last update: February 22, 2017