Ranch User Guide
Last update: July 8, 2017

Notices

03/24/17 The "stage" command is temporarily disabled to prevent excessive stage calls saturating the stage queue. Most users will be able to transfer files without pre-staging, since commands like rsync and scp will stage data automatically. If you have a large number of small files to transfer, please submit a support ticket and we will assist you.

11/04/16 Please examine the new section "Organizing Your Data" below. This section contains new information and examples on how to best organize and package your data for optimal archiving and retrieval. A few simple steps can make the difference between hours and weeks for data retrieval.

Overview

TACC's High Performance Computing (HPC) systems are used primarily for scientific computing, and although their disk systems are large, they are not large enough to keep up with the data generated on those systems. Ranch fills this need for high-capacity storage by providing a massive, high-performance file system for archival purposes.

Ranch (ranch.tacc.utexas.edu), a Sun Microsystems StorageTek Mass Storage Facility, is TACC's long-term storage solution. Ranch utilizes Sun's Storage Archive Manager Filesystem (SAM-FS) for migrating files to and from a tape archival system with a current storage capacity of 160 Petabytes (PB).

Intended Use

Ranch consists of long-term tape storage and is designed for archiving data. Ranch is not meant for active data, nor is it intended to be a backup solution for your "/scratch" directory. Ranch is also not suitable for system backups, due to the large number of small files they inevitably generate and the nature of a tape-based archive. The Ranch system provides a single backup copy of project-related data.

Please Note: Ranch is an archival system. The Ranch system is not backed up or replicated. This means that Ranch contains a single copy of user data. While lost data due to tape damage is rare, please keep this possibility in mind when making data management plans.

System Configuration

Ranch's metadata system is built on two Sun 7420 appliances, with 4 disk storage shelves, 4 high-speed read-cache SSDs, 4 log SSDs, and 88 x 15k 600GB SAS disks. The high-performance SAM-QFS file system stores metadata and data separately, on dedicated devices. This maximizes data I/O and means that updates to the metadata do not affect I/O to the data cache system. The data is stored in 8 Dell MD3600i based arrays, each with 7 MD1200 enclosures attached, for 96 disks per array. These arrays are divided into multiple file systems, including the home file systems and a file system for files too small to be stored efficiently on tape. Finally, there is the tape storage itself.

Two Oracle StorageTek SL8500 Automated Tape Libraries are combined to serve the offline archival storage. Each SL8500 library contains 10,000 tape slots and 64 tape drive slots. Each tape is capable of holding 8.5 TB of uncompressed data, so when fully populated, a single SL8500 library can house 85 PB on the current generation of tapes. Each SL8500 library also contains 8 robots to manage tapes and move them to or from the tape drives.

The /home filesystem has an expandable capacity of 160 petabytes.

[Figure: Ranch Archive System at TACC]
Figure 1. Ranch System Configuration

System Access

Ranch is an allocated resource, meaning the system is available only to users with an allocation on one of TACC's computational resources such as Stampede, Lonestar 5, or Maverick. XSEDE PIs will be prompted automatically for the companion storage allocation amount as part of the XRAS submission request; UT and UT System PIs should also request and justify the amount of storage needed when applying for an allocation. The default allocation for XSEDE, UT, and UT affiliate users on Ranch is 500 GB. To request additional Ranch storage for your allocation, please submit a TACC user portal ticket.

Ranch users may login directly via the standard SSH utilities as described below.

Methods of Access

Direct login via Secure Shell's "ssh" to Ranch is allowed so you can create directories and migrate files from tape back to the disk subsystem for later transfer to TACC machines or personal computers. The Ranch archive system cannot be mounted on a remote system.

stampede$ ssh -l taccusername ranch.tacc.utexas.edu

Inherited Environment Variables

The preferred way of accessing Ranch, especially from scripts, is by using the TACC-defined environment variables $ARCHIVER and $ARCHIVE. These variables, defined on all TACC resources, define the hostname of the current TACC archival system, $ARCHIVER, and each account's personal archival space, $ARCHIVE. These environment variables help ensure that scripts will continue to work, even if the underlying system configuration changes in the future.
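For example, a quick sketch of how these variables might be used from Stampede (the file name "mydata.tar" is only an illustration):

stampede$ scp mydata.tar ${ARCHIVER}:${ARCHIVE}/mydata.tar
stampede$ ssh ${ARCHIVER} ls -l ${ARCHIVE}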

Accessing Files from within Running Programs

Ranch access is not allowed from within running jobs on other TACC resources. Data must be first transferred from Ranch to your compute resource in order to be available to running jobs.

Citizenship on Ranch

  • Limit rsync and scp transfers to no more than two concurrent processes.
  • Follow the procedures for archiving data outlined below (tar and stage).
  • Please delete all unneeded data under your account.
  • Do not use Ranch for workstation or other system backups.

Organizing Your Data

We ask our users to adopt the following practices going forward. Your existing data does not need to be touched; these practices apply only to new data. Our goal is to increase performance and usability for all users of the system.

  1. Please use the UNIX "split" utility to chop up very large files (1 TB+)
  2. Tar up small files into chunks between 10 GB and 300 GB in size.
  3. Minimize the number of files you store on Ranch. For example, 1000 files in a directory tree should be turned into a single, or small number of tar files, depending on how the data is organized.

The archiver process moves your data from the spinning disk cache to tape media (see Figure 1). In order for the archiver to work optimally, your data should be bundled into large tar files.

There is an online data limit of 10 TB. This limit is independent of the total quota or allocation you have been granted; it refers to the amount of data that resides on the spinning disk cache before it is written to tape. The limit caps the influx of large amounts of data, which can negatively affect all users, and it also prevents users from staging very large amounts of data all at once. A batch strategy must therefore be used for staging data if you need to copy more than 10 TB off the system.

TACC staff are available to advise you in developing a data storage strategy. Take a look at the following examples of data organization to clarify your understanding of how the system should be used:

Example 1: Data is comprised of a large number of small files:

Consider a user on Stampede with a large dataset of 15,000 small files, totaling around 900 GB. Ideally, the user would tar the files into three tar files of around 300 GB, containing 5,000 files each, and then transfer the data to Ranch. Data management (creating tarballs of directory trees) should be done before moving the data to Ranch.
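A minimal sketch of this workflow, assuming the dataset happens to be organized into three subdirectories (all directory and file names below are illustrative):

stampede$ tar -cvf part1.tar dataset/part1/
stampede$ tar -cvf part2.tar dataset/part2/
stampede$ tar -cvf part3.tar dataset/part3/
stampede$ scp part?.tar ${ARCHIVER}:${ARCHIVE}/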

Example 2: Data to be transferred exceeds 10TB online data limit

Another user has 30TB of data in 100 tar files, each around 300GB; this layout is much better suited to storing data on tape. However, if the user tries to move all of their data to Ranch in one go, they will run into the 10TB online data limit and their transfer will stop. This limit is intended to prevent huge influxes of data, as hundreds of TB arriving at once can cause issues. Instead, the user should transfer about 9 of their 300GB files at a time, wait for the files to be archived to tape (see "Managing files with SAM-FS" below for how to check) and for the files to be released from the disk cache (which happens automatically after files are archived), and then transfer the next batch of 9 files.
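One possible batch iteration, sketched with illustrative file names (data_001.tar through data_100.tar) and using the SAM-FS commands described in the next section:

stampede$ scp data_00[1-9].tar ${ARCHIVER}:${ARCHIVE}/
ranch$ sls -D ${ARCHIVE}/data_001.tar     # confirm an archive (tape) copy exists
ranch$ release ${ARCHIVE}/data_00*.tar    # free the disk cache copies once archived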

Managing files with SAM-FS

Ranch uses the Storage and Archive Manager File System (SAM-FS). SAM-FS supplies several extensions to common UNIX commands, as detailed in Table 1. These commands help manage the storage and location of files stored on Ranch.

Table 1. Common SAM-FS Commands

Command    Description
sdu        Summarizes disk usage. The "sdu" command is based on the GNU version of the Unix "du" command.
sfind      Searches for files in a directory hierarchy. The "sfind" command is based on the Unix "find" command and contains options for searching based on SAM-FS file attributes.
sls        Lists the contents of directories. The "sls" command is based on the Unix "ls" command and contains options for displaying SAM-FS file system attributes and information.
release    Releases files from the disk cache that are already on tape. This happens automatically, so only users moving multiple terabytes to Ranch will need this command.

Please consult each command's man page for additional, detailed information:

ranch$ man command
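For example, a brief sketch (paths and file names are illustrative; check the man pages for exact options and output):

ranch$ sfind ${ARCHIVE} -name "*.tar"     # locate tar files by name
ranch$ sls -D ${ARCHIVE}/my_data.tar      # detailed SAM-FS attributes for one file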

Transferring Files

To maximize the efficiency of data archiving and retrieval, data should be transferred using large files. Small files do not do well on tape, so they should be combined with others in a "tar" file wherever possible. The name tar is derived from (t)ape (ar)chive. Very large files (5 TB+) can also be a problem, since their contents can be split across multiple tapes, increasing the chances of problems when retrieving the data. Use the UNIX split utility on very large files (1 TB+), and tar up small files into chunks between 10 GB and 300 GB in size. This will allow the archiver to work optimally. Please submit a consulting ticket to the TACC portal if you have questions or would like help developing a strategy for storing your data.

Retrieving Files from Ranch

Since Ranch is an archive system, any files which have not been accessed recently will be stored on tape. To access files stored offline, they must be 'staged' from tape, which is done automatically with tools like rsync and scp. We ask that you use the Unix tar command or another utility to bundle large numbers of small files together, before transferring to Ranch, for more efficient storage and retrieval on Ranch.

Ranch performs best on large files (10GB to 250GB) and datasets. If you need a single file from a large tarball, it can easily be extracted without extracting the whole tarball. Due to the nature of the tapes that Ranch uses, it is quicker to read a single large file than it is to read multiple small files.
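For example, a single member can be listed and pulled out of a tarball without unpacking the rest (file and path names are illustrative):

ranch$ tar -tvf my_data.tar                              # list the tarball's contents
ranch$ tar -xvf my_data.tar results/run42/output.dat     # extract just one file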

Large numbers of small files are hard for our tape drives to read back from tape, since the drives must start and stop for every file. Instead of reading steadily at 252 MB/sec (the maximum read speed of a T10KD drive), a drive crawling through many tiny files may take a week to stage them back to disk, which ties up the drive and prevents other users from accessing their data.

Popular transfer methods like scp and rsync should be limited to fewer than four concurrent processes. When uploading data TO Ranch, it is true that more threads can achieve better throughput, but when taking data FROM Ranch, additional threads will adversely affect throughput if the data set has not been staged prior to running rsync or scp.

Data transfer methods

TACC supports three transfer mechanisms: scp (simple), rsync (avoid if possible) and globus-url-copy (best for moving large files).

scp

The simplest way to transfer files to and from Ranch is to use the Secure Shell scp command:

stampede$ scp myfile ${ARCHIVER}:${ARCHIVE}/myfilepath

where myfile is the name of the file to copy and myfilepath is the path to the archive on Ranch. For large numbers of files, we strongly recommend you employ the Unix "tar" command to create an archive of one or more directories before transferring the data to Ranch, or as part of the transfer process.

To create a "tar" archive of a directory and write it directly to Ranch over ssh, you can use the following alternative for copying files to Ranch:

stampede$ tar cvf - dirname | ssh ${ARCHIVER} "cat > ${ARCHIVE}/mytarfile.tar"

where "dirname" is the path to the directory you want to archive, and "mytarfile.tar" is the name of the archive to be created on Ranch.

Note that when transferring to Ranch, the destination directory (or directories) must already exist. If not, scp will respond with:

No such file or directory
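If necessary, create the destination directory on Ranch first, for example (the path is a placeholder):

stampede$ ssh ${ARCHIVER} "mkdir -p ${ARCHIVE}/myfilepath"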

The following command line examples demonstrate how to transfer files to and from Ranch using scp.

  • copy "mynewfile" from Stampede to Ranch:

    stampede$ scp mynewfile ${ARCHIVER}:${ARCHIVE}/mynewfilename
  • copy "myoldfile" from Ranch to my computer

    stampede$ scp ${ARCHIVER}:${ARCHIVE}/myoldfile .

rsync

The UNIX rsync command is another way to keep archives up to date. Rather than transferring entire files, rsync transfers only the parts of a file that have changed. This method has an advantage over scp in that it can recover if there is an error in the transfer. Enter "rsync -h" for detailed help on this command.

A significant downside to rsync, however, is that it must stage data from tape before it can start the sync. This can lead to many unnecessary staging calls and wasted resources. In general, it is a bad idea to rsync a whole directory against Ranch, and rsync is poorly suited to archiving data.
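If you do use rsync for a small, already-staged dataset, a minimal example looks like this (the file name is illustrative; keep the number of concurrent transfers low):

stampede$ rsync -av my_data.tar ${ARCHIVER}:${ARCHIVE}/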

globus-url-copy

If transferring data between XSEDE sites, use globus-url-copy. This command requires an XSEDE certificate to create a proxy for passwordless transfers. It has a complex syntax, but provides high-speed access to other XSEDE machines that support GridFTP services, the protocol underlying globus-url-copy. High-speed file or directory transfers take place between the GridFTP servers at the XSEDE sites. The GridFTP servers mount the file systems of the compute machines, thereby providing access to your files and directories. Third-party transfers (transfers between two machines, initiated from a third) are also supported. Please consult the globus-url-copy documentation and the XSEDE Data Transfers & Management page for more information.

To start using globus-url-copy:

  • First, load the CTSSV4 module. This command will set up the environment variables needed to use the globus-url-copy software.

    ranch$ module load CTSSV4
  • Now use the myproxy-logon command to obtain a proxy certificate. This command will prompt for the certificate password. The proxy is valid for 12 hours for all logins on the local machine.

    ranch$ myproxy-logon
  • With globus-url-copy, you must include the name of the server and a full path to the file. The general syntax looks like:

    globus-url-copy options \
        gsiftp://gridftp_server1/directory/file \
        gsiftp://gridftp_server2/directory/file

Use the "-stripe" and "-buffer" size options (-stripe to use multiple service nodes, "-tcp-bs" 11M to set ftp data channel buffer size). Otherwise, the speed will be about 20 times slower! When transferring directories, the directory path must end with a slash (/).

Consult the Data Transfers & Management page for a table of GridFTP endpoints.

For globus-url-copy, especially via the Globus interface, users should verify that data is actually being transferred before submitting multiple copies. All too often, a single user saturates all of the server's transfer threads with multiple submissions, blocking other users. Many users achieve very high throughput with a single globus-url-copy command. Also note the movement of data OUT of Ranch when it has not been staged: all threads will wait until each file is staged, moving no data while occupying server threads. Submitting multiple globus-url-copy commands on un-staged data tends to hurt the system's performance due to serialized stage calls on the tapes.

Large Data Transfers

If you are moving a very large amount of data to Ranch and you encounter a quota limit error, then you are bumping into the limit on data you can have on Ranch's cache system. There are limits on cache inode usage and disk block usage, but these should affect only a few very heavy users and do not affect a user's total allocation on the Ranch archival system. If you encounter a quota error, please submit a ticket to the TACC user portal, and we will work with you to make sure your data is transferred as efficiently as possible. The limits exist merely to prevent the system from becoming unexpectedly overloaded, thus maintaining good service for all users.

Use the "du -h" command to see how much data you have on the cache system. The command "sdu -h" will tell you how much data you have total, including what is on tape.

Use the "release" command to release data from the cache system that has already been archived. It will only release data that has already been archived. Archiving takes time, especially if you are trying to move a very large amount of data, so please be patient.

Archive a large directory with tar and move it to Ranch while splitting it into smaller parts, e.g.:

stampede$ tar -cvf - /directory/ | ssh ranch.tacc.utexas.edu 'cd /your_ranch_path/ && split -b 1024m - files.tar.'

Alternatively, you can split large output files, or tar files, on the Stampede side, then move them to Ranch, since Ranch has an older Solaris version of split, which is more difficult to work with.

Large files, more than a few TB in size, should be split into chunks, preferably between 10GB and 500GB in size. Use the split command on Stampede to accomplish this:

stampede$ split -b 300G myverybigfile.tar my_file_part_

The above example will create several 300GB files, with the filenames: my_file_part_aa, my_file_part_ab, my_file_part_ac, etc.

The split parts of a file can be joined together again with the "cat" command.

stampede$ cat my_file_part_?? > myverybigfile.tar

See man split for more options.

Large collections of small files must be put into a tar archive before being sent to Ranch, or better yet, put into a tar file en route to Ranch (that way no temporary tar file is created on the source filesystem).

The following example will create an archive of my_small_files_directory in the current working directory.

stampede$ tar -cvf my_data.tar my_small_files_directory/

As with staging, when using scp or rsync, you must limit your transfers to a maximum of four processes at a time. If using globus-url-copy, limit your transfer to one process at a time.

Help

For 24/7 assistance using Ranch, please submit a support ticket via the help desk.