Ranch User Guide
Last update: November 13, 2020

The deadline to migrate your data to the new archive system is December 1, 2020.

See Ranch Transition to Quantum Archiving System for detailed information on migrating your data.

Migration Notices

  • Any data newer than March 25, 2019 is already in the new Ranch system and does not require migration.

  • TACC staff cannot guarantee access to the older data once the old Ranch system has been decommissioned. Availability of the old Ranch system will be provided on a best-effort basis.

  • If you are a PI with older data from students who have moved on, contact us for assistance by submitting a ticket via the TACC User Portal or XSEDE User Portal.

  • In the new system, users will find a symbolic link (./old_HSM) leading to their data on the Oracle HSM system, which is mounted as a read-only filesystem. The new system enforces tighter limits on inode usage (that is, file count), so we ask users to identify the data they need, bundle it with tar, and direct it to their new home directory. Please consult the "Organizing Your Data" section for more information.

  • When data is accessed on the old Oracle HSM system (via the /old_HSM link), your terminal may appear to hang. This is the Oracle HSM system staging data from tape, which can take a few minutes for some files and significantly longer for very large ones. As a rule of thumb, large files (5GB+) stream quickly off tape, at around 250MB/sec; small files force the tape drive to stop, start, and reposition, which significantly slows transfers. See the example below.
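For example (a minimal sketch; "olddataset.tar" is the example file name used later in this guide), a copy out of old_HSM may sit silently for minutes while the file stages from tape:

ranch$ time cp -p old_HSM/olddataset.tar .    # "real" time includes the tape staging delay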

Introduction

TACC's High Performance Computing (HPC) systems are used primarily for scientific computing, and although their disk systems are large, they cannot keep up with the long-term data generated on these systems. The Ranch system fills this need for high-capacity, long-term storage by providing a massive, high-performance file system for archival purposes.

Ranch (ranch.tacc.utexas.edu) is a Quantum StorNext-based system with a DDN-provided front-end disk system (30PB raw) and a 5000-slot Quantum Scalar i6000 tape library.

Ranch is an allocated resource, meaning that Ranch is available only to users with an allocation on one of TACC's computational resources, such as Frontera, Stampede2, or Maverick2. XSEDE PIs will be prompted automatically for the companion storage allocation amount as part of the XRAS submission request; UT and UT System PIs should likewise request and justify the amount of storage needed when applying for a Ranch allocation. The default allocation on Ranch for XSEDE, UT, and UT-affiliate users is 2TB. To request additional Ranch storage for your allocation, please submit a TACC user portal ticket.

Intended Use

Ranch is fundamentally based upon long-term tape storage and as such is designed for archiving data that is in a state where it will likely not change and will likely not need to be accessed very often. Obviously, Ranch is to be used only for work-related data. Ranch is not meant for active data, nor is it intended to be a replication solution for your "/scratch" directory. Ranch is also not suitable for system backups, due to the large number of small files they inevitably generate and the nature of a tape-based archive. Also, at this time Ranch stores only a single instance of any data within it: erroneously edit or delete that data, and it is unrecoverable from within Ranch.

Ranch is an archival system. Ranch user data is not backed up or replicated. This means that Ranch contains only a single, active, instance of user data. While lost data due to tape damage or other system failure is rare, please keep this possibility in mind when formulating your data management plans. If you have irreplaceable data and would like a different level of service, please let us know via the ticketing system, and we can help you with a solution.

System Configuration

Ranch's primary storage system is a DDN SFA14K DCR (Declustered RAID) based system managed by Quantum's StorNext filesystem. The raw capacity is around 30PB, with about 17PB of usable space for user data. Metadata is stored on a Quantum SSD-based appliance. The backend tape library, to which files migrate after they have been untouched on disk for a period of time (this will be tuned, but it is currently a few weeks), is a Quantum Scalar i6000 with LTO-8 tapes, each with an uncompressed capacity of 12.5 TB. Compressed capacity of an LTO-8 tape is around 30TB, but that assumes highly compressible data.

Formerly, the Ranch system was based on Oracle's HSM system, with two SL8500 libraries, each with 20,000 tape slots. This system will remain as a backend system while we transition data from the old libraries to the new one.

System Access

Direct login to Ranch via Secure Shell's ssh command is allowed so that you can create directories and manage files. The Ranch archive system cannot be mounted on a remote system.

stampede2$ ssh taccusername@ranch.tacc.utexas.edu

Ranch Environment Variables

The preferred way of accessing Ranch, especially from scripts, is by using the TACC-defined environment variables $ARCHIVER and $ARCHIVE. These variables, defined on all TACC resources, define the hostname of the current TACC archival system, $ARCHIVER, and each account's personal archival space, $ARCHIVE. These environment variables help ensure that scripts will continue to work, even if the underlying system configuration changes in the future.
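For example, from any TACC login node you can inspect these variables and use them directly (a minimal sketch; the exact value of $ARCHIVE depends on your account):

stampede2$ echo ${ARCHIVER}                     # hostname of the current archiver, e.g. ranch.tacc.utexas.edu
stampede2$ ssh ${ARCHIVER} "ls ${ARCHIVE}"      # list your archival space without hard-coding hostnames or paths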

If you are trying to access data on the old part of Ranch that you haven't yet transitioned to the new Quantum StorNext-based portion, you can add the old_HSM directory to the paths defined in your scripts and still read from Ranch that way. Since the old filesystem is mounted read-only, you won't be able to send data into the old_HSM directory structure.
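A minimal sketch of such a read-only script path, reusing the "olddataset.tar" example name from the transition guide below:

stampede2$ scp ${ARCHIVER}:${ARCHIVE}/old_HSM/olddataset.tar .    # reads work; writes into old_HSM will fail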

Accessing Files from Within Running Programs

Ranch access is not allowed from within running jobs on other TACC resources. Data must be first transferred from Ranch to your compute resource in order to be available to running jobs.

Citizenship on Ranch

  • Limit rsync and scp transfers to no more than two concurrent processes.
  • Follow the procedures for archiving data.
  • Store only data that was processed, or generated, on TACC's systems.
  • Delete all unneeded data under your account.
  • Do not store workstation or other system backups.

Organizing Your Data

Based on past performance (predominantly the total retrieval time for a given data set until completion), we recommend an average file size of 300GB - 1TB. Smaller files drastically slow retrieval rates when multiple files are recalled from tape: for example, retrieving a 100TB data collection with an average file size of 100GB will be an order of magnitude faster than retrieving one with an average file size of 1GB or less. The new environment is designed to make ~100TB data sets available in a few days or less instead of weeks, which is possible only when the average file size is large enough.

Monitor your Ranch Disk Usage and File Counts

Users can check their current and historical Ranch usage by looking at the contents of the "HSM_usage" file in their Ranch home directory. Note that this file reflects DISK usage versus disk quota, for both total file size and total file count.

ranch$ tail ~/HSM_usage

This file is updated nightly as a convenience to the user. The data fields within this file show the files and storage in use both on-line and in the Ranch tape archives, as well as the quotas for each currently in effect. Each entry also shows the date and time of its update. Do not delete or edit this file.

Transferring Data

To maximize the efficiency of data archiving and retrieval, data should be transferred using large files. Small files don't do well on tape, so they should be combined with others into a "tar" file wherever possible. The term "tar" is derived from (t)ape (ar)chive. Files that are very large (5 TB+) can also be a problem, since their contents can be split across multiple tapes, increasing the chances of problems when retrieving the data. Use the UNIX split utility on very large files (1 TB+), and tar up small files into chunks between 10 GB and 300 GB in size. This will allow the archiver to work optimally.
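For example, a sketch that bundles a small-file directory and splits the stream into 100GB parts in one pass ("my_small_files_directory" and "my_data.tar" are the example names used later in this guide):

stampede2$ tar cf - my_small_files_directory/ | split -b 100G - my_data.tar.part_

The parts can later be rejoined with "cat" and extracted with "tar", as shown in the Large Data Transfers section.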

Retrieving Files from Ranch

Since Ranch is an archive system, any files which have not been accessed recently will be stored on tape. To access files stored offline, they must be "staged" from tape, which is done automatically with tools like rsync and scp. We ask that you use the Unix tar command or another utility to bundle large numbers of small files together before transferring to Ranch, for more efficient storage and retrieval.

Ranch performs best on large files (10GB to 250GB). If you need a single file from a large tarball, it can easily be extracted without extracting the whole tarball. Due to the nature of the tapes that Ranch uses, it is quicker to read a single large file than it is to read multiple small files.
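For example, assuming the "my_data.tar" archive built later in the Large Data Transfers section, a single member can be extracted by naming it (the member path here is hypothetical):

ranch$ tar xvf my_data.tar my_small_files_directory/results.txt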

Large numbers of small files are hard for our tape drives to read back from tape, since the drives need to start and stop for every file. Instead of reading steadily at 252MB/sec, a drive crawling through many tiny files may take a week to stage them back to disk, which ties up the drive and prevents other users from accessing their data.

Limit your scp processes to no more than four at a time.

Data Transfer Methods

TACC supports two transfer mechanisms: scp (recommended) and rsync (avoid if possible).

scp

The simplest way to transfer files to and from Ranch is to use the Secure Shell "scp" command:

stampede2$ scp myfile ${ARCHIVER}:${ARCHIVE}/myfilepath

where "myfile" is the name of the file to copy and "myfilepath" is the path to the archive on Ranch. For large numbers of files, we strongly recommend you employ the Unix "tar" command to create an archive of one or more directories before transferring the data to Ranch, or as part of the transfer process.

To use ssh to create a "tar" archive file from a directory, you can use the following alternative to copy files to Ranch:

stampede2$ tar cvf - dirname | ssh ${ARCHIVER} "cat > ${ARCHIVE}/mytarfile.tar"

where "dirname" is the path to the directory you want to archive, and "mytarfile.tar" is the name of the archive to be created on Ranch.

Note that when transferring to Ranch, the destination directory (or directories) must already exist. If not, scp will respond with:

No such file or directory
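You can avoid this by creating the destination directory over ssh first; a minimal sketch, assuming the same hypothetical "myfilepath":

stampede2$ ssh ${ARCHIVER} "mkdir -p ${ARCHIVE}/myfilepath"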

The following command-line examples demonstrate how to transfer files to and from Ranch using scp.

  • copy "mynewfile" from Stampede2 to Ranch:

    stampede2$ scp mynewfile ${ARCHIVER}:${ARCHIVE}/mynewfilename
  • copy "myoldfile" from Ranch to my computer

    stampede2$ scp ${ARCHIVER}:${ARCHIVE}/myoldfile .

rsync

The UNIX rsync command is another way to keep archives up-to-date. Rather than transferring entire files, rsync transfers only the actual changed parts of a file. This method has the advantage over the scp command in that it can recover if there is an error in the transfer. Enter "rsync -h" for detailed help on this command.
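For example, a sketch of a file-level rsync to Ranch ("mytarfile.tar" is a placeholder name):

stampede2$ rsync -av mytarfile.tar ${ARCHIVER}:${ARCHIVE}/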

A huge downside to rsync, however, is that it stages data before it can start the sync, which can lead to many unnecessary staging calls and waste resources. In general, it is a bad idea to rsync a whole directory, and it is a terrible fit for archiving data on a tape-based archive system like ours.

On the new Quantum StorNext filesystem, data will stay on the front-end disk significantly longer than it did on the previous system, thanks to a much larger front-end disk system. This means that data recently sent to Ranch can safely be rsync'ed. Data that has been on the system for a significant time (around a month, though we will tune that threshold over time) may have migrated to tape and will cause the same problems it did on the old Oracle HSM system.

Large Data Transfers

If you are moving a very large amount of data to Ranch and you encounter a quota limit error, then you are bumping into the limit of data you can have on Ranch's cache system. There are limits on cache inode usage and disk block usage, but these limits should affect only a few very heavy users and do not affect a user's total allocation on the Ranch archival system. If you encounter a quota error, please submit a ticket to the TACC user portal, and we will work with you to make sure your data is transferred as efficiently as possible. The limits exist merely to prevent the system from becoming unexpectedly overloaded, thus maintaining good service for all users.

Use the "du -h" command to see how much data you have on the disk.

Archive a large directory with tar and move it to Ranch while splitting it into smaller parts, e.g.:

stampede2$ tar -cvf - /directory/ | ssh ranch.tacc.utexas.edu 'cd /your_ranch_path/ && split -b 1024m - files.tar.'

Alternatively, you can split large output files, or tar files, on the Stampede2 side, then move them to Ranch.

Large files (more than a few TB) should be split into chunks, preferably between 10GB and 500GB in size. Use the split command on Stampede2 to accomplish this:

stampede2$ split -b 300G myverybigfile.tar my_file_part_

The above example will create several 300GB files, with the filenames: my_file_part_aa, my_file_part_ab, my_file_part_ac, etc.

The split parts of a file can be joined together again with the "cat" command.

stampede2$ cat my_file_part_?? > myverybigfile.tar

See "man split" for more options.

Large collections of small files must be bundled into tar archives, called "tarballs", before being sent to Ranch. Better yet, create the tar file en route to Ranch, so that no temporary tar file is left on the source filesystem.

The following example will create an archive of the my_small_files_directory in the current working directory:

stampede2$ tar -cvf my_data.tar my_small_files_directory/

Help Desk

TACC Consulting operates from 8am to 5pm CST, Monday through Friday, except for holidays. You can submit a help desk ticket at any time via the TACC User Portal with "Ranch" in the Resource field. Help the consulting staff help you by following these best practices when submitting tickets.

  • Do your homework before submitting a help desk ticket. What do the user guide and other documentation say? Search the internet for key phrases in your error logs; that's probably what the consultants answering your ticket are going to do. What have you changed since the last time your job succeeded?

  • Describe your issue as precisely and completely as you can: what you did, what happened, verbatim error messages, other meaningful output. When appropriate, include the information a consultant would need to find your artifacts and understand your workflow: e.g. the directory containing your build and/or job script; the modules you were using; relevant job numbers; and recent changes in your workflow that could affect or explain the behavior you're observing.

  • Subscribe to Ranch User News. This is the best way to keep abreast of maintenance schedules, system outages, and other general interest items.

  • Have realistic expectations. Consultants can address system issues and answer questions about Ranch. But they can't teach parallel programming in a ticket, and may know nothing about the package you downloaded. They may offer general advice that will help you build, debug, optimize, or modify your code, but you shouldn't expect them to do these things for you.

  • Be patient. It may take a business day for a consultant to get back to you, especially if your issue is complex. It might take an exchange or two before you and the consultant are on the same page. If the admins disable your account, it's not punitive. When the file system is in danger of crashing, or a login node hangs, they don't have time to notify you before taking action.

Ranch Transition to Quantum Archiving System
Last update: November 13, 2020

The deadline to migrate your data to the new archive system is December 1, 2020.


On March 25, 2019, Ranch transitioned its Hierarchical Storage Management (HSM) software from Oracle to Quantum StorNext and massively expanded the front-end disk system and tape library. This transition guide introduces Ranch's transitional archive structure, new Ranch quotas, new Project Spaces, and the steps for organizing your data on the new Quantum architecture. The migration notices at the top of this guide apply here as well.

New: Ranch Archive Structure

The transition to the new storage architecture has effectively created two archives on Ranch:

  1. The previous Oracle archive consisting of all user data uploaded prior to March 25, 2019, and
  2. The new Quantum system archive that will consist of all data uploaded subsequent to March 25, 2019

During this maintenance, new personal directories were created for each user on the new Quantum archive. When users log on to Ranch following the maintenance, they are automatically placed into their new personal directory. These new home directories are empty except for a single link, "old_HSM", to the user's data on the Oracle archive, and the standard HSM_usage file, which is updated nightly and provides a snapshot of your Ranch usage. The Oracle system is now a read-only system: you can copy data from the system, but not to the system.

ranch$ ls -1F                            # display contents of your new home directory on the Quantum archive
old_HSM/
newdataset
ranch$ ls old_HSM/                       # display contents of the old Oracle archive
olddataset.tar
HSM_usage
ranch$ cp -p old_HSM/olddataset.tar .    # after evaluation, save "olddataset" to the new archive
ranch$ cp newdataset old_HSM             # NOPE! Will fail, the Oracle archive is read-only

TACC staff will not be moving any user data. Instead, this is an ideal opportunity for all Ranch users to review and evaluate for retention their own archived data. Simply use the standard "cp" command to transfer data across archives as demonstrated above. Consult the Organizing Your Data section below for further information on bundling and moving your data. Note that Ranch is intended to archive only work-related data.

We encourage users to carefully copy ONLY the data that they need to keep beyond March 31, 2020 from the Oracle archive to the new Quantum archive. The Oracle archive will remain in service and accessible through March 31, 2020. After that date the old_HSM link will be removed; however, your data will still be available upon request for a limited time.

New: Ranch Updated Quotas

  • File Count Quota: Users are limited to 50,000 files in their $HOME directories.
  • File Space Quota: Users are limited to 2 Terabytes (TB) of disk space in their $HOME directories.

You can display your current Ranch file space usage by executing either of the following UNIX commands:

ranch$ ls -lh
or
ranch$ du -sh

Keep in mind that the above commands display only file space used, not total file count. It is the user's responsibility to keep the file count below the 50,000-file quota by using the UNIX "tar" command or some other method to bundle files. Both the file space and file count quotas apply to all data copied from the Oracle archive as well as all new incoming data.
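To check your file count against the 50,000-file quota, a minimal sketch using standard UNIX tools:

ranch$ find ~ -type f | wc -l     # count every file under your home directory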

New: Ranch Project Spaces

With this upgrade, Ranch introduces Project Spaces, a special directory structure designed to support both shared and oversized data directories for users or projects whose storage needs exceed the standard 2TB quota. Submit a support ticket to request a customized project space on Ranch. TACC staff have already created several hundred Project Spaces. Consult with your PI regarding your exact Project Space name.

Organizing Your Data

After more than a decade of operation and more than 49,000 user accounts served, experience has shown that limiting total file count, along with enforcing explicit data retention periods, is the key to sustainable Ranch operation over the long term.

When organizing your data, keep in mind that reducing file count is at least as important as reducing file space. Ranch performs best with large files, and performance suffers severely when Ranch is kept busy archiving many small files rather than a few large ones. For this reason, users must bundle their small-file-filled directories into single large files, called "tarballs", using the UNIX "tar" command. We include several examples below; also consult the tar man page for detailed information on this command.

Users with Less Than 2TB of Ranch Data

Nearly all Ranch users fall into this category, using less than 2TB of space and fewer than 50K files in their directories. The usage report file, "old_HSM/HSM_usage", shows your historical usage along with the number of files and total blocks used. Display the last line of this file to discover your current file space and count usage:

ranch$ tail -1 old_HSM/HSM_usage
UID: 822434 Files: 475 of 11000 TB_online: 0 of 2 TB_total: 0 of 2 (Wed Mar 13 00:30:59 2019)

If, like most users, you have fewer than 50,000 files and less than 2TB of disk space in use, you can copy your data to the new library with a single command:

ranch$ cp -p -r old_HSM/* .

If you have more than 50,000 files, then you must review and regroup your data. We expect you, as the owner of the data sets, to know your data better than we do. You can bundle and regroup your existing data into tar files to reduce the file count; write the resulting tar files to the new environment.

For example, if you have directories that already contain large files (300GB-1TB in size), you can simply copy the directory over unchanged:

ranch$ cp -p -r old_HSM/dir1_with_big_files .

For directories containing many smaller files, you must bundle the directory and its contents into a single large file:

ranch$ cd old_HSM/dir2_with_small_files
ranch$ tar cf ~/dir2_with_small_files.tar .

Tip: For easy lookup later, generate an index of the tarball contents in a separate, much smaller file. You can search the index much more quickly than recalling the entire tar file(s) from tape.

ranch$ tar tf ~/dir2_with_small_files.tar > ~/dir2_with_small_files.idx
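You can then search the index instead of recalling the tarball; "myfile" below is a placeholder for the name you are looking for:

ranch$ grep myfile ~/dir2_with_small_files.idx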

Users with More Than 2TB of Ranch Data

If you have more than 2TB of existing Ranch data, and your data belongs to an approved allocation, then, after reviewing your data on the old archive, you may request a Project Space. Submit a support ticket with the following information:

  • project name
  • total data size in TB
  • PI of the group and list of users belonging to the group
  • desired retention period

TACC staff will then set up a Project Space directory structure with the desired name (or the approved project ID by default) and set the appropriate quota. We will do our best to accommodate the total size needed but will ask that the collection be in a format suitable for tape archives.

Please note that several hundred Project Spaces have already been created. Consult with your PI regarding your exact Project Space name before you begin copying your data.

As noted in the Organizing Your Data section of the user guide above, we recommend an average file size of 300GB - 1TB: smaller files drastically slow retrieval when multiple files are recalled from tape, and the new environment can make ~100TB data sets available in a few days rather than weeks only when the average file size is large enough.