Ranch Transition to Quantum Archiving System
Last update: March 27, 2019
 

On March 25, 2019, Ranch transitioned its Hierarchical Storage Management (HSM) software from Oracle to Quantum StorNext and massively expanded its front-end disk system and tape library. This transition guide introduces Ranch's transitional archive structure, new Ranch quotas, and new Project Spaces, and describes the steps for organizing your data on the new Quantum architecture.

New: Ranch Archive Structure

The transition to the new storage architecture has effectively created two archives on Ranch:

  1. The previous Oracle archive consisting of all user data uploaded prior to March 25, 2019, and
  2. The new Quantum system archive that will consist of all data uploaded subsequent to March 25, 2019.

During this maintenance, new personal directories were created for each user on the new Quantum archive. When users log on to Ranch after the maintenance, they are automatically placed in their new personal directory. These new home directories are empty except for two items: a single link, "old_HSM", pointing to the user's data on the Oracle archive, and the standard HSM_usage file, which is updated nightly and provides a snapshot of your Ranch usage. The Oracle archive is now read-only: you can copy data from it, but not to it.

ranch$ ls -1F                            # display contents of your new home directory on the Quantum archive
old_HSM/
newdataset
ranch$ ls old_HSM/                       # display contents of the old Oracle archive
olddataset.tar
HSM_usage
ranch$ cp -p old_HSM/olddataset.tar .    # after evaluation, save "olddataset" to the new archive
ranch$ cp newdataset old_HSM             # NOPE! Will fail, the Oracle archive is read-only

TACC staff will not be moving any user data. Instead, this is an ideal opportunity for all Ranch users to review their own archived data and decide what to retain. Use the standard "cp" command to transfer data between the archives, as demonstrated above. See the Organizing Your Data section below for more information on bundling and moving your data. Note that Ranch is intended to archive only work-related data.

We encourage users to carefully copy, from the Oracle archive to the new Quantum archive, ONLY the data they need to keep beyond March 31, 2020. The Oracle archive will remain in service and accessible through the end of March 31, 2020. After that date the old_HSM link will be removed; however, your data will still be available upon request for a limited time.
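Before copying anything, it can help to see what is actually in the old archive. The following is a minimal sketch, not a TACC-provided tool: the function name list_old_files is hypothetical, and it assumes GNU find (for -printf), as typically found on Linux systems. It lists every file under old_HSM, largest first, with its size in bytes and last-modified date; the -H option makes find follow the old_HSM link itself.

```shell
# Hypothetical helper: list regular files under the old Oracle archive,
# largest first, as "size-in-bytes<TAB>last-modified-date<TAB>path".
# -H follows the symbolic link named on the command line (old_HSM);
# -printf is a GNU find extension (assumed to be available).
list_old_files() {
    find -H "${1:-old_HSM}" -type f -printf '%s\t%TY-%Tm-%Td\t%p\n' | sort -rn
}
```

For example, "list_old_files | head -n 20" shows the 20 largest files in your old archive.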

New: Ranch Updated Quotas

  • File Count Quota: Users are limited to 50,000 files in their $HOME directories.
  • File Space Quota: Users are limited to 2 Terabytes (TB) of disk space in their $HOME directories.

You can display your current Ranch file space usage by executing either of the following UNIX commands:

ranch$ ls -lh
or
ranch$ du -sh

Keep in mind that the above commands display only file space used, not the total file count. It is the user's responsibility to keep the file count below the 50,000-file quota by bundling files with the UNIX "tar" command or some other method. Both the file space and file count quotas apply to all data copied from the Oracle archive as well as to all new incoming data.
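One way to check your file count is sketched below. The function name count_files is hypothetical, and the sketch assumes the quota counts regular files (remove "-type f" to count directories and other entries as well). Because find does not follow symbolic links by default, running it from your new home directory should count only data on the new archive and not the files behind the old_HSM link, assuming old_HSM is a symbolic link.

```shell
# Hypothetical helper: count regular files under a directory
# (default: the current directory) for comparison with the
# 50,000-file quota. Symbolic links such as old_HSM are not followed.
count_files() {
    find "${1:-.}" -type f | wc -l | tr -d ' '
}
```

For example, run "cd ~ && count_files" after copying data to the new archive.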

New: Ranch Project Spaces

With this upgrade, Ranch introduces Project Spaces, a special directory structure designed to support both shared and oversized data directories for users or projects whose storage needs exceed the standard 2TB quota. Submit a support ticket to request a customized Project Space on Ranch. TACC staff have already created several hundred Project Spaces, so consult your PI about your exact Project Space name.

Organizing Your Data

After more than a decade of operation serving more than 49,000 user accounts, experience has shown that limiting total file count and enforcing explicit data retention periods will be the keys to sustainable Ranch operation over the long term.

When organizing your data, keep in mind that reducing file count is at least as important as reducing file space. Ranch performs best with large files, and performance suffers severely when Ranch is kept busy archiving many small files. For this reason, users must bundle directories full of small files into single large files. The best way to do this is with the UNIX "tar" command, which bundles files into single large files called "tarballs". Several examples follow; also consult the tar man page for detailed information on this command.

Users with Less Than 2TB of Ranch Data

Nearly all Ranch users fall into this category, using less than 2TB of space and storing fewer than 50,000 files. The usage report file, "old_HSM/HSM_usage", shows your historical usage along with the number of files and total blocks used. Display the last line of this file to see your current file space and file count usage:

ranch$ tail -1 old_HSM/HSM_usage
UID: 822434 Files: 475 of 11000 TB_online: 0 of 2 TB_total: 0 of 2 (Wed Mar 13 00:30:59 2019)
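If you prefer to check the count programmatically, the sketch below extracts the file count from the last line of HSM_usage and compares it with the new 50,000-file quota. The function names are hypothetical, and the parsing assumes the report keeps the "Files: N of M" layout shown above; it locates the count by its "Files:" label rather than by column position.

```shell
# Hypothetical helper: print the file count from the last line of the
# HSM_usage report (default path: old_HSM/HSM_usage).
# Assumes the line contains "Files: <count> of <limit>".
hsm_file_count() {
    tail -n 1 "${1:-old_HSM/HSM_usage}" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "Files:") { print $(i + 1); exit } }'
}

# Compare that count with the new 50,000-file quota.
check_file_quota() {
    local n
    n=$(hsm_file_count "$1")
    if [ -z "$n" ]; then
        echo "could not find a file count in the report" >&2
        return 1
    fi
    if [ "$n" -gt 50000 ]; then
        echo "over quota: $n files; bundle before copying"
    else
        echo "ok: $n files"
    fi
}
```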

If, like most users, you have fewer than 50,000 files and use less than 2TB of disk space, you can copy your data to the new archive with a single command:

ranch$ cp -p -r old_HSM/* .
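The "*" in the command above is expanded by the shell, which by default skips names beginning with a dot, so any hidden files or directories at the top level of old_HSM would not be copied. If you have any, one option is to copy "old_HSM/." instead, which names the directory's contents, hidden entries included. The wrapper function below is a hypothetical convenience; the cp invocation inside it is the part that matters.

```shell
# Hypothetical wrapper: copy the full contents of SRC (default old_HSM),
# including top-level dot-files, into DEST (default: current directory).
# -p preserves modes and timestamps; -r copies directories recursively.
copy_old_archive() {
    cp -p -r "${1:-old_HSM}/." "${2:-.}"
}
```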

If you have more than 50,000 files, you must review and regroup your data. As the owner of your data sets, you know your data better than we do. Bundle and regroup your existing data into tar files to reduce file counts, and write the generated tar files to the new archive (for example, your new home directory), since the old_HSM archive is read-only.
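To decide which directories to bundle, it helps to know where your files are concentrated. The sketch below (hypothetical function name files_per_dir) prints the number of files in each top-level directory of old_HSM, largest count first. It counts only files inside subdirectories, not files sitting directly in old_HSM itself.

```shell
# Hypothetical helper: print "file-count<TAB>directory" for each top-level
# directory under the old archive (default: old_HSM), largest count first.
files_per_dir() {
    for d in "${1:-old_HSM}"/*/; do
        [ -d "$d" ] || continue      # skip if the glob matched nothing
        printf '%s\t%s\n' "$(find "$d" -type f | wc -l | tr -d ' ')" "$d"
    done | sort -rn
}
```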

For example, if you have directories that already contain large files (300GB-1TB in size), you can simply copy those directories over unchanged:

ranch$ cp -p -r old_HSM/dir1_with_big_files .

For directories containing many smaller files, you must bundle the directory and its contents into a single large file:

ranch$ cd old_HSM/dir2_with_small_files
ranch$ tar cf ~/dir2_with_small_files.tar .
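If several directories need bundling, a loop avoids repeating the cd and tar steps by hand. In this sketch the function name bundle_dirs and any directory names you pass it are hypothetical. It uses tar's -C option to change into each source directory, so each tarball stores its members as "./..." just as when running "tar cf ... ." from inside the directory, and it writes each tarball into the destination you give, which should be on the new archive.

```shell
# Hypothetical helper: bundle each named subdirectory of SRC into
# DEST/<name>.tar.
# Usage: bundle_dirs SRC DEST NAME...
bundle_dirs() {
    local src="$1" dest="$2" name
    shift 2
    for name in "$@"; do
        # -C changes into the directory before archiving ".".
        tar cf "$dest/$name.tar" -C "$src/$name" . || return 1
    done
}
```

For example, from your new home directory: "bundle_dirs old_HSM ~ dir2_with_small_files dir3_with_small_files" (dir3_with_small_files is a hypothetical name).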

Tip: For easy lookup later, save an index of the tarball's contents to a separate, much smaller file. Searching the index is much faster than recalling the entire tar file(s) from tape.

ranch$ tar tf ~/dir2_with_small_files.tar > ~/dir2_with_small_files.idx
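Once an index exists, you can search it with an ordinary text search and then extract only the files you need. The sketch below uses a hypothetical function name, extract_matching, and assumes a tar that supports "-T -" to read member names from standard input, as GNU tar does. It passes all matching names to a single tar command, so the tarball is read once no matter how many members match. Run it from a writable directory on the new archive, since tar extracts into the current directory.

```shell
# Hypothetical helper: extract from TARBALL every member whose name in
# INDEX matches PATTERN (a grep regular expression).
# Usage: extract_matching TARBALL INDEX PATTERN
extract_matching() {
    local tarball="$1" index="$2" pattern="$3" matches
    # Search the small text index first; the tarball is not touched here.
    matches=$(grep -- "$pattern" "$index") || {
        echo "no match for '$pattern' in $index" >&2
        return 1
    }
    # One tar invocation reads member names from stdin (-T -).
    printf '%s\n' "$matches" | tar xf "$tarball" -T -
}
```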

Users with More Than 2TB of Ranch Data

If you have more than 2TB of existing Ranch data and your data belongs to an approved allocation, then, after reviewing your data on the old archive, you may request a Project Space. Submit a support ticket with the following information:

  • project name
  • total data size in TB
  • PI of the group and list of users belonging to the group
  • desired retention period

TACC staff will then set up a Project Space directory structure with the desired name (or, by default, the approved project ID) and set the appropriate quota. We will do our best to accommodate your total size needs, but will ask that the collection be in a format suitable for tape archiving.

Please note that several hundred Project Spaces have already been created. Consult your PI about your exact Project Space name before you begin copying your data.

Based on past performance (chiefly the total time needed to retrieve a complete data set), we recommend an average file size of 300GB to 1TB. Smaller files drastically slow retrieval when many files are recalled from tape. For example, retrieving a 100TB collection with an average file size of 100GB (about 1,000 files) will be an order of magnitude faster than retrieving the same collection with an average file size of 1GB or less (100,000 files or more). The new environment is designed to make ~100TB data sets available in a few days or less instead of weeks, which is possible only when the average file size is large enough.
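To see how a directory compares with the recommended average file size, the sketch below totals file sizes and prints the file count, total, and average. The function name avg_file_size is hypothetical; it assumes GNU find (for -printf) and reports GB as 10^9 bytes.

```shell
# Hypothetical helper: report file count, total bytes, and average file
# size for a directory (default: current directory).
avg_file_size() {
    find "${1:-.}" -type f -printf '%s\n' |
        awk '{ n++; total += $1 }
             END {
                 if (n == 0) { print "no files"; exit }
                 printf "%d files, %.0f bytes total, %.0f bytes average (%.1f GB)\n",
                        n, total, total / n, total / n / 1e9
             }'
}
```

For example, "avg_file_size ~" reports on the files in your new home directory; because find does not follow symbolic links by default, files behind the old_HSM link are not included.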