Stockyard (/work) File Cleanup and Migration
Last update: April 19, 2021

Introduction

TACC's Stockyard-hosted file system, currently synonymous with the "/work" file system, is approaching end-of-life. A new file system, /work2, is now available to facilitate data migration and will eventually replace /work. At this time, all users with files on /work must decide whether to migrate their data to /work2, move it to a non-TACC facility, or leave it on /work, where it will eventually become unavailable. The TACC Ranch archive resource is also available for data you no longer need for processing but wish to preserve long-term.

Before /work2 permanently replaces the current /work, users must migrate any data they wish to keep to /work2 while continuing to use their existing workflows. Complete the migration before June 15, 2021, when /work2 becomes the work file system mounted on compute resources.

Before migrating, review your data and identify anything that can be deleted. The Ranch archive facility is also available for data that no longer needs to live on /work but must be retained long-term.

Stockyard Layout

The /work2 file system is mounted on all of the major TACC clusters and is being rolled out everywhere else that /work is available. We expect /work2 to be mounted shortly on the remaining systems that have /work but do not yet have /work2. Data migration may begin immediately wherever /work2 is mounted.

The $STOCKYARD environment variable points to the highest-level directory that you own on the Global Shared File System. This variable is consistent across all TACC resources that mount Stockyard. The $WORK environment variable, on the other hand, is resource-specific and varies across systems. $WORK is a subdirectory of $STOCKYARD and its name corresponds to the associated TACC resource.
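For example, the two variables might look like this when logged into Stampede2 (the account number and username below, borrowed from the examples later in this document, are illustrative):

$ echo $STOCKYARD
/work/01234/bjones
$ echo $WORK
/work/01234/bjones/stampede2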


Figure 1. Stockyard, TACC's Global Shared File System

All subdirectories contained in your $STOCKYARD directory are available to you from any other TACC system that mounts the file system. If you have accounts on both Stampede2 and Maverick2, for example, the $STOCKYARD/stampede2 directory is available from your Maverick2 account, and $STOCKYARD/maverick2 is available from your Stampede2 account. Your quota and reported usage on the Global Shared File System reflect all files that you own on Stockyard, regardless of their actual location on the file system.
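Because usage is counted across the whole file system, you can check your Stockyard quota from any resource that mounts it. One way is Lustre's standard "lfs quota" command; a minimal sketch, assuming the bjones account from the examples in this document (output format varies by Lustre version):

$ lfs quota -u bjones /work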

Migration Timeline

The dates below are subject to change depending on overall migration progress.

Migration from /work to /work2 begins in March 2021. Both /work and /work2 will exist during the migration, but once migration is complete the original Stockyard file system will go offline and we intend to rename and mount the new file system as /work. For this reason, avoid building automation that targets /work2 locations: use references to /work2 sparingly, or make them easy to change.
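One way to keep such references easy to change is to define the /work2 path in a single place in your scripts; a minimal sketch, assuming the bjones account from the examples below and a hypothetical results file:

# The only line that needs editing when /work2 is renamed to /work:
WORK2_DIR=/work2/01234/bjones

# Reference only the variable everywhere else in the script:
cp myresults.txt "$WORK2_DIR/"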

Phase     Time Period                       /work permissions                  /work2 permissions
Phase 1   present - May 4, 2021             read and write                     read and write
Phase 2   May 4 - June 1, 2021              read only                          read and write
Phase 3   June 1 - June 15, 2021            mounted only on HPC login nodes    read and write
Phase 4   June 15, 2021 (migration          unmounted                          renamed to /work
          period complete)

Final Destination Determination

Before migrating any files, take stock of your existing data and migrate only what your current research needs to the new file system. This is the perfect time to delete or set aside non-relevant files, then compress the rest and transfer (see the sketch following the list below). All other data should be migrated to long-term storage (Ranch), moved to a non-TACC facility, or deleted. If you create tarballs on /work during this process, take care not to exceed your quota.

For all of your data, determine, file by file, whether to:

  1. Migrate to the new /work2 file system, or
  2. Transfer to long term storage (Ranch), or
  3. Delete
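Standard tools can help with this inventory. A sketch using "du" and "find" (the 1 GB and one-year thresholds below are arbitrary examples, not policy):

$ du -sh $STOCKYARD/*                      # size of each top-level subdirectory
$ find $STOCKYARD -size +1G -atime +365    # large files not accessed in the last year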

Stockyard is not backed up; this applies to both /work and /work2.
Take care when deleting files, especially if doing so recursively.

Migrating your Files from /work to /work2

If you have allocations on multiple TACC resources, then your /work directory will consist of several resource-specific subdirectories. See Figure 1.

At this time, we recommend migrating one Stockyard resource-specific subdirectory at a time: e.g., rsync /work/lonestar5, then /work/stampede2, and so on. You may transfer any non-resource-specific directories at any time.

Use "rsync" to Move Files

The current /work and the new /work2 are both Lustre file systems. There is no need to stripe the receiving /work2 directories, as new files are not expected to exceed 2 TB. Your quotas will remain consistent across /work and /work2.

Contrary to our usual file transfer recommendations, TACC staff advises using the "rsync" command to transfer your /work contents to /work2. We advise against the "tar" utility in order to avoid temporary storage pressure on /work as much as possible.

Use "rsync" to transfer files between the /work and /work2 file systems. The corresponding directories on /work2 will already exist.

Example rsync Command

In this example command, user bjones transfers the maverick2 subdirectory to the new file system. The command options indicate:

  • --partial keep partially transferred files, so an interrupted transfer can resume instead of re-copying from scratch
  • -a archive mode; preserves permissions, ownership, timestamps, and symbolic links
  • -z compress data in transit, trading some CPU cycles (possibly on login nodes in the worst case) for reduced bandwidth
  • -v verbose mode
$ rsync --partial -azv /work/01234/bjones/maverick2 /work2/01234/bjones

See the rsync man page for a deeper dive into rsync's options and further techniques.
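One standard option worth knowing is "--dry-run" ("-n"), which reports what rsync would transfer without actually writing anything:

$ rsync --partial -azvn /work/01234/bjones/maverick2 /work2/01234/bjones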

Please limit your rsync processes to no more than two concurrent processes.

If your rsync session fails, or if data integrity for specific files is a significant concern, rerun the "rsync" command with the "--checksum" (or "-c") option to verify that the target files were written correctly.

$ rsync -azvc /work/01234/bjones/maverick2 /work2/01234/bjones

Migrating your Files to Ranch

Follow the instructions in the Ranch User Guide to transfer your data to Ranch.
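For instance, the Ranch User Guide describes transfers addressed via the $ARCHIVER and $ARCHIVE environment variables defined on TACC systems; a sketch, assuming a tarball mydata.tar that you have already created (consult the guide for the authoritative host and path if these variables are not set for you):

$ scp mydata.tar ${ARCHIVER}:${ARCHIVE}/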

If you have access to long-term storage at other facilities, you may transfer your data to that facility using the rsync command.
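For example (the username, host, and path below are placeholders for your own facility's storage):

$ rsync -azv $STOCKYARD/mydataset username@storage.example.org:/path/to/archive/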

Migration Guidelines

During this migration period, thousands of TACC users will be accessing both file systems. Be aware of the following guidelines.

Don't Stress Stockyard

The TACC Global Shared File System, Stockyard, is mounted on most TACC HPC resources as the /work ($WORK) directory. This file system is accessible to all TACC users, and therefore experiences a great deal of I/O activity (reading and writing to disk, opening and closing files) as users run their jobs and read and generate data, including intermediate and checkpointing files. As TACC adds more users, the stress on the $WORK file system has grown to the point that TACC staff now recommends new job submission guidelines in order to reduce stress and I/O on Stockyard.

TACC staff now recommends that you run your jobs out of the $SCRATCH file system instead of the global $WORK file system. To run your jobs out of $SCRATCH (a sample job script sketch follows this list):

  • Copy or move all job input files to $SCRATCH
  • Make sure your job script directs all output to $SCRATCH
  • Compute nodes should not access either /work or /work2.
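A minimal Slurm sketch of this pattern, assuming a hypothetical application my_app and input file input.dat staged in $WORK (the queue name and resource counts are placeholders; adjust for your system):

#!/bin/bash
#SBATCH -J scratch_job          # job name
#SBATCH -o scratch_job.%j.out   # output file (%j expands to the job ID)
#SBATCH -p normal               # queue name; varies by system
#SBATCH -N 1                    # number of nodes
#SBATCH -n 16                   # number of tasks
#SBATCH -t 01:00:00             # one-hour run time limit

# Stage input from $WORK to $SCRATCH, then run entirely on $SCRATCH.
cd $SCRATCH
cp $WORK/mydataset/input.dat .

# All reads and writes below go to $SCRATCH, not $WORK.
./my_app input.dat > output.dat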

File System Usage

Consider $HOME and $WORK as places for storage and for keeping track of important items. Actual job activity, reading and writing to disk, should be offloaded to your resource's $SCRATCH file system (see Table 2. File System Usage Recommendations). You can start a job from anywhere, but the actual work of the job should occur only on the $SCRATCH partition. Save original items to $HOME or $WORK so that you can copy them over to $SCRATCH if you need to re-generate results.

Table 2. File System Usage Recommendations

File System   Best Storage Practices                       Best Activities
$HOME         cron jobs, small scripts,                    compiling, editing
              environment settings
$WORK         software installations, original datasets    staging datasets
              that can't be reproduced, job scripts
              and templates
$SCRATCH      temporary storage                            I/O files, job files, temporary
                                                           datasets, all job I/O activity

The $SCRATCH file system, as its name indicates, is a temporary storage space. Files that have not been accessed in ten days are subject to purge. Deliberately modifying file access time (using any method, tool, or program) for the purpose of circumventing purge policies is prohibited.

Limit Input/Output (I/O) Activity

  • Limit I/O intensive sessions (lots of reads and writes to disk, rapidly opening or closing many files).

  • Avoid opening and closing files repeatedly. Every open/close operation on the file system requires interaction with the MetaData Service (MDS). The MDS acts as a gatekeeper for access to files on Stockyard's file system. Overloading the MDS will affect other users on the system. If possible, open files once at the beginning of your program/workflow, then close them at the end (see the shell sketch after this list).

  • Don't get greedy. If you know or suspect your workflow is I/O intensive, don't submit a pile of simultaneous jobs. As noted above, limit your rsync transfers to no more than two concurrent processes.
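To illustrate the open-once pattern above in shell terms: appending with ">>" inside a loop reopens the file on every iteration (one metadata operation each time), while a file descriptor opened once avoids the repeated open/close cycle. A minimal sketch with a hypothetical results.txt:

# Anti-pattern: each iteration opens and closes results.txt.
for i in $(seq 1 1000); do
    echo "step $i" >> results.txt
done

# Better: open once on descriptor 3, write repeatedly, close at the end.
exec 3>> results.txt
for i in $(seq 1 1000); do
    echo "step $i" >&3
done
exec 3>&-   # close the descriptor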