CDTools at TACC
Last update: May 13, 2021

Using the local /tmp space on each compute node can reduce the I/O load on the global Lustre file system and can also improve I/O performance. Because /tmp is limited in size, it is best suited to executables/binaries, frequently used object files, and small common files such as global configuration files or initial/pre-processed data files.
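Because /tmp is small, it can help to confirm that a file or directory fits before copying it there. The following is a minimal sketch that uses only the standard "du" and "df" commands. The function name "fits_in" is hypothetical and is not part of CDTools, and the check compares against the free space currently reported on the node where it runs, not against any limit the system may enforce.

```shell
#!/bin/bash
# Hypothetical helper (not part of CDTools): succeed only if SRC is smaller
# than the free space currently reported for DEST (default: /tmp).
# Usage: fits_in SRC [DEST]
fits_in() {
    local src="$1" dest="${2:-/tmp}" need avail
    # Size of SRC in KiB; left empty if SRC does not exist.
    need=$(du -sk "$src" 2>/dev/null | awk '{print $1}')
    # Available KiB on the file system holding DEST (POSIX df format, column 4).
    avail=$(df -Pk "$dest" 2>/dev/null | awk 'NR == 2 {print $4}')
    [ -n "$need" ] && [ -n "$avail" ] && [ "$need" -lt "$avail" ]
}
```

For example, "fits_in ${SCRATCH}/inputdir" returns success only when the directory is smaller than the space currently free in /tmp on that node.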

Collect-Distribute (CDTools) was designed and developed to copy files or directories to and from the /tmp directory. In CDTools, "distribute.bash" copies binaries and frequently accessed input files to the local /tmp space on each compute node when a job starts, and "collect.bash" copies output files and log files back to $WORK or $SCRATCH before a job finishes.

  1. To set up CDTools on Frontera or Stampede2:

    $ export CDTools=/home1/apps/CDTools/1.1  # for Frontera or Stampede2
    $ export PATH=${PATH}:${CDTools}/bin

  2. Distribute your files or directories to the local /tmp space of each compute node used by your job:

    $ distribute.bash ${SCRATCH}/inputfile  # full path of your input file

    or

    $ distribute.bash ${SCRATCH}/inputdir  # full path of the directory holding your input files

    If you ssh to the compute nodes after running the above command, you will find an identical copy of your input file or directory in the /tmp directory on each node.

  3. Collect the output files or directories generated by your job from the /tmp space of each node:

    $ collect.bash /tmp/outputdir ${SCRATCH}/output_collected

    or

    $ collect.bash /tmp/outputfile ${SCRATCH}/output_collected

The output files or directories are copied back to your target directory in $SCRATCH. Each copy has an underscore and a number appended to its name; the number indicates the rank of the compute node it came from. For example, if 4 nodes were used to run the job, outputfile_0 through outputfile_3 would be found under the ${SCRATCH}/output_collected directory.
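Steps 2 and 3 can be combined around the command that a job runs. The following is a minimal sketch, not the example script shipped in ${CDTools}/test. The function name "stage_run_collect" is hypothetical, and the sketch assumes that CDTools is already on PATH (step 1) and that the job's command reads its input from /tmp and writes its output under /tmp.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of CDTools): distribute input to /tmp,
# run a command, then collect output from /tmp.
# Assumes distribute.bash and collect.bash are on PATH.
# Usage: stage_run_collect INPUT TMP_OUTPUT DEST COMMAND [ARGS...]
stage_run_collect() {
    local input="$1" tmp_output="$2" dest="$3" rc=0
    shift 3
    # Copy the input file or directory to /tmp on every node of the job.
    distribute.bash "$input" || return 1
    # Run the job's own command and remember its exit status.
    "$@" || rc=$?
    # Collect even if the command failed, so its output and logs can be inspected.
    collect.bash "$tmp_output" "$dest" || return 1
    return "$rc"
}
```

A call might look like this, where ./my_app is a placeholder for whatever command launches your application:

    $ stage_run_collect ${SCRATCH}/inputdir /tmp/outputdir ${SCRATCH}/output_collected ./my_app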

Notes

  • This tool should work in both batch and interactive modes. An example job script can be found in ${CDTools}/test.
  • Users should test their workflow with CDTools before any production runs to make sure the required files are successfully distributed and collected.
  • Users should still understand and respect the /tmp size limit and other I/O rules when using CDTools.
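
One way to carry out the test recommended above is to confirm that a copy from every node arrived in the collection directory. The sketch below assumes the naming described earlier: the original name followed by an underscore and a node rank counted from 0, as the 4-node example suggests. The function name "missing_collected" is hypothetical and is not part of CDTools.

```shell
#!/bin/bash
# Hypothetical helper (not part of CDTools): print the per-node copies of NAME
# that are absent from DIR.
# Assumption: copies are named NAME_RANK, with RANK running from 0 to NNODES-1.
# Usage: missing_collected NAME DIR NNODES
missing_collected() {
    local name="$1" dir="$2" nnodes="$3" rank=0
    while [ "$rank" -lt "$nnodes" ]; do
        # Report the expected name when no file or directory exists for this rank.
        [ -e "${dir}/${name}_${rank}" ] || echo "${name}_${rank}"
        rank=$((rank + 1))
    done
}
```

For a 4-node job, "missing_collected outputfile ${SCRATCH}/output_collected 4" prints nothing when all four copies are present. Inside a Slurm job, the node count is available in the SLURM_NNODES environment variable.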