User News

Ranch Status 18 January 2018

Posted by Jacob Getz on Jan 17, 2018 4:11:44 PM

At 07:00 on January 18th, 2018, the Ranch environment will be taken down for 4 hours to address a hardware failure.

Appropriate notice will be provided when Ranch returns to production.


-TACC Team

Updated on Jan 18, 2018 9:25:11 AM

Ranch is back in production as of 09:15 on 1/18/2018.


-TACC Team

Stampede 2 Maintenance 16 January 2018

Posted by Jacob Getz on Jan 9, 2018 11:24:40 AM

Updated on Jan 15, 2018 9:51:38 PM

The University of Texas will be closed tomorrow, January 16, 2018, due to inclement weather. For this reason, the Stampede2 maintenance scheduled for this date has been rescheduled to Tuesday, January 23, 2018, from 8am to 7:30pm (CST).

Original Posting

Stampede2 will be unavailable from 8 a.m. to 7:30 p.m. (CT) on Tuesday, 16 January 2018. System maintenance will be performed during this time.

 

If you submit a job before the maintenance, and the time you request exceeds the time remaining until the maintenance begins, your job will run when the maintenance is over. The squeue command will report "ReqNodeNotAvailable" ("Required Node Not Available"). The showq utility will list the job as "BLOCKED" and report its status as "WaitNod" ("Waiting for Nodes"). Note that the hours leading up to the maintenance are an excellent time to submit shorter, smaller jobs that can complete before the maintenance begins: as the queues drain there will be many nodes available, and your wait time may be short.
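The rule described above can be sketched as a simple time comparison. This is only an illustration, not the scheduler's actual logic: the function name and the submission time are hypothetical, and it assumes the job would start the moment it is submitted. The 8 a.m. start time is taken from the original posting.

```python
from datetime import datetime, timedelta

def fits_before_maintenance(start_time, requested, maintenance_start):
    """Return True if a job starting at start_time with the requested
    wall time would finish by the time maintenance begins."""
    return start_time + requested <= maintenance_start

# Maintenance start from the original posting (local time).
maintenance_start = datetime(2018, 1, 16, 8, 0)
# Hypothetical submission time: 8 p.m. the evening before.
now = datetime(2018, 1, 15, 20, 0)

# A 4-hour request ends at midnight, before the window opens.
print(fits_before_maintenance(now, timedelta(hours=4), maintenance_start))   # True
# A 24-hour request would run into the window, so the job would wait.
print(fits_before_maintenance(now, timedelta(hours=24), maintenance_start))  # False
```

A request that returns False here corresponds to the case where squeue reports "ReqNodeNotAvailable" until the maintenance is over.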

Wrangler Maintenance 30 January 2018

Posted by Jacob Getz on Jan 12, 2018 1:28:23 PM

Wrangler will undergo system maintenance for patching and updates and will be unavailable to users on 1/30/18 from 0800 to 1700 (CST).

Lonestar5 Maintenance 23 January 2018

Posted by Jacob Getz on Jan 8, 2018 11:19:21 AM

LoneStar5 will not be available from 8 a.m. to 5:00 p.m. (CT) on Tuesday, 23 January 2018. System maintenance will be performed during this time.

-TACC Team

TACC Winter Break Schedule

Posted by Chris Hempel on Dec 21, 2017 6:42:54 AM

TACC personnel will observe the University of Texas at Austin winter break from 5 p.m. (CST) on Thursday, 21 December 2017, and will resume normal business hours on Tuesday, 2 January 2018. A staff member will be on site to monitor the status of all TACC resources. TACC support staff will monitor the consulting system throughout the break and address critical system issues. The staff will address other issues beginning Tuesday, 2 January 2018.


Please submit any questions you may have via the TACC Consulting System.
https://portal.tacc.utexas.edu/tacc-consulting

Stampede2 Status, December 19, 2017

Posted by Jacob Getz on Dec 11, 2017 2:55:43 PM

Stampede2 will be unavailable 19 Dec 2017 between 8:00AM and 7:30PM CST for maintenance.


-TACC Team

Updated on Dec 19, 2017 7:29:33 PM

Stampede2 is now back in production. Thank you.

Updated on Dec 18, 2017 2:58:04 PM

Reminder: Stampede2 will be unavailable December 19th 2017 between 8:00AM and 7:30PM CST for maintenance.

At the end of this maintenance, the new Skylake (SKX) nodes will enter production service alongside the existing Knights Landing (KNL) nodes. SKX jobs that run after the maintenance will incur normal accounting charges. See the Stampede2 User Guide for more information: https://portal.tacc.utexas.edu/user-guides/stampede2

If you submit a job before the maintenance, and the time you request exceeds the time remaining until the maintenance begins, your job will run when the maintenance is over. The squeue command will report "ReqNodeNotAvailable" ("Required Node Not Available"). The showq utility will list the job as "BLOCKED" and report its status as "WaitNod" ("Waiting for Nodes"). Note that the hours leading up to the maintenance are an excellent time to submit shorter, smaller jobs that can complete before the maintenance begins: as the queues drain there will be many nodes available, and your wait time may be short.

Wrangler Status 18 December 2017

Posted by Sergio Leal on Dec 18, 2017 12:15:37 PM

The Wrangler system is currently unavailable due to a software error. Administrators are working to resolve the issue. There is no ETA at this time; we will provide further updates as they become available. Thanks.

Updated on Dec 18, 2017 6:34:25 PM

Wrangler has been returned to production at this time. Thank you.

Training: Introduction to OpenMP using the Interactive Parallelization Tool (IPT)

Posted by Jason Allison on Dec 1, 2017 11:58:48 AM

December 14th, 2017 9am-1pm CT
Texas Advanced Computing Center
ACB 1.104
J.J. Pickle Research Campus
10100 Burnet Rd. Austin, TX 78758

OpenMP is one of the most popular paradigms to exploit the now ubiquitous manycore and multi-core processors. In this beginner-level training session, we will provide an overview of the basic concepts of OpenMP. We will introduce the trainees to the Interactive Parallelization Tool (IPT) that is designed for parallelizing serial C/C++ programs semi-automatically. The participants in the training session will be introduced to OpenMP and will learn to use IPT for parallelizing their C/C++ applications.

Prerequisites: Experience working in a Linux environment, and familiarity with C/C++/Fortran or any other programming language.

We are offering the training to both in-person and webcast participants. Local participants are strongly encouraged to attend in person.

To attend the training in person, please contact me via email at jasona@tacc.utexas.edu.

To attend via webcast, please enroll for the training at:
https://learn.tacc.utexas.edu/mod/chat/view.php?id=30

You will need to sign in with your TACC User Portal account and password to enroll.

Maverick: New queues to support long gpu runs

Posted by Chris Hempel on Nov 27, 2017 4:34:48 PM

Updated on Dec 1, 2017 8:59:34 AM

The runtime limit on the gpu queue has been increased to 24 hours.

Original Posting

Two new queues have been configured on Maverick to accommodate GPU jobs that require more runtime than allowed in the gpu queue. These two queues are configured as follows:

gpu-long
- up to 72 hours runtime, one node per job (i.e. sbatch -N 1 and/or -n 20 or less)
- maximum of 8 jobs allowed in queue per user

gpu-verylong
- up to 120 hours runtime, one node per job
- maximum of 3 jobs allowed in queue per user

These queues are available immediately for use and do not require special permission to access. The gpu queue remains with a 12-hour runtime limit.
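As a rough illustration of how the runtime limits map a job to a queue, the sketch below picks the first queue whose limit covers the requested runtime. The function and table names are hypothetical. The gpu limit uses the 24-hour value from the Dec 1, 2017 update rather than the 12 hours stated in the original posting, and the one-node-per-job and per-user job-count limits are not checked.

```python
# Runtime limits in hours. gpu-long and gpu-verylong come from the posting;
# the gpu value is the 24-hour limit from the Dec 1, 2017 update.
QUEUE_LIMITS_HOURS = [
    ("gpu", 24),
    ("gpu-long", 72),
    ("gpu-verylong", 120),
]

def pick_queue(runtime_hours):
    """Return the first queue whose runtime limit covers the request,
    or None if the request exceeds every limit."""
    for name, limit in QUEUE_LIMITS_HOURS:
        if runtime_hours <= limit:
            return name
    return None

print(pick_queue(10))   # gpu
print(pick_queue(48))   # gpu-long
print(pick_queue(100))  # gpu-verylong
print(pick_queue(200))  # None
```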

Please submit any questions you have via the TACC Consulting System.

https://portal.tacc.utexas.edu/tacc-consulting

Ranch status 11/14/2017

Posted by David Littrell on Nov 14, 2017 10:53:26 AM

At 10:30 November 14th the Ranch environment will be taken down for 2 hours in order to address a hardware error. Notice will be provided when Ranch returns to production.

Updated on Nov 25, 2017 9:52:34 AM

The Ranch environment is up and available as of 09:09 CST on 25 November 2017.

Updated on Nov 16, 2017 11:31:26 AM

Administrators continue to work with the vendor to resolve a filesystem issue on Ranch. An update will be posted to user news once the problem has been resolved.


Updated on Nov 14, 2017 1:08:57 PM

The Ranch emergency downtime is being extended.
ETA back into production is still being determined at this time.
Appropriate notice will be provided when Ranch returns to production.

-TACC Team

Lonestar5 Maintenance 21 November 2017

Posted by Matthew Edeker on Nov 6, 2017 11:05:12 AM

LoneStar5 will not be available from 8 a.m. to 5:00 p.m. (CT) on Tuesday, 21 November 2017. Maintenance on the Slurm scheduler and Cray Development Toolkit will be performed during this time.



Updated on Nov 21, 2017 11:23:02 PM

LoneStar5 has been returned to production and the queues are open. Any queued Slurm jobs will need to be resubmitted. Thank you for your patience.

Updated on Nov 21, 2017 6:01:37 PM

The Lonestar5 maintenance needs to be extended. At the moment we don't have a scheduled time to return to production.

Maverick Maintenance 21 November 2017

Posted by Matthew Edeker on Nov 6, 2017 11:26:03 AM

Maverick will not be available from 8 a.m. to 5:00 p.m. (CT) on Tuesday, 21 November 2017. Maintenance on the Slurm scheduler will be performed during this time.

Updated on Nov 21, 2017 3:02:42 PM

Maverick is back in production. 

Stampede 2 status, November 28, 2017

Posted by David Littrell on Nov 15, 2017 12:09:26 PM

Stampede2 will be unavailable 28 Nov 2017 between 8:00AM and 7:30PM CST for maintenance.
 
If you submit a job and the time you request exceeds the time remaining until the maintenance begins, your job will run when the maintenance is over. The squeue command will report "ReqNodeNotAvailable" ("Required Node Not Available"). The showq utility will list the job as "BLOCKED" and report its status as "WaitNod" ("Waiting for Nodes"). Note that the hours leading up to the maintenance are an excellent time to submit shorter, smaller jobs that can complete before the maintenance begins: as the queues drain there will be many nodes available, and your wait time may be short.

Ranch Maintenance 21 November 2017

Posted by David Littrell on Nov 9, 2017 9:50:00 AM

At 08:00 on Tuesday, November 21st, the Ranch environment will be brought down for normal system maintenance. Due to the 1.3 billion files currently in Ranch, this maintenance activity should take between 24 and 36 hours. We expect to bring Ranch back into production no sooner than 20:00 Wednesday, November 22nd.

Users should take note that it is possible that this downtime could extend overnight into the Thanksgiving holiday. Appropriate notice will be given should this maintenance event run longer than its expected 36 hours.

Wrangler Maintenance 7 November 2017

Posted by Matthew Edeker on Oct 23, 2017 9:27:42 AM

Wrangler will be unavailable from 7:30am to 5:30pm CST on Tuesday 7 November 2017 for system maintenance. 

Updated on Nov 7, 2017 9:49:10 PM

Wrangler has been returned to production and the queues are open. A substantial upgrade was made to the Slurm scheduler. Unfortunately, this meant that jobs placed in the queue prior to maintenance were lost. We will be contacting users directly to ask them to resubmit. Thank you for your patience.

Updated on Nov 7, 2017 4:28:27 PM

Wrangler maintenance has taken longer than expected and does not currently have an ETA for completion. We apologize for any inconvenience.

Lonestar5 Status October 31, 2017

Posted by David Littrell on Oct 17, 2017 9:28:07 AM

LoneStar5 will not be available from 8 a.m. to 5:30 p.m. (CT) on Tuesday, 31 October 2017. System maintenance to install patches and field notices will be performed during this time.

Updated on Oct 31, 2017 2:49:33 PM

Lonestar5 is back in production. 

Stampede2 Extended Outage Begins Friday, 20 Oct 2017 at 8am CDT

Posted by Jason Allison on Oct 17, 2017 2:50:15 PM

Updated on Oct 23, 2017 4:49:37 PM

The Stampede2 system maintenance will need to be extended until at least 6PM CDT tomorrow to finish integrating the Phase 2 nodes into the system.


Original Posting

Stampede2 will be unavailable for a four-day period beginning Friday, 20 Oct 2017 at 8 am CDT. This extended outage will allow (1) full-system science and benchmarking runs, and (2) configuration, integration, and testing activities to prepare for Stampede2 “Phase 2” deployment.

Phase 2 will feature the addition of 1,736 “Skylake” Xeon nodes to the system. Stampede2 will be completely unavailable during the maintenance window. We expect to reopen the KNL nodes to general use by 6 pm, Monday, 23 Oct 2017.  

Please submit any questions you have via the TACC User Portal:
https://portal.tacc.utexas.edu/tacc-consulting

MPI Foundations I and II - Oct 6th, 2017 - Space Available

Posted by Jason Allison on Oct 4, 2017 2:42:54 PM

We still have space available for this Friday's MPI Foundations I and II training events. Please register ASAP to avoid missing out on this opportunity. Local attendees are strongly encouraged to attend in-person.

To register and for more information please visit: https://portal.tacc.utexas.edu/training

Lonestar5 Maintenance 26 September 2017

Posted by Sergio Leal on Sep 14, 2017 9:05:39 AM

LoneStar5 will not be available from 7:00 a.m. to 5:00 p.m. (CT) on Tuesday, 26 September 2017. We will be doing maintenance on the DDN hardware supporting the /scratch filesystem.

Updated on Sep 26, 2017 8:41:22 PM

LoneStar5 has returned to production. Login nodes and queues are open. Vlogins are still down and are planned to be restored tomorrow.



Thank you for your patience.

-TACC Team

Updated on Sep 26, 2017 5:18:12 PM

LoneStar5 is not returning to production on schedule after today's maintenance. The issues relating to the /scratch filesystem were resolved; however, required security patches are causing us to prolong the maintenance. We apologize for the inconvenience.

 -TACC Team

Stampede 2 Maintenance 3 October 2017

Posted by Sergio Leal on Sep 26, 2017 9:28:00 AM

Stampede 2 will be unavailable on 3 October 2017 between 8:00AM and 7:30PM CST for system maintenance.

Lonestar5 Status September 20, 2017

Posted by David Littrell on Sep 20, 2017 1:04:16 PM

LoneStar5 has been returned to production and the queues and login nodes are available for use. We apologize for the inconvenience of having extended the maintenance window. We experienced hardware problems with Cray cabinet controllers as well as hardware issues with the /scratch shared filesystem. Please note that there will be performance degradation on /scratch until the RAID arrays are rebuilt over the next few hours. If possible, please limit your I/O on /scratch.

Lonestar5 Maintenance 19 September 2017

Posted by Jacob Getz on Sep 5, 2017 3:04:46 PM

LoneStar5 will not be available from 8 a.m. to 12:00 p.m. (CT) on Tuesday, 19 September 2017. Cray patches and field notices will be installed during this maintenance.


-TACC Team

Updated on Sep 19, 2017 4:14:34 PM

Today's Lonestar5 maintenance has been extended beyond the original timeframe. There is currently no estimated completion time. User News will be updated when more information is available.


-TACC Team

October 2017 TACC Training Events

Posted by Jason Allison on Sep 15, 2017 4:14:45 PM

I am pleased to announce the following training events are being offered to both in-person and webcast participants for October 2017. Local participants are strongly encouraged to attend in person.

10/4/17 - Introduction To Hadoop And Spark On Wrangler
10/6/17 - MPI Foundations I and MPI Foundations II
10/11/17 - Introduction to Scala/Spark
10/18/17 - Data Analysis Using Hadoop/Spark

To register and for more information please visit: https://portal.tacc.utexas.edu/training

Training: Introduction to OpenMP using the Interactive Parallelization Tool (IPT)

Posted by Jason Allison on Aug 24, 2017 4:40:22 PM

Updated on Sep 12, 2017 2:29:55 PM

We still have spaces available for the Introduction to OpenMP using the Interactive Parallelization Tool (IPT) training on September 14th, 2017. Please register ASAP to avoid missing out on this opportunity.

Original Posting

September 14th, 2017 1pm-5pm CT
Texas Advanced Computing Center
ACB 1.104
J.J. Pickle Research Campus
10100 Burnet Rd. Austin, TX 78758

OpenMP is one of the most popular paradigms to exploit the now ubiquitous manycore and multi-core processors. In this beginner-level training session, we will provide an overview of the basic concepts of OpenMP. We will introduce the trainees to the Interactive Parallelization Tool (IPT) that is designed for parallelizing serial C/C++ programs semi-automatically. The participants in the training session will be introduced to OpenMP and will learn to use IPT for parallelizing their C/C++ applications.

Prerequisites: Experience working in a Linux environment, and familiarity with C/C++/Fortran or any other programming language.

We are offering the training to both in-person and webcast participants. Local participants are strongly encouraged to attend in person.

For more information and to register for either the in-person or webcast session, please visit:
https://portal.tacc.utexas.edu/training#/user?training=upcoming

Corral Maintenance 19 September 2017

Posted by Jacob Getz on Sep 5, 2017 3:06:27 PM

Corral will not be available from 9 a.m. to 2:00 p.m. (CT) on Tuesday, 19 September 2017 while system updates are performed. All Corral file systems, web and iRODS services will be unavailable. Database services will not be affected. 


-TACC Team

Updated on Sep 7, 2017 11:11:25 AM

The Corral login node login1.corral.tacc.utexas.edu will be retired during this maintenance. Users accessing Corral through the SSH/SCP hostname login1.corral.tacc.utexas.edu should update their configurations to use data.tacc.utexas.edu as the target host for SSH sessions and SCP transfers.

Wrangler status August 29 2017

Posted by David Littrell on Aug 23, 2017 11:59:50 AM

Wrangler will be unavailable from 8:00am to 5:00pm CST on Tuesday 29 August 2017 for system maintenance.

Updated on Aug 29, 2017 6:03:06 PM

Wrangler is back in production as of 6:00pm CST on Tuesday, 29 August 2017.

Updated on Aug 29, 2017 4:31:08 PM

Maintenance on Wrangler has been extended. Currently, there is no ETA for when it will be back up in production. 

Ranch Status 29 August 2017

Posted by Sergio Leal on Aug 29, 2017 10:36:09 AM

Ranch had a hard reboot at 00:35 CST this morning and was partially unavailable from 00:35 until 07:00 CST.  This was due to a system error requiring a full restart of Ranch functionality.

POB Vislab Closed Monday September 4th, 2017

Posted by Jacob Getz on Aug 28, 2017 1:47:15 PM

The POB Vislab will be closed on Monday, 4 September 2017, in observance of Labor Day. We will reopen our doors on Tuesday, September 5th, at 9 a.m.


-TACC Team

Lonestar5 Status August 29, 2017

Posted by David Littrell on Aug 10, 2017 10:09:32 AM

LoneStar5 will not be available from 8 a.m. to 5:00 p.m. (CT) on Tuesday, 29 August 2017 due to maintenance. 

Updated on Aug 28, 2017 8:32:43 AM

The LoneStar5 maintenance scheduled for 29 August has been postponed, and a new date has not yet been finalized. Please see user news for announcements of future reservations.


-TACC Team

Wrangler Maintenance 22 August 2017

Posted by Jacob Getz on Aug 22, 2017 8:25:28 AM

Wrangler will be unavailable from 8:00am to 5:00pm CST on Tuesday 22 August 2017 for system maintenance.

Updated on Aug 22, 2017 6:39:46 PM

Wrangler has been returned to production as of 6:30pm CST. 

Updated on Aug 22, 2017 4:56:06 PM

Maintenance on Wrangler has taken longer than expected and we hope to have it available by 8:00pm CST on Tuesday 22 August 2017. 

Wrangler Status August 15, 2017

Posted by David Littrell on Aug 9, 2017 11:34:54 AM

Wrangler will undergo maintenance on 8/15 from 8am to 5pm to repair a degraded OSS on the /data Lustre filesystem, with Dell on-site support.

Updated on Aug 15, 2017 3:35:45 PM

Today's Wrangler maintenance has concluded.


-TACC Team

New Corral Login Node

Posted by Jason Allison on Aug 9, 2017 11:10:15 AM

The Corral login node, accessed as corral.tacc.utexas.edu or login1.corral.tacc.utexas.edu, will soon be retired and replaced with general-purpose “data transfer nodes”, all accessed through the alias data.tacc.utexas.edu. Alongside this transition, the new nodes will require the use of TACC’s multi-factor authentication tokens (https://portal.tacc.utexas.edu/tutorials/multifactor-authentication).

This transition will allow us to better serve our user community with improved performance, reliability, scaling and security.

The new data transfer nodes are available now for testing, and provide direct access to both the Corral 3 file systems and the Stockyard/work file system. These nodes have all the most common file transfer facilities installed, including the iRODS client “i-Commands” and the Globus GridFTP client tools.

If you have automated transfers or otherwise perform large-scale data transfer through the current Corral login node you should test your workflows using the new data transfer nodes as soon as possible.
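When updating transfer scripts, the retired hostnames can be swapped for the new alias mechanically. The following is a minimal sketch, assuming the scripts contain scp-style targets such as user@host:/path. The function name, the example user name, and the example path are hypothetical; only the three hostnames come from this announcement.

```python
import re

# Matches the retired Corral hostnames named in the announcement,
# with or without the "login1." prefix, but not longer names that
# merely end in the same suffix.
OLD_HOST = re.compile(
    r"(?<![\w.-])(?:login1\.)?corral\.tacc\.utexas\.edu(?![\w-])"
)
NEW_HOST = "data.tacc.utexas.edu"

def retarget(spec):
    """Replace a retired Corral hostname in spec with the new alias."""
    return OLD_HOST.sub(NEW_HOST, spec)

# Hypothetical scp targets before and after the change.
print(retarget("user@login1.corral.tacc.utexas.edu:/path/to/file"))
print(retarget("user@corral.tacc.utexas.edu:/path/to/file"))
```

Both calls print user@data.tacc.utexas.edu:/path/to/file. Rewriting the hostname does not by itself make a workflow compatible with the new nodes, so it should still be tested as described above.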

Note that no data migration from the /home areas of the Corral login node will be performed since those areas are not intended for any significant data storage. If you have configuration or other files stored in $HOME on corral.tacc.utexas.edu then please migrate these files to the data transfer node or another resource in a timely manner to ensure you do not lose any data.

A follow-up notification will be sent in the coming weeks announcing the retirement date for the current Corral login node. Please ensure that you have tested any workflows relying on the current Corral login node with the new infrastructure prior to the final shutdown date.

Please submit any questions you may have via the TACC User Portal.

https://portal.tacc.utexas.edu/tacc-consulting

Maverick Status 10-12 August 2017

Posted by Sergio Leal on Aug 8, 2017 2:40:20 PM

Maverick is reserved for 48 hours beginning Thursday, 8/10/2017 at 8 AM CDT to support a large-scale, real-time simulation project. During that time, no new jobs will start, but login nodes and filesystems will remain available. Also, jobs that cannot run during this 48-hour window will be held in queue until the system resumes production, when they will become eligible to run. Maverick is scheduled to resume normal operations at 8 AM CDT on Saturday, 8/12/2017.


Please submit any questions you may have via the TACC User Portal

Ranch Maintenance 22 August 2017

Posted by Sergio Leal on Aug 7, 2017 9:44:59 AM

Ranch will be unavailable from 8:00am to 8:00pm CST on Tuesday 22 August 2017 for system maintenance.

Corral Downtime

Posted by Garland Whiteside on Aug 4, 2017 7:59:30 AM

Corral is down due to hardware issues. The administrators are working on it and will have it back as soon as possible.


Thanks,

TACC Team

Updated on Aug 4, 2017 10:44:14 AM

Filesystem has been restored and Corral is back in production.  Please submit a ticket if you are experiencing any problems with any Corral mounts.

Stampede 2 status August 3, 2017

Posted by David Littrell on Aug 3, 2017 10:05:20 AM

The Stampede 2 /scratch filesystem currently has one storage target unavailable, which could cause hangs or I/O errors if users try to access files on that target. Administrators are working to restore the target, and an update will be posted to user news once the problem has been resolved.

Updated on Aug 3, 2017 10:35:56 AM

Stampede2's /scratch filesystem is back in full production.

Lonestar5 Maintenance 1 August 2017

Posted by Sergio Leal on Jul 23, 2017 9:27:18 PM

Lonestar 5 will not be available from 8:00 a.m. to 5:00 p.m. (CT) on Tuesday, 1 August 2017. System maintenance will be performed during this time.

Updated on Aug 2, 2017 2:50:04 PM

We are presently testing a resolution to the problems occurring between the external network (e.g. internet, license servers) and LS5 compute nodes. Please let us know if any further issues arise. Thank you very much for your patience during this extended maintenance period.

Updated on Aug 1, 2017 8:07:50 PM

LS5 has been returned to production. We apologize for the delay. Unfortunately, the rsip service error could not be resolved within a reasonable time frame, so the rsip errors persist. This means that some compute nodes are still unable to access outside IP addresses. We are able to make some changes toward resolving this while the system is running, so please watch user news for updates to this service.

Updated on Aug 1, 2017 5:08:32 PM

LS5 is still experiencing issues relating to the rsip server setup. We estimate that this can be solved within a reasonable amount of time and so are extending the LS5 maintenance until 23:00.

Lonestar 5 Status July 25, 2017

Posted by Lucas Nopoulos on Jul 25, 2017 3:36:58 PM

Lonestar 5 has an issue with the scheduler at present - it will be resolved as soon as possible.

-TACC Team

Updated on Jul 25, 2017 3:43:14 PM

This issue has now been resolved; running and queued jobs were unaffected.

-TACC Team

Lonestar5 status, 20 July 2017

Posted by David Littrell on Jul 20, 2017 10:50:54 AM

LoneStar5 maintenance is being extended due to technical difficulties with the upgrade process. Please see user news for updates on when the system will be returned to production.

Updated on Jul 20, 2017 3:08:32 PM

LoneStar5 has been returned to production. The queues are available; however, Singularity is not available due to kernel updates. This can and will be resolved while the system is operational, and an announcement will be made when it has been restored.

Thank you, 

TACC team

Stampede 2 status, July 19, 2017

Posted by David Littrell on Jul 19, 2017 10:37:17 AM

Updated on Jul 19, 2017 6:23:35 PM

The /work filesystem has been restored on all TACC production systems.   All held user jobs have been released from the hold and should be running or queued to run now.
 
Thank you, 
 
-TACC

Original Posting

The Stampede2 system is available to users again; however, the /work filesystem remains offline for system maintenance. To prevent queued jobs from failing, all pending user jobs were held. If your jobs do not require the /work filesystem, use the command "scontrol release <jobid>", with your specific job numbers in place of <jobid>, to release them.
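For users with several held jobs, the release command can be issued once per job ID. The sketch below is illustrative only: the helper names and job IDs are hypothetical, and it defaults to a dry run that prints the commands rather than executing them. Only the "scontrol release <jobid>" form comes from the posting.

```python
import subprocess

def release_commands(job_ids):
    """Build one `scontrol release <jobid>` argument list per job ID."""
    return [["scontrol", "release", str(job_id)] for job_id in job_ids]

def release_jobs(job_ids, dry_run=True):
    """Print each release command, or run it when dry_run is False."""
    for cmd in release_commands(job_ids):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Requires scontrol on PATH; raises if the command fails.
            subprocess.run(cmd, check=True)

# Hypothetical job IDs; replace with your own held jobs.
release_jobs([123456, 123457])
```

With the default dry run, this prints "scontrol release 123456" and "scontrol release 123457" without touching the scheduler.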

Stampede 1 status, July 19, 2017

Posted by David Littrell on Jul 19, 2017 12:08:08 PM

Updated on Jul 19, 2017 6:22:39 PM

The /work filesystem has been restored on all TACC production systems.   All held user jobs have been released from the hold and should be running or queued to run now.
 
Thank you,

 - TACC

Original Posting

The Stampede1 system is available to users again; however, the /work filesystem remains offline for system maintenance. To prevent queued jobs from failing, all pending user jobs were held. If your jobs do not require the /work filesystem, use the command "scontrol release <jobid>", with your specific job numbers in place of <jobid>, to release them.

Ranch status 19 July 2017

Posted by David Littrell on Jul 19, 2017 12:51:33 PM

TACC’s Ranch system is back in production after yesterday’s maintenance as of 12:30PM CST on 7/19/17.
