Stampede2 User News

TACC Network Maintenance Sunday 19 July 2020

Posted by Dean Nobles on Jul 9, 2020 11:09:40 AM

Access to all TACC systems will be unavailable on Sunday 19 Jul 2020 from 10:00 AM CDT until 4:00 PM CDT.  This is to allow for upgrades to the TACC core network hardware. Jobs will continue to run, but users will have no access to TACC services and systems until the upgrade is complete.

Please submit any questions you may have via the TACC User Portal.

https://portal.tacc.utexas.edu/tacc-consulting

Stampede2 Maintenance Tuesday 21 July 2020

Posted by Mark Brueschke on Jul 7, 2020 11:02:58 AM

Stampede2 will not be available from 8:00 AM to 7:30 PM CDT on Tuesday, 21 July 2020. System maintenance will be performed during this time.

Please submit any questions you may have via the TACC User Portal.
https://portal.tacc.utexas.edu/tacc-consulting

TACC User Portal Status Friday 26 June 2020

Posted by Matthew Edeker on Jun 26, 2020 4:23:08 PM

Updated on Jul 2, 2020 11:37:15 AM

The issue with ticket responses in the TACC User Portal has been resolved. You may now reply to tickets via the portal as normal.

Original Posting

We are currently experiencing an issue regarding user responses to tickets via the TACC User Portal. If you have replied to a ticket via the TUP in the last week, we may not have seen your response. As such, those tickets have not received additional assistance.

We are still working to resolve the issue with the TUP and will provide an update once it is fixed.

In the meantime, please reply to your tickets via the email notification that you receive from the portal. All new tickets and all replies sent via email are reaching the ticketing system without issue.

This problem affects only the TACC User Portal and NOT the XSEDE User Portal.

Ticketing emergency maintenance Monday 29 June 2020

Posted by Mark Brueschke on Jun 29, 2020 12:20:06 PM

Updated on Jun 29, 2020 2:26:25 PM

Emergency maintenance has been completed on the TACC User Portal as of 2:20 PM CDT on Monday, 29 June 2020.

Original Posting

In order to address the ongoing issues with tickets in the TACC User Portal, the ticket system will be going offline for emergency maintenance at 1:00 PM CDT on Monday, 29 June 2020. You will be unable to submit or alter tickets at that time. We will update this post once the ticket system is back online.

Stampede2 Outage Saturday 20 June 2020

Posted by Mark Brueschke on Jun 20, 2020 8:01:21 AM

Updated on Jun 20, 2020 10:57:46 AM

Stampede2 has been restored to full production.

Original Posting

The TACC machine room experienced a voltage sag power event due to a thunderstorm earlier this morning at about 6:30 AM CDT that caused many compute nodes to power off and impacted the /scratch filesystem on Stampede2.  System administrators are working to restore the system and will update this announcement once full production is restored.

TACC Network Maintenance 21 June 2020 - CANCELLED

Posted by Dean Nobles on Jun 4, 2020 11:13:15 AM

Updated on Jun 18, 2020 8:39:59 AM

Please note that the network maintenance for Sunday, June 21st has been cancelled for now. It will be re-scheduled at a later date.

Updated on Jun 11, 2020 9:36:40 AM

Please note: this network service interruption will occur on June 21, not June 14.

Updated on Jun 4, 2020 11:35:40 AM

Please note: this network service interruption will occur on June 14, not June 13.

Original Posting

Access to all TACC systems will be unavailable on 14 Jun 2020 from 10:00 AM CDT until 2:00 PM CDT.  This is to allow for upgrades to the TACC core network hardware. Jobs will continue to run, but users will have no access to TACC services and systems until the upgrade is complete.

Please submit any questions you may have via the TACC User Portal.

https://portal.tacc.utexas.edu/tacc-consulting

TACC Network Maintenance 10 May 2020

Posted by Mark Brueschke on May 8, 2020 12:04:01 PM

Updated on May 10, 2020 11:01:58 AM

The TACC network maintenance is now complete.

Original Posting

Access to all TACC systems will be unavailable on 10 May 2020 from 10:00 AM CDT until 2:00 PM CDT.  This is to allow for upgrades to the TACC core network hardware. Jobs will continue to run, but users will have no access to TACC services and systems until the upgrade is complete.

Please submit any questions you may have via the TACC User Portal.

https://portal.tacc.utexas.edu/tacc-consulting

TACC /work filesystem Wednesday 25 March 2020

Posted by Sergio Leal on Mar 25, 2020 11:57:10 AM

Updated on Mar 25, 2020 12:16:42 PM

The /work filesystem has stabilized and queues have been re-opened.

Original Posting

Queues are currently closed on all machines that have /work mounted as we perform work on the filesystem. We will post an update once queues have re-opened.

TACC /work filesystem UPDATE Monday 23 March 2020

Posted by Mark Brueschke on Mar 23, 2020 1:03:35 PM

TACC experienced issues with the /work file system both this weekend and this morning that could have resulted in hangs or slow responses when accessing files. This caused slow operations on the login nodes, and queues were closed while staff worked on the storage system.

Stampede2 Maintenance Rescheduled to 17 March 2020

Posted by Greg Umbay on Feb 25, 2020 11:55:56 AM

Updated on Mar 17, 2020 6:55:11 PM

Stampede2 is back in production. 

Original Posting

Stampede2 Maintenance for Tuesday 3 March 2020 has been rescheduled to Tuesday, 17 March 2020.   
Stampede2 will not be available from 8 a.m. to 7:30 p.m. (CT) on Tuesday, 17 March 2020.    
System maintenance will be performed during this time.  
Please submit any questions you may have via the TACC User Portal.  
https://portal.tacc.utexas.edu/tacc-consulting

TACC Network Maintenance 15 March, 2020

Posted by Mark Brueschke on Mar 10, 2020 2:47:35 PM

Updated on Mar 15, 2020 2:57:55 PM

The TACC Network Maintenance is now complete.

Original Posting

Access to all TACC systems will be unavailable on 15 March, 2020 from 9:00 AM CDT until 12:00 PM CDT.  This is to allow for upgrades to the TACC core network hardware. Jobs will continue to run, but users will have no access to TACC services and systems until the upgrade is complete.

Please submit any questions you may have via the TACC User Portal.

https://portal.tacc.utexas.edu/tacc-consulting

TACC Services Expected to Continue as Normal

Posted by Tim Cockerill on Mar 12, 2020 11:53:56 AM

Plans are in place to continue providing normal operations. We don’t anticipate any interruption in the availability of computing resources or user support staff due to the COVID-19 situation.


If you have questions, please submit them via the TACC User Portal https://portal.tacc.utexas.edu/tacc-consulting

Stampede2 /scratch UPDATE Tuesday 3 March 2020

Posted by Greg Umbay on Mar 3, 2020 11:00:08 AM

Stampede2 /scratch UPDATE

The Stampede2 /scratch filesystem has been restored.  Please contact the help desk if you encounter any errors accessing your files from this point on.

Please submit any questions you may have via the TACC User Portal. 

https://portal.tacc.utexas.edu/tacc-consulting

Stampede2 /scratch Update Monday 2 March 2020

Posted by Greg Umbay on Mar 2, 2020 12:09:44 PM

The Stampede2 admins are working diligently to recover the /scratch filesystem. This recovery is expected to take at least one more day. We will provide an update after the filesystem has been recovered. Below is a repeat of the notice that went out on Thursday afternoon.

TACC staff have deactivated the affected Lustre storage target on all the login and compute nodes to avoid continued hangs when trying to access files residing on the offline storage target.

This should impact less than 2% of the files in the /scratch filesystem. However, any attempt to read from or write to a file on this target will result in an I/O error reporting "Cannot send after transport endpoint shutdown", and a listing of the problem files will show question marks for the permissions/ownership/size/date fields, like this:

# ls -l

-????????? ? ?  ?      ?     ? job.1536_32N

Users should check their pending jobs to ensure that any files being read from or written to by that job are available in the filesystem without the above error message. Users will not encounter errors when creating new files on the filesystem. Also, before submitting new jobs, confirm that any input/output files or executables needed by the job (including shared libraries) are available. The Slurm partitions/queues will be reopened at 2:00 PM US Central time to allow jobs to run.

TACC system administrators are continuing to work with the filesystem vendor to restore the offline storage target as soon as possible. User News will be updated with additional status updates.

Please submit any questions you may have via the TACC User Portal.

https://portal.tacc.utexas.edu/tacc-consulting

Stampede2 /scratch Update Thursday 27 February 2020

Posted by Greg Umbay on Feb 27, 2020 1:50:02 PM

The Stampede2 /scratch filesystem recovery work is anticipated to take another 2-3 days before the filesystem is completely available again. In the meantime, TACC staff have deactivated the affected Lustre storage target on all the login and compute nodes to avoid continued hangs when trying to access files residing on the offline storage target.

This should impact less than 2% of the files in the /scratch filesystem. However, any attempt to read from or write to a file on this target will result in an I/O error reporting “Cannot send after transport endpoint shutdown”, and a listing of the problem files will show question marks for the permissions/ownership/size/date fields, like this:

# ls -l

-????????? ? ?   ?       ?      ? job.1536_32N

 

Users should check their pending jobs to ensure that any files being read from or written to by that job are available in the filesystem without the above error message. Users will not encounter errors when creating new files on the filesystem. Also, before submitting new jobs, confirm that any input/output files or executables needed by the job (including shared libraries) are available. The Slurm partitions/queues will be reopened at 2:00 PM US Central time to allow jobs to run.
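One possible way to run this check on a list of files is sketched below. It is only an illustration, not a TACC-provided tool: it assumes Python 3 is available on the login node and that a file on the offline storage target raises an OSError when it is stat'ed or read, which is consistent with the I/O error described above. The function name find_unreadable and the script name check_files.py are hypothetical. The check is meant for regular files; directories are also reported because they cannot be opened for reading.

```python
import os
import sys


def find_unreadable(paths):
    """Return (path, reason) pairs for files that cannot be stat'ed or read."""
    problems = []
    for path in paths:
        try:
            # stat() fails for entries whose metadata is unavailable,
            # i.e. the ones that `ls -l` lists with question marks.
            os.stat(path)
            # Read one byte to force an actual read of the file's data.
            with open(path, "rb") as handle:
                handle.read(1)
        except OSError as exc:
            problems.append((path, exc.strerror or str(exc)))
    return problems


if __name__ == "__main__":
    # Paths to check are given on the command line.
    for bad_path, reason in find_unreadable(sys.argv[1:]):
        print(f"{bad_path}: {reason}")
```

For example, running python3 check_files.py followed by the input files, executables, and shared libraries a job needs prints one line per file that could not be accessed, and prints nothing if all of them are readable.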

 

TACC system administrators are continuing to work with the filesystem vendor to restore the offline storage target as soon as possible. User News will be updated with additional status updates.

Please submit any questions you may have via the TACC User Portal.

https://portal.tacc.utexas.edu/tacc-consulting

Stampede2 Status Wednesday 26 February 2020

Posted by Greg Umbay on Feb 26, 2020 10:34:34 AM

Updated on Feb 26, 2020 8:22:50 PM

We are continuing to work on our Stampede2 scratch filesystem issues with our vendor and queues will remain closed for the time being. We will provide any updates as they become available and expect work to continue through the evening.

Please submit any questions you may have via the TACC User Portal.

https://portal.tacc.utexas.edu/tacc-consulting

Updated on Feb 26, 2020 2:29:52 PM

TACC staff continue to work on one of the storage servers for the Stampede2 /scratch filesystem. The Slurm partitions/queues will remain closed, and users will encounter a hang if trying to access a directory or file that resides on this one server. This User News post will be updated once the filesystem is restored to normal operation.


Please submit any questions you may have via the TACC User Portal.

https://portal.tacc.utexas.edu/tacc-consulting


Original Posting

Stampede2 is currently experiencing issues with the /scratch file system. User access and new job submissions may be affected.  We are currently investigating.   An update will be sent when we have resolved the issue.  Please submit any questions you may have via the TACC User Portal.  


BOINC@TACC Training 20 March 2020 1pm-3pm CT

Posted by Jason Allison on Feb 26, 2020 12:37:11 PM

We are pleased to announce the BOINC@TACC Training being held on March 20th, 2020 from 1pm to 3pm CT.

The BOINC@TACC project is based on the Volunteer Computing (VC) model. It helps researchers run applications from a wide range of scientific domains on laptops, desktops, tablets, or cloud-based Virtual Machines (VMs) owned by volunteers. With BOINC@TACC, students and researchers can run small high-throughput computing jobs (that is, jobs that involve small amounts of data transfer and processing) without spending their active project allocations.

Participants may attend in person at TACC or remotely by webcast. Participants in the Austin area are strongly encouraged to attend in person. 

To register for each session and for more information please visit: https://learn.tacc.utexas.edu/

Please contact me at jasona@tacc.utexas.edu if you have any questions.

Stampede2 Maintenance 3 March 2020

Posted by Greg Umbay on Feb 14, 2020 9:20:01 AM

Updated on Feb 25, 2020 11:53:57 AM

Stampede2 Maintenance for Tuesday 3 March 2020 has been rescheduled to Tuesday, 17 March 2020.  
Stampede2 will not be available from 8 a.m. to 7:30 p.m. (CT) on Tuesday, 17 March 2020.  System maintenance will be performed during this time.  

Please submit any questions you may have via the TACC User Portal.  
https://portal.tacc.utexas.edu/tacc-consulting

Original Posting

Stampede2 maintenance announcement.

Stampede2 will not be available from 8 a.m. to 7:30 p.m. (CT) on Tuesday, 03 March 2020. System maintenance will be performed during this time.

Please submit any questions you may have via the TACC User Portal.  https://portal.tacc.utexas.edu/tacc-consulting


HPC Application Tutorial: VASP on Frontera and Stampede2 - 12 March, 2020 1pm-4pm CT

Posted by Jason Allison on Feb 18, 2020 9:43:47 AM

We are pleased to announce the HPC Application Tutorial: VASP on Frontera and Stampede2 Training being held on March 12th, 2020 from 1pm to 4pm CT. This new training module helps beginner and intermediate users of the VASP software package to carry out calculations with desirable performance and efficient usage of resources on TACC systems. It covers basic procedures for running VASP on TACC systems like Frontera and Stampede2, including:

- how to use a TACC supercomputer,
- how to gain access to the TACC-installed VASP software environment,
- how to build VASP in licensed users' accounts for debugging, and
- how to build VASP in licensed users' accounts with 3rd-party extensions.

The training also introduces the general steps for determining the task topology for optimal performance and scaling of VASP jobs under the TACC computing environment. Furthermore, we present and explain some of the most common issues and solutions to help users save effort in troubleshooting VASP-related errors.

This training focuses on the system and computing environment aspects and how to run VASP efficiently on TACC systems. It is not a tutorial for researchers on how to use VASP to solve science problems. Users should refer to the VASP manual and forum for that purpose.

This training is being made available to attendees via a live webcast. Registration for this event closes at 5pm CT on March 10th, 2020. To register and for more information please visit:

Please contact me at jasona@tacc.utexas.edu if you have any questions.

Containers @ TACC - 6 March, 2020 9am-3pm

Posted by Jason Allison on Feb 18, 2020 8:12:55 AM

We are pleased to announce the Containers @ TACC Training being held on March 6th, 2020 from 9am to 3pm CT.

Software containers are an important common currency for portable and reproducible computing.  Learn best practices on building, using, and sharing Docker and Singularity containers in this hands-on workshop.  Also learn how to run those containers on TACC HPC systems, including MPI- and GPU-aware containers.

This training is available to both in-person and remote attendees via webcast. Local participants are strongly encouraged to attend in-person.

Registration for this event closes at 5pm CT on March 4th, 2020. To register and for more information please visit:

Please email jasona@tacc.utexas.edu if you have any questions.

Stampede2 status 30 January 2020

Posted by Greg Umbay on Jan 30, 2020 11:43:14 AM

Updated on Jan 30, 2020 12:18:12 PM

The Stampede2 /scratch file system is back up and running.  All queues are now open.

Please submit any questions you may have via the TACC User Portal.
https://portal.tacc.utexas.edu/tacc-consulting



Original Posting

Stampede2 is experiencing an outage of the /scratch file system.  Slow response times and I/O issues may be experienced.  The system administrators are actively working to bring the file system back up.  Updates will be posted as the state of the file system changes.


Please submit any questions you may have via the TACC User Portal.
https://portal.tacc.utexas.edu/tacc-consulting

Stampede2 Status 29 January 2020

Posted by Matthew Edeker on Jan 29, 2020 2:25:54 PM

Updated on Jan 29, 2020 3:41:41 PM

The issue with Stampede2 /scratch has been resolved. All queues are open and full operations have resumed.

Please submit any questions you may have via the TACC User Portal.
https://portal.tacc.utexas.edu/tacc-consulting

Original Posting

Stampede2 /scratch is currently experiencing a partial outage. Some users may see intermittent or no access to the /scratch directory. This will also prevent jobs from being submitted. Administrators are currently working to correct the issue and restore full functionality to the file system. We will update User News once the issue is resolved.


Please submit any questions you may have via the TACC User Portal.

Matlab license server maintenance 28 January 2020

Posted by Greg Umbay on Jan 28, 2020 10:58:22 AM

Updated on Jan 28, 2020 5:28:48 PM

The maintenance for the Matlab license server has been completed.

Original Posting

Matlab license server will be unavailable on 28 January 2020 between 5:00 PM and 6:00 PM CT.  
Systems affected: Frontera, Stampede2, Lonestar5, Wrangler, Maverick2 

Please submit any questions you may have via the TACC User Portal.
https://portal.tacc.utexas.edu/tacc-consulting