Message boards : CMS Application : Stageout failures

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,607
RAC: 387
Message 28072 - Posted: 6 Dec 2016, 20:22:59 UTC
Last modified: 6 Dec 2016, 20:28:50 UTC

We have just started having a large number of jobs fail at stage-out. The error messages suggest that a host is down, or possibly that a certificate has expired. To avoid wasting cycles until the problem is identified and fixed, I suggest suspending the project, setting your hosts to "No New Tasks", or deselecting CMS.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,607
RAC: 387
Message 28073 - Posted: 6 Dec 2016, 21:45:05 UTC - in response to Message 28072.  

Laurence reports:
An automatic update of the host upgraded about 271 packages, and some configuration files were altered in the process. The machine has now been reconfigured and the problem is fixed. Puppet, our configuration management tool, has been enabled on this host, so this reconfiguration should happen automatically after any future upgrade.

The log files on the Condor server show normal behaviour again, so please resume CMS tasks.



©2024 CERN