Message boards : CMS Application : CMS tasks failing
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1134
Credit: 11,579,777
RAC: 14,842
Message 52861 - Posted: 20 Jan 2026, 15:45:16 UTC - in response to Message 52860.  

Counting up the job failures in the current workflow, the 5 users with highest counts (anonymised due to GDPR) are:
      3 614757
     13 xxxxxx
     56 yyyyy
    109 zzzzzz
    426 nnnnnn
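
A tally like the one above can be produced by counting failed jobs per user ID and keeping the five largest counts. A minimal Python sketch, assuming the failure records are available as a plain list with one user ID per failed job (the `failed_job_users` list and the IDs in it are made up for illustration, not real workflow data):

```python
from collections import Counter

def top_failing_users(user_ids, n=5):
    """Return the n users with the most failures, smallest count first."""
    counts = Counter(user_ids)
    # most_common() sorts by descending count; reverse it to match the
    # ascending layout used in the list above.
    return list(reversed(counts.most_common(n)))

# Hypothetical input: one entry per failed job, holding the user ID.
failed_job_users = ["a", "b", "b", "c", "c", "c"]

for user, count in top_failing_users(failed_job_users):
    print(f"{count:7d} {user}")
```

Real IDs would still need to be anonymised before posting, as was done above.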

ID: 52861 · Report as offensive     Reply Quote
Garrulus glandarius

Joined: 5 Apr 25
Posts: 66
Credit: 1,784,447
RAC: 13,818
Message 52909 - Posted: 28 Jan 2026, 12:56:12 UTC
Last modified: 28 Jan 2026, 13:16:07 UTC

Got a bunch that failed again in the past couple of hours, so I switched over to ATLAS for the time being.

Actually, it seems to correlate perfectly with the number of jobs, which has been dropping constantly since 10:24 UTC.
ID: 52909 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1941
Credit: 156,018,218
RAC: 107,484
Message 52915 - Posted: 28 Jan 2026, 16:34:57 UTC

I just noticed that since about noon today most tasks have failed after varying run times (between a few minutes and half an hour).
In many cases, stderr says that the filename or extension is too long:

The filename or extension is too long.
(0xce) - exit code 206 (0xce)

What's going on?
ID: 52915 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1134
Credit: 11,579,777
RAC: 14,842
Message 52917 - Posted: 28 Jan 2026, 18:54:00 UTC - in response to Message 52915.  

In reply to Erich56's message of 28 Jan 2026:
I just noticed that since about noontime of today most of the tasks failed after various periods of time (between a few minutes and half an hour).
In many cases, stderr says that filename or extension is too long:

The filename or extension is too long.
(0xce) - exit code 206 (0xce)

what's going on?

Yes, I've noticed the job count falling over the last few hours. I'm not sure what's going on yet. That's not a common error message, and I'm tempted to believe it's coming from the operating system, but more digging is needed.
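
For what it's worth, the two numbers in that error line are the same value: 206 in decimal is 0xce in hexadecimal. On Windows, 206 is the system error code ERROR_FILENAME_EXCED_RANGE, whose message text is "The filename or extension is too long.", which fits the guess that the text comes from the host operating system. A small Python sketch of the check; the lookup table is a hand-written illustrative excerpt, not something read from BOINC, VirtualBox or the CMS application:

```python
# Illustrative excerpt of Windows system error codes (hand-written here,
# not read from the OS): code -> (symbolic name, message text).
WINDOWS_ERRORS = {
    206: ("ERROR_FILENAME_EXCED_RANGE",
          "The filename or extension is too long."),
}

def describe_exit_code(code):
    """Format an exit code in decimal and hex, with its name if known."""
    name, text = WINDOWS_ERRORS.get(code, ("unknown", "no description"))
    return f"exit code {code} ({hex(code)}): {name} - {text}"

print(describe_exit_code(206))
```

This only identifies the error code; it does not say which path or command line was too long.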
ID: 52917 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1134
Credit: 11,579,777
RAC: 14,842
Message 52918 - Posted: 28 Jan 2026, 18:56:20 UTC - in response to Message 52909.  

In reply to Garrulus glandarius's message of 28 Jan 2026:
Got a bunch that failed again the past couple of hours, switched over to ATLAS for the time being

Actually, it seems to be in perfect correlation with the number of jobs constantly dropping since 10:24 UTC

Yes, that does seem to be the case.
ID: 52918 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1134
Credit: 11,579,777
RAC: 14,842
Message 52919 - Posted: 28 Jan 2026, 19:14:09 UTC - in response to Message 52915.  

In reply to Erich56's message of 28 Jan 2026:
I just noticed that since about noontime of today most of the tasks failed after various periods of time (between a few minutes and half an hour).
In many cases, stderr says that filename or extension is too long:

The filename or extension is too long.
(0xce) - exit code 206 (0xce)

what's going on?

Some of your failures are network-related; there could be a problem at CERN:

2026-01-28 15:27:42 (24776): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2026-01-28 15:27:42 (24776): Guest Log: Ncat: Connection to 137.138.53.124 failed: Connection timed out.
2026-01-28 15:27:42 (24776): Guest Log: Ncat: Trying next address...
2026-01-28 15:27:42 (24776): Guest Log: Ncat: Network is unreachable.

[TinyPC:words] > nslookup 137.138.53.124
Server: UnKnown
Address: 10.174.98.157

Name: vocms0204.cern.ch
Address: 137.138.53.124
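
To see which hosts the tasks could not reach, the failed addresses can be pulled out of the guest log. A minimal Python sketch, assuming log lines in the format quoted above; the function name and the regular expression are illustrative and not part of BOINC or Ncat:

```python
import re

# Matches lines such as:
#   "... Guest Log: Ncat: Connection to 137.138.53.124 failed: Connection timed out."
FAILED_CONNECTION = re.compile(
    r"Ncat: Connection to (?P<addr>\S+) failed: (?P<reason>.+?)\.?$"
)

def failed_connections(log_lines):
    """Return (address, reason) pairs for each failed Ncat connection."""
    results = []
    for line in log_lines:
        match = FAILED_CONNECTION.search(line.strip())
        if match:
            results.append((match.group("addr"), match.group("reason")))
    return results

log = [
    "2026-01-28 15:27:42 (24776): Guest Log: Ncat: Connection to 137.138.53.124 failed: Connection timed out.",
    "2026-01-28 15:27:42 (24776): Guest Log: Ncat: Network is unreachable.",
]
print(failed_connections(log))
```

Each address found this way could then be resolved to a host name, as the nslookup above does; in Python, `socket.gethostbyaddr(addr)` performs the same reverse lookup but needs working DNS.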

ID: 52919 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1261
Credit: 92,106,969
RAC: 109,679
Message 52923 - Posted: 28 Jan 2026, 20:15:51 UTC

Same here
Volunteer Mad Scientist For Life

unbelievable are you trying to promote linux again?
ID: 52923 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1134
Credit: 11,579,777
RAC: 14,842
Message 52925 - Posted: 28 Jan 2026, 21:11:12 UTC - in response to Message 52923.  

It looks like the dip might be bottoming out. Digits cruciate!
ID: 52925 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1261
Credit: 92,106,969
RAC: 109,679
Message 52926 - Posted: 28 Jan 2026, 21:34:56 UTC - in response to Message 52925.  

In reply to ivan's message of 28 Jan 2026:
It looks like the dip might be bottoming out. Digits cruciate!

I hope so Ivan.......fingers crossed
ID: 52926 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1134
Credit: 11,579,777
RAC: 14,842
Message 52927 - Posted: 28 Jan 2026, 22:15:42 UTC - in response to Message 52926.  

In reply to Magic Quantum Mechanic's message of 28 Jan 2026:
In reply to ivan's message of 28 Jan 2026:
It looks like the dip might be bottoming out. Digits cruciate!

I hope so Ivan.......fingers crossed
I wish I'd said that!
ID: 52927 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1261
Credit: 92,106,969
RAC: 109,679
Message 52928 - Posted: 28 Jan 2026, 22:57:56 UTC - in response to Message 52927.  

I figured two crossed fingers might work, so I tried a new CMS task. The ones that crashed did so within 25 minutes or less, and this one is still running and has just passed the first hour. If it ends up Valid, I will start more on this host, and if those keep running I will reload the other hosts.

I did have to go into the VB Manager and remove all the crashed ones.
ID: 52928 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1134
Credit: 11,579,777
RAC: 14,842
Message 52929 - Posted: 28 Jan 2026, 23:06:24 UTC - in response to Message 52928.  

We do have a fairly good rebound, but it might be stalling. Federica has called us in for a meeting tomorrow; let's see if we can come up with some ideas by then. I'm preoccupied by having just four weeks to reclaim my Uni and CERN computing credentials, tho'but...
ID: 52929 · Report as offensive     Reply Quote
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1261
Credit: 92,106,969
RAC: 109,679
Message 52930 - Posted: 28 Jan 2026, 23:17:44 UTC - in response to Message 52929.  


Sounds good and have a fine day.......it is a holiday here on the 29th
ID: 52930 · Report as offensive     Reply Quote
Garrulus glandarius

Joined: 5 Apr 25
Posts: 66
Credit: 1,784,447
RAC: 13,818
Message 52932 - Posted: 29 Jan 2026, 11:12:12 UTC
Last modified: 29 Jan 2026, 12:01:40 UTC

Things seem to be going smoothly now. My main LHC rig is still busy with ATLAS, but an old i7 has been running a new-ish CMS task for over 13 hours with apparently normal progress.
ID: 52932 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1941
Credit: 156,018,218
RAC: 107,484
Message 52934 - Posted: 29 Jan 2026, 13:42:55 UTC - in response to Message 52932.  

Same here (with the exception of a single task late this morning) :-)
ID: 52934 · Report as offensive     Reply Quote


©2026 CERN