Message boards : CMS Application : CMS tasks failing
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1152
Credit: 11,734,920
RAC: 657
Message 52861 - Posted: 20 Jan 2026, 15:45:16 UTC - in response to Message 52860.  

Counting up the job failures in the current workflow, the 5 users with highest counts (anonymised due to GDPR) are:
      3 614757
     13 xxxxxx
     56 yyyyy
    109 zzzzzz
    426 nnnnnn
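The tally above reads like per-user counts of failed jobs, in the layout that `sort | uniq -c` produces. Here is a minimal Python sketch of that kind of tally. It assumes the failures are available as one anonymised user ID per failed job, which is hypothetical. The thread does not show the workflow's actual failure records or how they were counted.

```python
from collections import Counter

def top_failing_users(failed_job_user_ids, n=5):
    """Count failed jobs per (anonymised) user ID and return the n users with
    the highest counts, in ascending order of count (like `uniq -c | sort -n | tail`)."""
    counts = Counter(failed_job_user_ids)
    # most_common(n) is in descending order; reverse it to match the listing above
    return list(reversed(counts.most_common(n)))

if __name__ == "__main__":
    # Hypothetical sample data, not the real workflow's failures
    sample = ["u1"] * 3 + ["u2"] * 13 + ["u3"] * 56 + ["u4"] * 2
    for user, count in top_failing_users(sample, n=3):
        print(f"{count:7d} {user}")
```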

Garrulus glandarius

Joined: 5 Apr 25
Posts: 66
Credit: 2,214,382
RAC: 4,825
Message 52909 - Posted: 28 Jan 2026, 12:56:12 UTC
Last modified: 28 Jan 2026, 13:16:07 UTC

Got a bunch that failed again over the past couple of hours; switched over to ATLAS for the time being.

Actually, it seems to correlate perfectly with the number of jobs, which has been dropping steadily since 10:24 UTC.
Erich56

Joined: 18 Dec 15
Posts: 1964
Credit: 159,289,840
RAC: 46,178
Message 52915 - Posted: 28 Jan 2026, 16:34:57 UTC

I just noticed that since about noon today most of my tasks have failed after various run times (between a few minutes and half an hour).
In many cases, stderr says that the filename or extension is too long:

The filename or extension is too long.
(0xce) - exit code 206 (0xce)

what's going on?
ivan
Message 52917 - Posted: 28 Jan 2026, 18:54:00 UTC - in response to Message 52915.  

In reply to Erich56's message of 28 Jan 2026:
I just noticed that since about noon today most of my tasks have failed after various run times (between a few minutes and half an hour).
In many cases, stderr says that the filename or extension is too long:

The filename or extension is too long.
(0xce) - exit code 206 (0xce)

what's going on?

Yes, I've noticed the jobs falling over in the last few hours. I'm not sure what's going on yet; that's not a common error message. I'm tempted to believe it's coming from the operating system, but more digging is needed.
ivan
Message 52918 - Posted: 28 Jan 2026, 18:56:20 UTC - in response to Message 52909.  

In reply to Garrulus glandarius's message of 28 Jan 2026:
Got a bunch that failed again over the past couple of hours; switched over to ATLAS for the time being.

Actually, it seems to correlate perfectly with the number of jobs, which has been dropping steadily since 10:24 UTC.

Yes, that does seem to be the case.
ivan
Message 52919 - Posted: 28 Jan 2026, 19:14:09 UTC - in response to Message 52915.  

In reply to Erich56's message of 28 Jan 2026:
I just noticed that since about noon today most of my tasks have failed after various run times (between a few minutes and half an hour).
In many cases, stderr says that the filename or extension is too long:

The filename or extension is too long.
(0xce) - exit code 206 (0xce)

what's going on?

Some of your failures are network-related; it could be a problem at CERN:

2026-01-28 15:27:42 (24776): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )
2026-01-28 15:27:42 (24776): Guest Log: Ncat: Connection to 137.138.53.124 failed: Connection timed out.
2026-01-28 15:27:42 (24776): Guest Log: Ncat: Trying next address...
2026-01-28 15:27:42 (24776): Guest Log: Ncat: Network is unreachable.
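When sifting through many failed tasks, it can help to pull out network errors like the ones above automatically. Below is a rough Python sketch that assumes the task's stderr text follows the `Guest Log: Ncat: ...` format shown here. The regular expression and function names are illustrative only. They are not part of any BOINC or CMS tool.

```python
import re

# Matches e.g. "Ncat: Connection to 137.138.53.124 failed: Connection timed out."
NCAT_FAIL = re.compile(r"Ncat: Connection to (\S+) failed: (.+?)\.?$")

def ncat_failures(stderr_text):
    """Return (address, reason) pairs for each failed Ncat connection in a task's stderr."""
    failures = []
    for line in stderr_text.splitlines():
        m = NCAT_FAIL.search(line.strip())
        if m:
            failures.append((m.group(1), m.group(2)))
    return failures

def is_unreachable(stderr_text):
    """True if the guest reported that the network was unreachable."""
    return "Ncat: Network is unreachable" in stderr_text
```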

[TinyPC:words] > nslookup 137.138.53.124
Server: UnKnown
Address: 10.174.98.157

Name: vocms0204.cern.ch
Address: 137.138.53.124
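The Ncat failure and the `nslookup` above amount to two checks. First, can a TCP connection to the address be opened within a timeout? Second, what name does the address reverse-resolve to? Below is a minimal sketch of both using only Python's standard `socket` module. The port is an assumption: the thread does not say which port the guest was trying to reach on vocms0204.cern.ch, so the caller has to supply it.

```python
import socket

def can_connect(host, port, timeout=10.0):
    """Try to open a TCP connection, roughly as Ncat does; True on success.
    Timeouts, refusals and 'network unreachable' all come back as False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout and connection errors are all subclasses of OSError
        return False

def reverse_lookup(ip):
    """Reverse-resolve an IP address to a hostname, like `nslookup <ip>`.
    Returns None if there is no PTR record or resolution fails."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        return None
```

Given the `nslookup` output above, `reverse_lookup("137.138.53.124")` would be expected to give vocms0204.cern.ch, provided DNS is reachable from the host.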

Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1291
Credit: 95,259,444
RAC: 34,938
Message 52923 - Posted: 28 Jan 2026, 20:15:51 UTC

Same here
Volunteer Mad Scientist For Life

Unbelievable, are you trying to promote Linux again?
ivan
Message 52925 - Posted: 28 Jan 2026, 21:11:12 UTC - in response to Message 52923.  

It looks like the dip might be bottoming out. Digits cruciate!
Magic Quantum Mechanic
Message 52926 - Posted: 28 Jan 2026, 21:34:56 UTC - in response to Message 52925.  

In reply to ivan's message of 28 Jan 2026:
It looks like the dip might be bottoming out. Digits cruciate!

I hope so, Ivan... fingers crossed.
ivan
Message 52927 - Posted: 28 Jan 2026, 22:15:42 UTC - in response to Message 52926.  

In reply to Magic Quantum Mechanic's message of 28 Jan 2026:
In reply to ivan's message of 28 Jan 2026:
It looks like the dip might be bottoming out. Digits cruciate!

I hope so, Ivan... fingers crossed.
I wish I'd said that!
Magic Quantum Mechanic
Message 52928 - Posted: 28 Jan 2026, 22:57:56 UTC - in response to Message 52927.  

I figured two fingers crossed might work, so I tried a new CMS task. The ones that were crashing did so within 25 minutes, and this one is still running and has just passed the first hour. If it ends up Valid I will start up more on here, and if they keep running I will reload the other hosts.

I did have to go into the VirtualBox Manager and remove all those crashed VMs.
ivan
Message 52929 - Posted: 28 Jan 2026, 23:06:24 UTC - in response to Message 52928.  

We do have a fairly good rebound, but it might be stalling. Federica has called us in for a meeting tomorrow; let's see if we can come up with some ideas by then. I'm preoccupied with having just four weeks to reclaim my Uni and CERN computing credentials, though...
Magic Quantum Mechanic
Message 52930 - Posted: 28 Jan 2026, 23:17:44 UTC - in response to Message 52929.  


Sounds good, and have a fine day... it is a holiday here on the 29th.
Garrulus glandarius

Message 52932 - Posted: 29 Jan 2026, 11:12:12 UTC
Last modified: 29 Jan 2026, 12:01:40 UTC

Things seem to be going smoothly now. My main LHC rig is still busy with ATLAS, but an old i7 has been running a new-ish CMS task for over 13 hours with apparently normal progress.
Erich56

Message 52934 - Posted: 29 Jan 2026, 13:42:55 UTC - in response to Message 52932.  

Same here (with the exception of just one task late this morning) :-)
Magic Quantum Mechanic
Message 53005 - Posted: 9 Feb 2026, 12:48:19 UTC

Getting these again (26 so far), possibly Windows again... I'm not going to look, since we have all of these hidden PCs for no reason.

<message>
The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified.
(0xd0) - exit code 208 (0xd0)</message>

As usual, it happens when I am normally asleep, but I just happened to check at 4am, found the same here and at -dev, and suspended all of mine again.
https://lhcathome.cern.ch/lhcathome/results.php?userid=5472
goodnight
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1533
Credit: 10,042,485
RAC: 1,277
Message 53006 - Posted: 9 Feb 2026, 15:46:51 UTC - in response to Message 53005.  
Last modified: 9 Feb 2026, 15:52:23 UTC

The text shown with that exit code is a misinterpretation by Windows.
The problem seems to be the 'usual' lack of jobs from the backbone.
The four glidein_startup processes wait for a job and give up after a while.

Exit status 208 (0x000000D0) EXIT_SUB_TASK_FAILURE

2026-02-09 16:45:30 (7752): Guest Log: [ERROR] glidein exited with return value 1.
2026-02-09 16:45:30 (7752): Guest Log: [DEBUG] Volunteer: Crystal Pellet (180436)
2026-02-09 16:45:30 (7752): Guest Log: [INFO] Shutting Down.
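The point above is that one number means different things to BOINC and to Windows. The task exits with a BOINC status code, and Windows then displays the text of an unrelated Windows system error that happens to share that number. Below is a small Python lookup that illustrates the clash for the two codes seen in this thread. EXIT_SUB_TASK_FAILURE for 208 comes from the post above. EXIT_INIT_FAILURE for 206 is an assumption based on BOINC's `error_numbers.h` and should be double-checked. The Windows names and texts are the standard system error messages for those two numbers.

```python
# BOINC exit statuses: 208 as quoted above; 206 is assumed (check BOINC's error_numbers.h)
BOINC_EXIT = {
    206: "EXIT_INIT_FAILURE",
    208: "EXIT_SUB_TASK_FAILURE",
}

# Windows system error codes that share the same numeric values
WINDOWS_ERROR = {
    206: ("ERROR_FILENAME_EXCED_RANGE", "The filename or extension is too long."),
    208: ("ERROR_META_EXPANSION_TOO_LONG",
          "The global filename characters, * or ?, are entered incorrectly "
          "or too many global filename characters are specified."),
}

def explain_exit_code(code):
    """Describe an exit code both ways, since Windows shows only its own reading."""
    boinc = BOINC_EXIT.get(code, "unknown BOINC status")
    win_name, win_text = WINDOWS_ERROR.get(code, ("unknown Windows error", ""))
    return (f"exit code {code} (0x{code:x}): BOINC {boinc}; "
            f"Windows text {win_name}: {win_text}")

if __name__ == "__main__":
    for c in (206, 208):
        print(explain_exit_code(c))
```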
ivan
Message 53007 - Posted: 9 Feb 2026, 16:23:10 UTC - in response to Message 53006.  

We seem to be having some network problems again -- I see failures to connect to CERN, and the number of running jobs is falling.
Magic Quantum Mechanic
Message 53010 - Posted: 9 Feb 2026, 23:59:47 UTC

Well, I'm up and testing again. I'm trying one here (12 hours later) and one at -dev, and they seem to be doing work this time; if they make it to Valid I will fire up some more. (They usually fail within 25 minutes, and the two I have running just passed 1 hour.)
Magic Quantum Mechanic
Message 53011 - Posted: 10 Feb 2026, 1:56:05 UTC - in response to Message 53010.  


Back in business, unless this is a jinx:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=432531314

I have several more running that look good too, and it looks good at -dev also: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3699829

©2026 CERN