CMS Application: CMS tasks failing
| Author | Message |
|---|---|
| Joined: 29 Aug 05 · Posts: 1152 · Credit: 11,734,920 · RAC: 657 | |
| Joined: 5 Apr 25 · Posts: 66 · Credit: 2,214,382 · RAC: 4,825 | Got a bunch that failed again over the past couple of hours, so I've switched over to ATLAS for the time being. Actually, it seems to correlate perfectly with the number of jobs, which has been dropping steadily since 10:24 UTC. |
| Joined: 18 Dec 15 · Posts: 1967 · Credit: 159,323,130 · RAC: 46,350 | I just noticed that since about noon today most tasks have failed after varying periods of time (between a few minutes and half an hour). In many cases, stderr says that the filename or extension is too long: `Der Dateiname oder die Erweiterung ist zu lang. (0xce) - exit code 206 (0xce)</message>` (German for "The filename or extension is too long."). What's going on? |
| Joined: 29 Aug 05 · Posts: 1152 · Credit: 11,734,920 · RAC: 657 | In reply to Erich56's message of 28 Jan 2026: "I just noticed that since about noontime of today most of the tasks failed after various periods of time (between a few minutes and half an hour)." Yes, I've noticed the jobs falling over the last few hours. I'm not sure what's going on yet; that's not a common error message. I'm tempted to believe it's coming from the operating system, but more digging is needed. |
| Joined: 29 Aug 05 · Posts: 1152 · Credit: 11,734,920 · RAC: 657 | In reply to Garrulus glandarius's message of 28 Jan 2026: "Got a bunch that failed again the past couple of hours, switched over to ATLAS for the time being" Yes, that does seem to be the case. |
| Joined: 29 Aug 05 · Posts: 1152 · Credit: 11,734,920 · RAC: 657 | In reply to Erich56's message of 28 Jan 2026: "I just noticed that since about noontime of today most of the tasks failed after various periods of time (between a few minutes and half an hour)." Some of your failures are network-related; it could be a problem at CERN: `2026-01-28 15:27:42 (24776): Guest Log: Ncat: Version 7.50 ( https://nmap.org/ncat )` `2026-01-28 15:27:42 (24776): Guest Log: Ncat: Connection to 137.138.53.124 failed: Connection timed out.` `2026-01-28 15:27:42 (24776): Guest Log: Ncat: Trying next address...` `2026-01-28 15:27:42 (24776): Guest Log: Ncat: Network is unreachable.` The address that timed out is a CERN host: `[TinyPC:words] > nslookup 137.138.53.124` `Server: UnKnown` `Address: 10.174.98.157` `Name: vocms0204.cern.ch` `Address: 137.138.53.124` |
| Magic Quantum Mechanic · Joined: 24 Oct 04 · Posts: 1291 · Credit: 95,271,120 · RAC: 34,514 | Same here. *Volunteer Mad Scientist For Life* Unbelievable, are you trying to promote Linux again? |
| Joined: 29 Aug 05 · Posts: 1152 · Credit: 11,734,920 · RAC: 657 | |
| Magic Quantum Mechanic · Joined: 24 Oct 04 · Posts: 1291 · Credit: 95,271,120 · RAC: 34,514 | In reply to ivan's message of 28 Jan 2026: "It looks like the dip might be bottoming out. Digits cruciate!" I hope so, Ivan. Fingers crossed. |
| Joined: 29 Aug 05 · Posts: 1152 · Credit: 11,734,920 · RAC: 657 | |
| Magic Quantum Mechanic · Joined: 24 Oct 04 · Posts: 1291 · Credit: 95,271,120 · RAC: 34,514 | I figured two crossed fingers might work, so I tried a new CMS task. The ones that crash do so within 25 minutes, and this one is still running and has just passed the first hour, so if it ends up Valid I will start more on this host and, if they keep running, reload the other hosts. I did have to go into VirtualBox Manager and remove all those crashed VMs. |
| Joined: 29 Aug 05 · Posts: 1152 · Credit: 11,734,920 · RAC: 657 | |
| Magic Quantum Mechanic · Joined: 24 Oct 04 · Posts: 1291 · Credit: 95,271,120 · RAC: 34,514 | Sounds good, and have a fine day. It's a holiday here on the 29th. |
| Joined: 5 Apr 25 · Posts: 66 · Credit: 2,214,382 · RAC: 4,825 | Things seem to be going smoothly now. My main LHC rig is still busy with ATLAS, but an old i7 has been running a new-ish CMS task for over 13 hours with apparently normal progress. |
| Joined: 18 Dec 15 · Posts: 1967 · Credit: 159,323,130 · RAC: 46,350 | Same here (with the exception of only one task late this morning) :-) |
| Magic Quantum Mechanic · Joined: 24 Oct 04 · Posts: 1291 · Credit: 95,271,120 · RAC: 34,514 | Getting these again (26 so far), possibly Windows again. I'm not going to look, since we have all these hidden PCs for no reason: `<message> The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified. (0xd0) - exit code 208 (0xd0)</message>` As usual, it happens while I'm asleep, but I just happened to check at 4 a.m., found the same here and at -dev, and suspended all of mine again. https://lhcathome.cern.ch/lhcathome/results.php?userid=5472 Goodnight. |
| Joined: 14 Jan 10 · Posts: 1533 · Credit: 10,042,485 · RAC: 1,277 | That message with the exit code is a misinterpretation by Windows: here 208 is the BOINC exit status EXIT_SUB_TASK_FAILURE, not a filename error. The problem seems to be the 'usual' lack of jobs from the backbone. The four glidein_startup processes wait for a job and give up after a while: `Exit status 208 (0x000000D0) EXIT_SUB_TASK_FAILURE` `2026-02-09 16:45:30 (7752): Guest Log: [ERROR] glidein exited with return value 1.` `2026-02-09 16:45:30 (7752): Guest Log: [DEBUG] Volunteer: Crystal Pellet (180436)` `2026-02-09 16:45:30 (7752): Guest Log: [INFO] Shutting Down.` |
| Joined: 29 Aug 05 · Posts: 1152 · Credit: 11,734,920 · RAC: 657 | |
| Magic Quantum Mechanic · Joined: 24 Oct 04 · Posts: 1291 · Credit: 95,271,120 · RAC: 34,514 | Well, I'm up and testing again. I'm trying one here (12 hours later) and one at -dev, and they seem to be doing work this time; if they make it to Valid, I will fire up some more. (They usually fail within 25 minutes, and the two I have running just passed 1 hour.) |
| Magic Quantum Mechanic · Joined: 24 Oct 04 · Posts: 1291 · Credit: 95,271,120 · RAC: 34,514 | Back in business, unless this is a jinx: https://lhcathome.cern.ch/lhcathome/result.php?resultid=432531314 I have several more running that look good too, and things look good at -dev as well: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3699829 |
©2026 CERN