Message boards : CMS Application : CMS tasks failing
| Author | Message |
|---|---|
|
Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657 |
|
|
Send message Joined: 27 Sep 08 Posts: 918 Credit: 779,138,205 RAC: 152,761 |
[ERROR] Could not connect to vocms0830.cern.ch on port 9618. It seems like there are some issues with the servers. |
Magic Quantum Mechanic Send message Joined: 24 Oct 04 Posts: 1291 Credit: 95,259,444 RAC: 34,938 |
The servers do that here as soon as I am asleep. Over 100 of these: https://lhcathome.cern.ch/lhcathome/result.php?resultid=432978091 (-dev too, of course) |
|
Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657 |
|
|
Send message Joined: 5 Apr 20 Posts: 4 Credit: 58,637,869 RAC: 12,593 |
All CMS tasks failing within 4 minutes <core_client_version>8.2.4</core_client_version> <![CDATA[ <message> The filename or extension is too long. (0xce) - exit code 206 (0xce)</message> <stderr_txt> |
|
Send message Joined: 18 Dec 15 Posts: 1966 Credit: 159,289,840 RAC: 46,178 |
In reply to alf's message of 23 Feb 2026: "All CMS tasks failing within 4 minutes" The old, frequently occurring and well-known error :-( |
|
Send message Joined: 18 Dec 15 Posts: 1966 Credit: 159,289,840 RAC: 46,178 |
A closer look at the stderr output of the failed tasks reveals numerous connection timeouts to all kinds of targets. At the end, it says "VM Completion Message: Could not connect to all required network services". So there seems to be a major network problem at CERN. |
|
Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657 |
In reply to Erich56's message of 23 Feb 2026: "A closer look at the stderrs of the failed tasks reveals that there are numerous connection timeouts to all kinds of targets." Yes, there is a mishmash of error messages, but the one of real importance seems to be the failure to connect to vocms0830. I can recreate the error with `nc -z -vvv vocms0830.cern.ch 9618`. However, this works inside CERN (from lxplus), so there does appear to be a network component to the problem. I've put a request out to the internal Mattermost discussion boards, hopefully to the appropriate channel! |
|
Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657 |
In reply to ivan's message of 23 Feb 2026: Interesting... Is this related to puppet not running on vocms0830 and firewall exceptions being removed? |
|
Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657 |
"Interesting... Is this related to puppet not running on vocms0830 and firewall exceptions being removed?" Yes, this is related. Puppet is not running because the host certificate puppet uses has expired. If puppet does not run for a couple of days, the host is removed from landbsets, which removes all its firewall rules. Usually, the certificate on the host should have been renewed automatically. I am currently checking why it wasn't. Also, we should have gotten an email (probably multiple) about that. |
Magic Quantum Mechanic Send message Joined: 24 Oct 04 Posts: 1291 Credit: 95,259,444 RAC: 34,938 |
Just ran a single test and it's still not working. |
|
Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657 |
|
|
Send message Joined: 18 Dec 15 Posts: 1966 Credit: 159,289,840 RAC: 46,178 |
In reply to ivan's message of 23 Feb 2026: "No, I'm waiting to hear back how they plan to fix it -- manually reinstate the certificate and firewall rules, or wait until they've worked out why the autorenew didn't work." Whatever happens, I'm afraid it will take a while until CMS is working again :-( |
|
Send message Joined: 9 Feb 09 Posts: 52 Credit: 10,800,231 RAC: 20,114 |
Good morning everyone, after calculating the tasks, I restarted the project and I can confirm that the CMS tasks are still failing. For now, I've removed it from the options for requesting tasks. February 24, 2026 Best regards |
|
Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657 |
In reply to Erich56's message of 24 Feb 2026: In reply to ivan's message of 23 Feb 2026: Unfortunately, you may be right. There has been some difficulty in reinstating the collector's certificate. |
|
Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657 |
|
|
Send message Joined: 7 May 08 Posts: 273 Credit: 2,131,245 RAC: 252 |
That's a strange error: The filename wildcard characters * or ? were entered incorrectly or too many were specified. |
|
Send message Joined: 14 Jan 10 Posts: 1533 Credit: 10,042,485 RAC: 1,277 |
In reply to [VENETO] boboviz's message of 25 Feb 2026: "That's a strange error:" That's a wrong interpretation of the error code by Windows, like your errors: "The filename or extension is too long. (0xce) - exit code 206 (0xce)</message>" and "The filename wildcard characters * or ? were entered incorrectly or too many were specified. (0xd0) - exit code 208 (0xd0)</message>" |
|
Send message Joined: 7 May 08 Posts: 273 Credit: 2,131,245 RAC: 252 |
In reply to Crystal Pellet's message of 25 Feb 2026:
So I have to wait for a new CMS app version? |
|
Send message Joined: 18 Dec 15 Posts: 1966 Credit: 159,289,840 RAC: 46,178 |
In reply to ivan's message of 24 Feb 2026: "We've now been told that the cleanest solution is to rebuild the host from scratch, so that is being worked on." Ivan, any idea how long this might take? A couple of days? One or two weeks? Or even longer? |
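
For volunteers who want to check from their own machine whether the collector is reachable again, the `nc -z -vvv vocms0830.cern.ch 9618` test mentioned in the thread can be approximated in Python. This is only a rough sketch: the function name `can_connect` and the 5-second timeout are assumptions made for illustration, and like `nc -z` it only tests whether a TCP connection can be opened. It does not say anything about whether the service behind the port is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it only performs a TCP connect,
    similar in spirit to `nc -z`, and sends no data.
    """
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example use (host and port taken from the error message in the thread):
#   print(can_connect("vocms0830.cern.ch", 9618))
```

A `False` result from outside CERN combined with a working connection from inside would match what was reported in the thread, namely a firewall or network issue rather than a problem with the task itself.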
©2026 CERN