Message boards : CMS Application : CMS tasks failing
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 1152
Credit: 11,734,920
RAC: 657
Message 53012 - Posted: 10 Feb 2026, 9:21:31 UTC - in response to Message 53011.  

From the jobs graphs, things came good again at about 20:00 UTC last night.
ID: 53012 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 918
Credit: 779,284,737
RAC: 156,677
Message 53061 - Posted: 22 Feb 2026, 20:16:38 UTC

[ERROR] Could not connect to vocms0830.cern.ch on port 9618

Seems like there are some issues with the servers.
ID: 53061 · Report as offensive     Reply Quote
ProfileMagic Quantum Mechanic
Avatar

Send message
Joined: 24 Oct 04
Posts: 1291
Credit: 95,273,789
RAC: 34,536
Message 53062 - Posted: 22 Feb 2026, 21:31:41 UTC - in response to Message 53061.  
Last modified: 22 Feb 2026, 21:47:21 UTC

The servers do that here as soon as I am asleep
Over 100 of these https://lhcathome.cern.ch/lhcathome/result.php?resultid=432978091 (-dev too of course)
ID: 53062 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 1152
Credit: 11,734,920
RAC: 657
Message 53063 - Posted: 23 Feb 2026, 0:04:41 UTC - in response to Message 53061.  
Last modified: 23 Feb 2026, 0:05:03 UTC

In reply to Toby Broom's message of 22 Feb 2026:
[ERROR] Could not connect to vocms0830.cern.ch on port 9618

Seems like there are some issues with the servers.

Yes, there are jobs in the queues, but I'm getting that error on all my tasks at present.
ID: 53063 · Report as offensive     Reply Quote
alf

Send message
Joined: 5 Apr 20
Posts: 4
Credit: 58,637,869
RAC: 12,593
Message 53064 - Posted: 23 Feb 2026, 0:48:46 UTC

All CMS tasks failing within 4 minutes


<core_client_version>8.2.4</core_client_version>
<![CDATA[
<message>
The filename or extension is too long.
(0xce) - exit code 206 (0xce)</message>
<stderr_txt>
ID: 53064 · Report as offensive     Reply Quote
Erich56

Send message
Joined: 18 Dec 15
Posts: 1967
Credit: 159,323,562
RAC: 45,989
Message 53065 - Posted: 23 Feb 2026, 4:26:27 UTC - in response to Message 53064.  

In reply to alf's message of 23 Feb 2026:
All CMS tasks failing within 4 minutes


<core_client_version>8.2.4</core_client_version>
<![CDATA[
<message>
The filename or extension is too long.
(0xce) - exit code 206 (0xce)</message>
<stderr_txt>
The old, frequently occurring and well-known error :-(
ID: 53065 · Report as offensive     Reply Quote
Erich56

Send message
Joined: 18 Dec 15
Posts: 1967
Credit: 159,323,562
RAC: 45,989
Message 53067 - Posted: 23 Feb 2026, 6:34:28 UTC - in response to Message 53065.  

A closer look at the stderr output of the failed tasks reveals numerous connection timeouts to all kinds of targets.
And at the end, it says "VM Completion Message: Could not connect to all required network services"
So there seems to be a major network problem at CERN.
ID: 53067 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 1152
Credit: 11,734,920
RAC: 657
Message 53070 - Posted: 23 Feb 2026, 12:19:01 UTC - in response to Message 53067.  

In reply to Erich56's message of 23 Feb 2026:
A closer look at the stderr output of the failed tasks reveals numerous connection timeouts to all kinds of targets.
And at the end, it says "VM Completion Message: Could not connect to all required network services"
So there seems to be a major network problem at CERN.

Yes, there is a mishmash of error messages, but the one of real importance seems to be the failure to connect to vocms0830. I can recreate the error with:
nc -z -vvv vocms0830.cern.ch 9618
However, this works inside CERN (from lxplus), so there does appear to be a network component to the problem. I've put a request out to the internal Mattermost discussion boards, hopefully to the appropriate channel!
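
For anyone without nc to hand, the same reachability check can be approximated in Python. This is only a minimal sketch: the function name can_connect and the 5-second timeout are assumptions for illustration, and it only tests whether a TCP connection opens, not whether the service behind the port is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Rough equivalent of `nc -z host port`: open the connection,
    send nothing, and close it again straight away.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

Calling can_connect("vocms0830.cern.ch", 9618) from a volunteer machine and again from inside CERN would show the same inside/outside difference described above.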
ID: 53070 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 1152
Credit: 11,734,920
RAC: 657
Message 53071 - Posted: 23 Feb 2026, 13:32:19 UTC - in response to Message 53070.  

In reply to ivan's message of 23 Feb 2026:

Yes, there is a mishmash of error messages, but the one of real importance seems to be the failure to connect to vocms0830. I can recreate the error with:
nc -z -vvv vocms0830.cern.ch 9618
However, this works inside CERN (from lxplus), so there does appear to be a network component to the problem. I've put a request out to the internal Mattermost discussion boards, hopefully to the appropriate channel!
Interesting...
Is this related to puppet not running on vocms0830 and firewall exceptions being removed?

ID: 53071 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 1152
Credit: 11,734,920
RAC: 657
Message 53072 - Posted: 23 Feb 2026, 15:51:03 UTC - in response to Message 53071.  
Last modified: 23 Feb 2026, 15:51:26 UTC

Interesting...
Is this related to puppet not running on vocms0830 and firewall exceptions being removed?
Yes, this is related. Puppet is not running because the host certificate it uses has expired. If Puppet does not run for a couple of days, the host is removed from landbsets, which removes all of its firewall rules. Normally the certificate on the host should have been renewed automatically; I am currently checking why it wasn't. We should also have received an email (probably several) about that.
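
As an illustration of the kind of early warning that would catch an expiring host certificate, here is a minimal sketch in Python. It is not how CERN monitors its hosts; the function name days_until_expiry is hypothetical, and the input is assumed to be the certificate's notAfter field in the usual OpenSSL text form (for example "Feb 20 00:00:00 2026 GMT", a made-up date).

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before a certificate expires (negative if already expired).

    not_after: the certificate's notAfter value, e.g. "Feb 20 00:00:00 2026 GMT"
    now:       a timezone-aware datetime to compare against
    """
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    return (expires - now.timestamp()) / 86400.0
```

A check like days_until_expiry(value, datetime.now(timezone.utc)) < 14 could raise an alert well before a dependent service stops running, though the threshold of 14 days is an arbitrary choice.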

ID: 53072 · Report as offensive     Reply Quote
ProfileMagic Quantum Mechanic
Avatar

Send message
Joined: 24 Oct 04
Posts: 1291
Credit: 95,273,789
RAC: 34,536
Message 53076 - Posted: 23 Feb 2026, 23:18:33 UTC

Just ran a single test and it's still not working.
ID: 53076 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 1152
Credit: 11,734,920
RAC: 657
Message 53077 - Posted: 23 Feb 2026, 23:48:22 UTC - in response to Message 53076.  

No, I'm waiting to hear back how they plan to fix it -- manually reinstate the certificate and firewall rules, or wait until they've worked out why the autorenew didn't work.
ID: 53077 · Report as offensive     Reply Quote
Erich56

Send message
Joined: 18 Dec 15
Posts: 1967
Credit: 159,323,562
RAC: 45,989
Message 53079 - Posted: 24 Feb 2026, 6:51:00 UTC - in response to Message 53077.  

In reply to ivan's message of 23 Feb 2026:
No, I'm waiting to hear back how they plan to fix it -- manually reinstate the certificate and firewall rules, or wait until they've worked out why the autorenew didn't work.
Whatever happens, I'm afraid it will take a while until CMS is working again :-(
ID: 53079 · Report as offensive     Reply Quote
Emmanuel Mar
Avatar

Send message
Joined: 9 Feb 09
Posts: 52
Credit: 10,810,515
RAC: 19,835
Message 53081 - Posted: 24 Feb 2026, 8:50:26 UTC
Last modified: 24 Feb 2026, 8:51:44 UTC

Good morning everyone, after calculating the tasks, I restarted the project and I can confirm that the CMS tasks are still failing.

For now, I've removed it from the options for requesting tasks.

February 24, 2026

Best regards
ID: 53081 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 1152
Credit: 11,734,920
RAC: 657
Message 53082 - Posted: 24 Feb 2026, 11:39:58 UTC - in response to Message 53079.  

In reply to Erich56's message of 24 Feb 2026:
In reply to ivan's message of 23 Feb 2026:
No, I'm waiting to hear back how they plan to fix it -- manually reinstate the certificate and firewall rules, or wait until they've worked out why the autorenew didn't work.
Whatever happens, I'm afraid it will take a while until CMS is working again :-(

Unfortunately, you may be right. There has been some difficulty in reinstating the collector's certificate.
ID: 53082 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 1152
Credit: 11,734,920
RAC: 657
Message 53083 - Posted: 24 Feb 2026, 15:26:36 UTC - in response to Message 53082.  

We've now been told that the cleanest solution is to rebuild the host from scratch, so that is being worked on.
ID: 53083 · Report as offensive     Reply Quote
[VENETO] boboviz
Avatar

Send message
Joined: 7 May 08
Posts: 273
Credit: 2,131,245
RAC: 252
Message 53086 - Posted: 25 Feb 2026, 9:14:33 UTC

That's a strange error:
The filename wildcard characters * or ? were entered incorrectly or too many were specified.
(0xd0) - exit code 208 (0xd0)</message>
ID: 53086 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Send message
Joined: 14 Jan 10
Posts: 1533
Credit: 10,042,485
RAC: 1,277
Message 53087 - Posted: 25 Feb 2026, 9:31:38 UTC - in response to Message 53086.  

In reply to [VENETO] boboviz's message of 25 Feb 2026:
That's a strange error:
The filename wildcard characters * or ? were entered incorrectly or too many were specified.
(0xd0) - exit code 208 (0xd0)</message>
That's Windows misinterpreting the error code, as with your errors:

The filename or extension is too long.
(0xce) - exit code 206 (0xce)</message>

and

The filename wildcard characters * or ? were entered incorrectly or too many were specified.
(0xd0) - exit code 208 (0xd0)</message>
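
To make the pairing of numbers and text explicit: the hex value in brackets is just the decimal exit code again, and the sentence in front of it is the Windows system error text that happens to share that number. A minimal sketch, with the message table limited to the two codes quoted in this thread and the function name describe_exit_code made up for illustration:

```python
# Windows system error text for the two exit codes quoted in this thread.
# The text describes file names, which has nothing to do with why the task failed.
WINDOWS_TEXT = {
    206: "The filename or extension is too long.",
    208: "The filename wildcard characters * or ? were entered incorrectly "
         "or too many were specified.",
}


def describe_exit_code(code: int) -> str:
    """Show an exit code in decimal and hex next to the text Windows attaches to it."""
    text = WINDOWS_TEXT.get(code, "(no text quoted in this thread)")
    return f"exit code {code} ({code:#04x}): {text}"
```

The real cause has to be read from the stderr output of the task, not from this text.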
ID: 53087 · Report as offensive     Reply Quote
[VENETO] boboviz
Avatar

Send message
Joined: 7 May 08
Posts: 273
Credit: 2,131,245
RAC: 252
Message 53088 - Posted: 25 Feb 2026, 11:07:12 UTC - in response to Message 53087.  

In reply to Crystal Pellet's message of 25 Feb 2026:

The filename wildcard characters * or ? were entered incorrectly or too many were specified.
(0xd0) - exit code 208 (0xd0)</message>


So I have to wait for a new CMS app version?
ID: 53088 · Report as offensive     Reply Quote
Erich56

Send message
Joined: 18 Dec 15
Posts: 1967
Credit: 159,323,562
RAC: 45,989
Message 53095 - Posted: 26 Feb 2026, 10:50:20 UTC - in response to Message 53083.  

In reply to ivan's message of 24 Feb 2026:
We've now been told that the cleanest solution is to rebuild the host from scratch, so that is being worked on.
Ivan, any idea how long this might take? A couple of days? One week or two? Or even longer?
ID: 53095 · Report as offensive     Reply Quote


©2026 CERN