Message boards : CMS Application : CMS tasks failing
| Author | Message |
|---|---|
|
Joined: 27 Apr 24 Posts: 22 Credit: 1,594,872 RAC: 3,712 |
> you should be aware though that they are of no use for the science.

I know that. It's their problem, not mine. |
|
Joined: 15 Jun 08 Posts: 2724 Credit: 299,002,782 RAC: 86,570 |
CMS seem to work on some hosts but mine get this error:
2025-11-02 08:12:45 (1029094): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2025-11-02 08:12:46 (1029094): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
2025-11-02 08:13:17 (1029094): Guest Log: [DEBUG] % Total % Received % Xferd Average Speed Time Time Time Current
2025-11-02 08:13:17 (1029094): Guest Log: Dload Upload Total Spent Left Speed
2025-11-02 08:13:17 (1029094): Guest Log: 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
2025-11-02 08:13:17 (1029094): Guest Log: 100 54 0 54 0 0 70 0 --:--:-- --:--:-- --:--:-- 70
2025-11-02 08:13:17 (1029094): Guest Log: 100 54 0 54 0 0 66 0 --:--:-- --:--:-- --:--:-- 66
2025-11-02 08:13:17 (1029094): Guest Log: [DEBUG]
2025-11-02 08:13:17 (1029094): Guest Log: ERROR: Couldn't read proxy from: /tmp/x509up_u0
2025-11-02 08:13:17 (1029094): Guest Log: globus_credential: Error reading proxy credential
2025-11-02 08:13:17 (1029094): Guest Log: globus_credential: Error reading proxy credential: Couldn't read PEM from bio
2025-11-02 08:13:17 (1029094): Guest Log: OpenSSL Error: pem_lib.c:707: in library: PEM routines, function PEM_read_bio: no start line
2025-11-02 08:13:17 (1029094): Guest Log: Use -debug for further information.
2025-11-02 08:13:17 (1029094): Guest Log: [ERROR] Could not get an x509 credential
2025-11-02 08:13:17 (1029094): Guest Log: [ERROR] The x509 proxy creation failed. |
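Editorial note on the log above: OpenSSL's "no start line" means the file it was given contained no `-----BEGIN ...-----` line, and the transfer progress lines show only 54 bytes arriving, so the downloaded file was probably a short error message rather than a proxy. The following is a minimal sketch of how one might check a file for a PEM start line. The function names are hypothetical, and the path in the usage note is simply the one from the log; nothing here is part of the CMS bootstrap script.

```python
import re

# A PEM block starts with a line such as "-----BEGIN CERTIFICATE-----".
PEM_BEGIN = re.compile(r"^-----BEGIN [A-Z0-9 ]+-----\s*$", re.MULTILINE)


def looks_like_pem(data: bytes) -> bool:
    """Return True if the data contains at least one PEM start line."""
    text = data.decode("ascii", errors="replace")
    return PEM_BEGIN.search(text) is not None


def describe_proxy_file(path: str) -> str:
    """Summarise a candidate proxy file (hypothetical helper)."""
    with open(path, "rb") as fh:
        data = fh.read()
    verdict = "PEM start line found" if looks_like_pem(data) else "no PEM start line"
    return f"{path}: {len(data)} bytes, {verdict}"
```

Inside the guest, `print(describe_proxy_file("/tmp/x509up_u0"))` would show the file size and whether a start line is present. A file of a few dozen bytes without one is worth reading directly, since it may hold the server's error text.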
|
Joined: 18 Dec 15 Posts: 1941 Credit: 156,014,199 RAC: 107,802 |
> CMS seem to work on some hosts but mine get this error:

same problem here :-( |
|
Joined: 14 Jan 10 Posts: 1491 Credit: 9,985,155 RAC: 973 |
> > CMS seem to work on some hosts but mine get this error:
>
> same problem here :-(

On the development system CMS is running OK:
00:01:21.379155 VMMDev: Guest Log: [INFO] Reading volunteer information
00:01:26.181010 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home
00:01:26.975471 VMMDev: Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev
00:01:28.156063 VMMDev: Guest Log: [INFO] Requesting an idtoken from LHC@home
00:01:28.755910 VMMDev: Guest Log: [INFO] Requesting an idtoken from vLHC@home-dev
00:01:29.491935 VMMDev: Guest Log: [INFO] CMS application starting. Check log files. |
|
Joined: 13 May 20 Posts: 52 Credit: 2,918,679 RAC: 4,008 |
Good evening. I don't know whether this will help, but a few days ago all my CMS tasks were ending in error. I reduced the limit of downloaded work units to 8 instead of "no limit". I asked Claude AI, which had me run a command in the terminal, `sudo usermod -aG vboxusers $USER`, and since then everything has worked correctly. I am on Linux Mint 22.2 with VirtualBox 7.24, and I disabled the KVM kernel modules by editing the file with `sudo nano /etc/modprobe.d/blacklist-kvm.conf` so that it contains the lines `blacklist kvm_intel`, `blacklist kvm_amd` and `blacklist kvm`. |
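Editorial note: a blacklist file only stops modules from being auto-loaded at the next boot, so a module that is already loaded stays loaded until it is removed or the machine restarts. Below is a minimal sketch for checking whether any blacklisted module is still loaded. It assumes the standard `blacklist <name>` syntax of modprobe.d files and the usual `/proc/modules` layout (module name in the first column). The function names are hypothetical, and whether disabling KVM is needed at all on a given host is the poster's report, not something confirmed here.

```python
def blacklisted_modules(conf_text: str) -> set[str]:
    """Collect module names from 'blacklist <name>' lines, ignoring comments."""
    names = set()
    for line in conf_text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) == 2 and parts[0] == "blacklist":
            names.add(parts[1])
    return names


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Module names are the first field of each /proc/modules line."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def blacklisted_but_loaded(conf_text: str, proc_modules_text: str) -> list[str]:
    """Return blacklisted modules that are nevertheless loaded right now."""
    return sorted(blacklisted_modules(conf_text) & loaded_modules(proc_modules_text))
```

On the host one would pass in the contents of `/etc/modprobe.d/blacklist-kvm.conf` and `/proc/modules`; an empty result means none of the blacklisted modules is currently loaded.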
|
Joined: 29 Aug 05 Posts: 1134 Credit: 11,579,777 RAC: 14,842 |
Hello everyone. Thanks for your patience these last weeks. We were finally able to get our "supply chain" sorted out last Friday and get jobs flowing again. I refrained from posting a celebration, because Hallowe'en... Sure enough we had a little hiccup that I was able to get fixed on Saturday, but things went bad again this morning with failures in getting certificate proxies -- funnily enough it didn't affect my running machine, I must have got my last task just before the problem arose. The failure was in a CA server run by CERN IT, who were able to fix it soon after we raised a ticket. So, fingers crossed, we are now back in action again and you can resume getting new tasks if you have been holding off. |
|
Joined: 15 Jun 08 Posts: 2724 Credit: 299,002,782 RAC: 86,570 |
Hint for volunteers using a local firewall: CMS now requires TCP port 9620 to be open for outgoing connections to HTCondor CCB. |
|
Joined: 15 Jun 08 Posts: 2724 Credit: 299,002,782 RAC: 86,570 |
> Hint for volunteers using a local firewall: CMS now requires TCP port 9620 to be open for outgoing connections to HTCondor CCB.

This afternoon a test for port 9620 has been added to the CMS bootstrap script. Tasks passing the test report something like this to stderr.txt:
2025-11-04 10:29:28 (113263): Guest Log: [INFO] Testing connection to HTCondor-Collector
2025-11-04 10:29:29 (113263): Guest Log: [INFO] Testing connection to HTCondor-CCB |
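Editorial note: volunteers who want to check their own firewall before fetching tasks could try an outgoing TCP connection themselves. This is a minimal sketch only. The thread does not name the CCB host, so the host in the usage note is a placeholder, and the function is hypothetical rather than the test the bootstrap script runs.

```python
import socket


def can_connect(host: str, port: int = 9620, timeout: float = 5.0) -> bool:
    """Return True if an outgoing TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `can_connect("ccb.example.org")` (placeholder host) returns `False` if a local firewall drops outgoing traffic on port 9620. A `True` result only shows that the TCP handshake completed, not that HTCondor accepted the client.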
|
Joined: 24 Jan 06 Posts: 8 Credit: 9,612,140 RAC: 4,042 |
Looks like the x509 errors are back? This is on a Windows VM.
2025-11-11 04:43:39 (6964): Guest Log: [INFO] Testing connection to http://cms-frontier.openhtc.io:8080/FrontierProd/Frontier/
Getting quite a few of them again over the last few days. FWIW: The connection check to HTCondor CCB is passing. Theory tasks seem to be working fine and even had one CMS task get to completion, amongst the failures. |
|
Joined: 29 Aug 05 Posts: 1134 Credit: 11,579,777 RAC: 14,842 |
|
|
Joined: 29 Aug 05 Posts: 1134 Credit: 11,579,777 RAC: 14,842 |
We've had a lot of jobs fail this morning. From the logs it seems to be a network problem. I suspect it's the Cloudflare outage. |
|
Joined: 29 Aug 05 Posts: 1134 Credit: 11,579,777 RAC: 14,842 |
> We've had a lot of jobs fail this morning. From the logs it seems to be a network problem.

We seem to be recovering now, according to the Running Jobs graph. |
|
Joined: 20 Sep 08 Posts: 1 Credit: 7,611,378 RAC: 9,841 |
431067740
Name: CMS_1226716_1766045429.113427_0
Exit status 93 (0x0000005D) Unknown error code
Stderr output
<core_client_version>8.2.8</core_client_version> |
|
Joined: 18 Dec 15 Posts: 1941 Credit: 156,014,199 RAC: 107,802 |
since early evening, no jobs are available, although tasks are still being distributed, and each task fails after about half an hour. Obviously, the automatic stop function for task distribution in case of no jobs is again not working the way it's supposed to :-( |
|
Joined: 29 Aug 05 Posts: 1134 Credit: 11,579,777 RAC: 14,842 |
> since early evening, no jobs are available, although tasks are still being distributed, and each task fails after about half an hour. Obviously, the automatic stop function for task distribution in case of no jobs is again not working the way it's supposed to :-(

Yes, I just saw that. Not an ideal time for it to happen. At first glance I can't see anything wrong from home, but I have limited network access here. I have a hospital appointment tomorrow, so I won't be in my office before about 1100 (12 hours from now). More if/when I discover anything. |
Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1261 Credit: 92,104,469 RAC: 110,140 |
the -dev version is also having problems again |
|
Joined: 29 Aug 05 Posts: 1134 Credit: 11,579,777 RAC: 14,842 |
> the -dev version is also having problems again

I imagine it would. It uses the same queues as the main project. It must be some sort of authorisation problem again. BOINC sees that jobs are available, so it creates new tasks; hosts obtain a task and contact the condor server to say they are available; server allocates a job to the task, but the job fails to start; task times out; new VM starts and the cycle repeats. I'm not sure that anyone who can look into this is still active at CERN, given the proximity to Christmas. I've sent alerts to those who could investigate the condor server logs, but no response yet. |
|
Joined: 18 Dec 15 Posts: 1941 Credit: 156,014,199 RAC: 107,802 |
Ivan, many thanks for your efforts. Let's hope that CMS will run again very soon; if not, I am afraid it will take until the first or even second week of next year. |
|
Joined: 29 Aug 05 Posts: 1134 Credit: 11,579,777 RAC: 14,842 |
|
|
Joined: 19 Jul 18 Posts: 6 Credit: 338,972 RAC: 8 |
Dear all, do you have the error file from your VM with the last error message? Is the issue related to "VM Heartbeat file specified, but missing." for all of you? Thanks, cheers Federica |
©2026 CERN