Thread 'CMS tasks failing'

Author	Message
M0CZY Send message Joined: 27 Apr 24 Posts: 22 Credit: 1,620,932 RAC: 150	Message 52597 - Posted: 26 Oct 2025, 14:11:02 UTC - in response to Message 52588. you should be aware though that they are of no use for the science. I know that. It's their problem, not mine. ID: 52597 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2724 Credit: 300,117,218 RAC: 48,307	Message 52611 - Posted: 2 Nov 2025, 7:23:01 UTC CMS seem to work on some hosts but mine get this error: 2025-11-02 08:12:45 (1029094): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2025-11-02 08:12:46 (1029094): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev 2025-11-02 08:13:17 (1029094): Guest Log: [DEBUG] % Total % Received % Xferd Average Speed Time Time Time Current 2025-11-02 08:13:17 (1029094): Guest Log: Dload Upload Total Spent Left Speed 2025-11-02 08:13:17 (1029094): Guest Log: 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 2025-11-02 08:13:17 (1029094): Guest Log: 100 54 0 54 0 0 70 0 --:--:-- --:--:-- --:--:-- 70 2025-11-02 08:13:17 (1029094): Guest Log: 100 54 0 54 0 0 66 0 --:--:-- --:--:-- --:--:-- 66 2025-11-02 08:13:17 (1029094): Guest Log: [DEBUG] 2025-11-02 08:13:17 (1029094): Guest Log: ERROR: Couldn't read proxy from: /tmp/x509up_u0 2025-11-02 08:13:17 (1029094): Guest Log: globus_credential: Error reading proxy credential 2025-11-02 08:13:17 (1029094): Guest Log: globus_credential: Error reading proxy credential: Couldn't read PEM from bio 2025-11-02 08:13:17 (1029094): Guest Log: OpenSSL Error: pem_lib.c:707: in library: PEM routines, function PEM_read_bio: no start line 2025-11-02 08:13:17 (1029094): Guest Log: Use -debug for further information. 2025-11-02 08:13:17 (1029094): Guest Log: [ERROR] Could not get an x509 credential 2025-11-02 08:13:17 (1029094): Guest Log: [ERROR] The x509 proxy creation failed. ID: 52611 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1967 Credit: 159,362,239 RAC: 45,189	Message 52612 - Posted: 2 Nov 2025, 12:04:33 UTC - in response to Message 52611. CMS seem to work on some hosts but mine get this error: ... Guest Log: [ERROR] Could not get an x509 credential Guest Log: [ERROR] The x509 proxy creation failed. same problem here :-( ID: 52612 · Reply Quote

Crystal Pellet Volunteer moderator Volunteer tester Send message Joined: 14 Jan 10 Posts: 1533 Credit: 10,042,485 RAC: 1,074	Message 52613 - Posted: 2 Nov 2025, 16:44:48 UTC - in response to Message 52612. CMS seem to work on some hosts but mine get this error: ... Guest Log: [ERROR] Could not get an x509 credential Guest Log: [ERROR] The x509 proxy creation failed. same problem here :-( On the development system CMS is running OK: 00:01:21.379155 VMMDev: Guest Log: [INFO] Reading volunteer information 00:01:26.181010 VMMDev: Guest Log: [INFO] Requesting an X509 credential from LHC@home 00:01:26.975471 VMMDev: Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev 00:01:28.156063 VMMDev: Guest Log: [INFO] Requesting an idtoken from LHC@home 00:01:28.755910 VMMDev: Guest Log: [INFO] Requesting an idtoken from vLHC@home-dev 00:01:29.491935 VMMDev: Guest Log: [INFO] CMS application starting. Check log files. ID: 52613 · Reply Quote

Pascal Send message Joined: 13 May 20 Posts: 58 Credit: 3,077,924 RAC: 1,901	Message 52614 - Posted: 2 Nov 2025, 19:27:56 UTC - in response to Message 52613. Good evening. I don't know whether this will help, but a few days ago all my CMS tasks were ending in error. I reduced the limit on downloaded tasks to 8 instead of "no limit". I asked Claude AI, which had me run a command in the terminal, sudo usermod -aG vboxusers $USER, and since then everything has worked correctly. I am on Linux Mint 22.2 with VirtualBox 7.24, and I removed the KVM modules from the kernel by running sudo nano /etc/modprobe.d/blacklist-kvm.conf and adding the three lines "blacklist kvm_intel", "blacklist kvm_amd" and "blacklist kvm". ID: 52614 · Reply Quote
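For anyone who wants to check the two host-side changes described in the post above, here is a minimal Python sketch. It assumes a Linux host; the function names are made up for illustration, and the check only reports what is configured. It cannot say whether either setting was actually the cause of the failures.

```python
import grp
import os
import pwd
from pathlib import Path

# Module names taken from the blacklist lines quoted in the post.
KVM_MODULES = {"kvm", "kvm_intel", "kvm_amd"}


def loaded_kvm_modules(proc_modules_text):
    """Return the KVM module names present in /proc/modules-style text."""
    names = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    return names & KVM_MODULES


def user_in_group(username, group_name):
    """True if the user is a configured member of the group (per the group database)."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return False  # the group does not exist on this host
    if username in group.gr_mem:
        return True
    try:
        # The group may also be the user's primary group.
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        return False


if __name__ == "__main__":
    try:
        user = pwd.getpwuid(os.getuid()).pw_name
    except KeyError:
        user = os.environ.get("USER", "")
    modules_file = Path("/proc/modules")
    text = modules_file.read_text() if modules_file.exists() else ""
    print("user:", user)
    print("in vboxusers:", user_in_group(user, "vboxusers"))
    print("KVM modules currently loaded:", sorted(loaded_kvm_modules(text)))
```

Note that a newly added group membership only applies to sessions started after the change, so logging out and back in (or rebooting) is needed before it has any effect.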

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657	Message 52620 - Posted: 3 Nov 2025, 15:45:56 UTC Hello everyone. Thanks for your patience these last weeks. We were finally able to get our "supply chain" sorted out last Friday and get jobs flowing again. I refrained from posting a celebration, because Hallowe'en... Sure enough we had a little hiccup that I was able to get fixed on Saturday, but things went bad again this morning with failures in getting certificate proxies -- funnily enough it didn't affect my running machine, I must have got my last task just before the problem arose. The failure was in a CA server run by CERN IT, who were able to fix it soon after we raised a ticket. So, fingers crossed, we are now back in action again and you can resume getting new tasks if you have been holding off. ID: 52620 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2724 Credit: 300,117,218 RAC: 48,307	Message 52623 - Posted: 4 Nov 2025, 7:40:10 UTC Hint for volunteers using a local firewall: CMS now requires TCP port 9620 to be open for outgoing connections to HTCondor CCB. ID: 52623 · Reply Quote

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Send message Joined: 15 Jun 08 Posts: 2724 Credit: 300,117,218 RAC: 48,307	Message 52625 - Posted: 4 Nov 2025, 13:42:32 UTC - in response to Message 52623. Hint for volunteers using a local firewall: CMS now requires TCP port 9620 to be open for outgoing connections to HTCondor CCB. This afternoon a test for port 9620 has been added to the CMS bootstrap script. Tasks passing the test report something like this to stderr.txt: 2025-11-04 10:29:28 (113263): Guest Log: [INFO] Testing connection to HTCondor-Collector 2025-11-04 10:29:29 (113263): Guest Log: [INFO] Testing connection to HTCondor-CCB ID: 52625 · Reply Quote
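A volunteer who wants to check the firewall rule from the host before fetching tasks could use something like the Python sketch below. It is not the test from the CMS bootstrap script, which is not shown in this thread. The CCB hostname is not given here either, so it has to be supplied on the command line; only the port number 9620 comes from the post above.

```python
import socket
import sys

CCB_PORT = 9620  # port number from the post above


def can_connect(host, port, timeout=5.0):
    """Return True if an outgoing TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    if len(sys.argv) == 2:
        host = sys.argv[1]  # hostname of the HTCondor CCB, supplied by the user
        state = "reachable" if can_connect(host, CCB_PORT) else "NOT reachable"
        print(f"{host}:{CCB_PORT} is {state}")
    else:
        print("usage: python check_port.py <ccb-hostname>")
```

The VM reaches the network through the host's NAT, so a failure here suggests the tasks will be blocked as well, but a success on the host is only an indication, not a guarantee.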

treaclepumpkin Send message Joined: 24 Jan 06 Posts: 8 Credit: 9,929,097 RAC: 8,026	Message 52650 - Posted: 11 Nov 2025, 7:45:34 UTC Looks like the x509 errors are back? This is on a Windows VM. 2025-11-11 04:43:39 (6964): Guest Log: [INFO] Testing connection to http://cms-frontier.openhtc.io:8080/FrontierProd/Frontier/ 2025-11-11 04:43:40 (6964): Guest Log: [INFO] Got a proxy from the local BOINC client 2025-11-11 04:43:40 (6964): Guest Log: [INFO] Will use it for CVMFS and Frontier 2025-11-11 04:43:41 (6964): Guest Log: [INFO] Reloading and probing the CVMFS configuration 2025-11-11 04:43:48 (6964): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK 2025-11-11 04:43:51 (6964): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK 2025-11-11 04:43:51 (6964): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK 2025-11-11 04:43:52 (6964): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK 2025-11-11 04:44:13 (6964): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK 
2025-11-11 04:44:14 (6964): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY 2025-11-11 04:44:14 (6964): Guest Log: [INFO] 2.7.2.0 http://s1ral-cvmfs.openhtc.io http://192.168.1.15:3128 2025-11-11 04:44:14 (6964): Guest Log: [INFO] Environment HTTP proxy: http://treacle:3128 2025-11-11 04:44:14 (6964): Guest Log: [INFO] Reading volunteer information 2025-11-11 04:44:16 (6964): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2025-11-11 04:44:17 (6964): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev 2025-11-11 04:44:47 (6964): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2025-11-11 04:44:47 (6964): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev 2025-11-11 04:45:17 (6964): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2025-11-11 04:45:18 (6964): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev 2025-11-11 04:45:48 (6964): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2025-11-11 04:45:48 (6964): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev 2025-11-11 04:46:18 (6964): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2025-11-11 04:46:18 (6964): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev 2025-11-11 04:46:49 (6964): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2025-11-11 04:46:49 (6964): Guest Log: [INFO] Requesting an X509 credential from vLHC@home-dev 2025-11-11 04:47:19 (6964): Guest Log: [DEBUG] % Total % Received % Xferd Average Speed Time Time Time Current 2025-11-11 04:47:19 (6964): Guest Log: Dload Upload Total Spent Left Speed 2025-11-11 04:47:19 (6964): Guest Log: 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 2025-11-11 04:47:19 (6964): Guest Log: 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 2025-11-11 04:47:19 (6964): Guest Log: curl: (56) Received HTTP code 403 from proxy after CONNECT 2025-11-11 04:47:20 (6964): Guest Log: [DEBUG] 
2025-11-11 04:47:20 (6964): Guest Log: ERROR: Couldn't find a valid proxy. 2025-11-11 04:47:20 (6964): Guest Log: globus_sysconfig: File has zero length: File: /tmp/x509up_u0 2025-11-11 04:47:20 (6964): Guest Log: Use -debug for further information. 2025-11-11 04:47:20 (6964): Guest Log: [ERROR] Could not get an x509 credential 2025-11-11 04:47:20 (6964): Guest Log: [ERROR] The x509 proxy creation failed. Getting quite a few of them again over the last few days. FWIW: The connection check to HTCondor CCB is passing. Theory tasks seem to be working fine, and I even had one CMS task run to completion amongst the failures. ID: 52650 · Reply Quote
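The curl line in the log above says the HTTP proxy answered a CONNECT request with 403, which would leave /tmp/x509up_u0 empty. One way to see how a local proxy treats such a request is to send one by hand, as in the Python sketch below. The function name is made up for illustration, and the target host and port are assumptions: the log does not say which server the credential request was sent to, so they must be filled in by whoever runs it. The proxy name and port in the usage note (treacle, 3128) are the ones shown in the log.

```python
import socket


def proxy_connect_status(proxy_host, proxy_port, target_host, target_port, timeout=10.0):
    """Send an HTTP CONNECT request to a proxy and return its numeric status code."""
    request = (
        f"CONNECT {target_host}:{target_port} HTTP/1.1\r\n"
        f"Host: {target_host}:{target_port}\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        reply = b""
        # Read until the first line of the reply (the status line) is complete.
        while b"\r\n" not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    status_line = reply.split(b"\r\n", 1)[0].decode("latin-1")
    parts = status_line.split()
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"unexpected proxy reply: {status_line!r}")
    return int(parts[1])
```

For example, proxy_connect_status("treacle", 3128, target_host, target_port) with the real target filled in would return 200 if the proxy allows the tunnel and 403 if its access rules reject it. A 403 for one target but not another would point at the proxy's own configuration rather than at the project servers.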

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657	Message 52651 - Posted: 11 Nov 2025, 9:48:24 UTC - in response to Message 52650. Thanks for the report. I've pinged Laurence to alert him. ID: 52651 · Reply Quote

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657	Message 52675 - Posted: 18 Nov 2025, 14:39:26 UTC We've had a lot of jobs fail this morning. From the logs it seems to be a network problem. I suspect it's the Cloudflare outage. ID: 52675 · Reply Quote

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657	Message 52676 - Posted: 18 Nov 2025, 15:01:08 UTC - in response to Message 52675. Last modified: 18 Nov 2025, 15:42:06 UTC We've had a lot of jobs fail this morning. From the logs it seems to be a network problem. I suspect it's the Cloudflare outage. We seem to be recovering now, according to the Running Jobs graph. ID: 52676 · Reply Quote

Hellven Send message Joined: 20 Sep 08 Posts: 1 Credit: 7,614,873 RAC: 95	Message 52761 - Posted: 18 Dec 2025, 15:43:49 UTC Last modified: 18 Dec 2025, 15:47:00 UTC 431067740 Name: CMS_1226716_1766045429.113427_0 Exit status 93 (0x0000005D) Unknown error code Stderr output <core_client_version>8.2.8</core_client_version> <![CDATA[ <message> process exited with code 93 (0x5d, -163)</message> <stderr_txt> 2025-12-18 14:42:47 (28086): vboxwrapper version 26210 2025-12-18 14:42:47 (28086): BOINC client version: 8.2.8 2025-12-18 14:42:48 (28086): Detected: VirtualBox VboxManage Interface (Version: 7.2.4) 2025-12-18 14:42:48 (28086): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds) 2025-12-18 14:42:48 (28086): Successfully copied 'init_data.xml' to the shared directory. 2025-12-18 14:42:48 (28086): Create VM. (boinc_8cca58b587624ef5, slot#16) 2025-12-18 14:42:48 (28086): Setting Memory Size for VM. (4584MB) 2025-12-18 14:42:49 (28086): Setting CPU Count for VM. (4) 2025-12-18 14:42:49 (28086): Setting Chipset Options for VM. 2025-12-18 14:42:49 (28086): Setting Graphics Controller Options for VM. (Driver: VBoxVGA, 16MB) 2025-12-18 14:42:49 (28086): Setting Boot Options for VM. 2025-12-18 14:42:49 (28086): Setting Network Configuration for NAT. (Driver: virtio) 2025-12-18 14:42:50 (28086): Enabling VM Network Access. 2025-12-18 14:42:50 (28086): Disabling USB Support for VM. 2025-12-18 14:42:50 (28086): Disabling COM Port Support for VM. 2025-12-18 14:42:50 (28086): Disabling LPT Port Support for VM. 2025-12-18 14:42:50 (28086): Disabling Audio Support for VM. 2025-12-18 14:42:50 (28086): Disabling Clipboard Support for VM. 2025-12-18 14:42:50 (28086): Disabling Drag and Drop Support for VM. 2025-12-18 14:42:51 (28086): Adding storage controller(s) to VM. 2025-12-18 14:42:51 (28086): Adding virtual disk drive to VM. (CMS_2025_04_08_prod.vdi) 2025-12-18 14:42:52 (28086): Adding VirtualBox Guest Additions to VM. 
2025-12-18 14:42:52 (28086): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB) 2025-12-18 14:42:52 (28086): forwarding host port 49699 to guest port 80 2025-12-18 14:42:52 (28086): Enabling remote desktop for VM. 2025-12-18 14:42:53 (28086): Enabling shared directory for VM. 2025-12-18 14:42:53 (28086): Starting VM using VBoxManage interface. (boinc_8cca58b587624ef5, slot#16) 2025-12-18 14:42:55 (28086): Successfully started VM. (PID = '28925') 2025-12-18 14:42:55 (28086): Reporting VM Process ID to BOINC. 2025-12-18 14:42:55 (28086): Guest Log: BIOS: VirtualBox 7.2.4 2025-12-18 14:42:55 (28086): Guest Log: CPUID EDX: 0x178bfbff 2025-12-18 14:42:55 (28086): Guest Log: BIOS: No PCI IDE controller, not probing IDE 2025-12-18 14:42:55 (28086): Guest Log: BIOS: AHCI 0-P#0: PCHS=16383/16/63 LCHS=1024/255/63 0x0000000002800000 sectors 2025-12-18 14:42:55 (28086): VM state change detected. (old = 'poweredoff', new = 'running') 2025-12-18 14:42:55 (28086): Detected: Web Application Enabled (http://localhost:49699) 2025-12-18 14:42:55 (28086): Detected: Remote Desktop Enabled (localhost:46879) 2025-12-18 14:42:55 (28086): Preference change detected 2025-12-18 14:42:55 (28086): Setting CPU throttle for VM. (100%) 2025-12-18 14:42:55 (28086): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds)) 2025-12-18 14:42:57 (28086): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032 2025-12-18 14:42:57 (28086): Guest Log: BIOS: Booting from Hard Disk... 
2025-12-18 14:43:00 (28086): Guest Log: BIOS: KBD: unsupported int 16h function 03 2025-12-18 14:43:00 (28086): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 2025-12-18 14:43:28 (28086): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds 2025-12-18 14:43:28 (28086): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000) 2025-12-18 14:43:29 (28086): Guest Log: VBoxService 5.2.6 r120293 (verbosity: 0) linux.amd64 (Jan 15 2018 14:51:00) release log 2025-12-18 14:43:29 (28086): Guest Log: 00:00:00.000359 main Log opened 2025-12-18T11:43:29.499730000Z 2025-12-18 14:43:29 (28086): Guest Log: 00:00:00.001206 main OS Product: Linux 2025-12-18 14:43:29 (28086): Guest Log: 00:00:00.001296 main OS Release: 4.14.232-19.cernvm.x86_64 2025-12-18 14:43:29 (28086): Guest Log: 00:00:00.001358 main OS Version: #1 SMP Fri Apr 30 17:12:25 CEST 2021 2025-12-18 14:43:29 (28086): Guest Log: 00:00:00.001462 main Executable: /usr/sbin/VBoxService 2025-12-18 14:43:29 (28086): Guest Log: 00:00:00.001465 main Process ID: 2290 2025-12-18 14:43:29 (28086): Guest Log: 00:00:00.001466 main Package type: LINUX_64BITS_GENERIC 2025-12-18 14:43:29 (28086): Guest Log: 00:00:00.004587 main 5.2.6 r120293 started. Verbose level = 0 2025-12-18 14:49:31 (28086): VM Heartbeat file specified, but missing. 2025-12-18 14:49:31 (28086): Powering off VM. 2025-12-18 14:49:31 (28086): Successfully stopped VM. 2025-12-18 14:49:31 (28086): Deregistering VM. (boinc_8cca58b587624ef5, slot#16) 2025-12-18 14:49:31 (28086): Removing network bandwidth throttle group from VM. 2025-12-18 14:49:32 (28086): Removing VM from VirtualBox. ID: 52761 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1967 Credit: 159,362,239 RAC: 45,189	Message 52771 - Posted: 21 Dec 2025, 19:51:31 UTC since early evening, no jobs are available, although tasks are still being distributed, and each task fails after about half an hour. Obviously, the automatic stop function for task distribution in case of no jobs is again not working the way it's supposed to :-( ID: 52771 · Reply Quote

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657	Message 52772 - Posted: 21 Dec 2025, 22:16:01 UTC - in response to Message 52771. since early evening, no jobs are available, although tasks are still being distributed, and each task fails after about half an hour. Obviously, the automatic stop function for task distribution in case of no jobs is again not working the way it's supposed to :-( Yes, I just saw that. Not an ideal time for it to happen. At first glance I can't see anything wrong from home, but I have limited network access here. I have a hospital appointment tomorrow, so I won't be in my office before about 1100 (12 hours from now). More if/when I discover anything. ID: 52772 · Reply Quote

Magic Quantum Mechanic Send message Joined: 24 Oct 04 Posts: 1291 Credit: 95,276,708 RAC: 34,055	Message 52773 - Posted: 22 Dec 2025, 6:32:25 UTC the -dev version is also having problems again ID: 52773 · Reply Quote

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657	Message 52775 - Posted: 22 Dec 2025, 13:19:12 UTC - in response to Message 52773. the -dev version is also having problems again I imagine it would. It uses the same queues as the main project. It must be some sort of authorisation problem again. BOINC sees that jobs are available, so it creates new tasks; hosts obtain a task and contact the condor server to say they are available; server allocates a job to the task, but the job fails to start; task times out; new VM starts and the cycle repeats. I'm not sure that anyone who can look into this is still active at CERN, given the proximity to Christmas. I've sent alerts to those who could investigate the condor server logs, but no response yet. ID: 52775 · Reply Quote

Erich56 Send message Joined: 18 Dec 15 Posts: 1967 Credit: 159,362,239 RAC: 45,189	Message 52776 - Posted: 22 Dec 2025, 13:56:41 UTC - in response to Message 52775. Ivan, many thanks for your efforts; so let's hope that CMS will run again very soon; if not, I am afraid it will take until the first or even second week of next year ID: 52776 · Reply Quote

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 29 Aug 05 Posts: 1152 Credit: 11,734,920 RAC: 657	Message 52777 - Posted: 22 Dec 2025, 14:44:14 UTC - in response to Message 52776. Well, thanks everyone for your patience this year. It's been frustrating for me, being forced into retirement and gradually losing my accounts and access while other things crumble as well. I'm not particularly looking forward to next year, but I'll try to soldier on for a while. ID: 52777 · Reply Quote

FanzaFede Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 19 Jul 18 Posts: 6 Credit: 338,972 RAC: 0	Message 52778 - Posted: 22 Dec 2025, 15:25:14 UTC Dear all, do you have the error file from your VM with the last error message? Is the issue related to "VM Heartbeat file specified, but missing." for all of you? Thanks, cheers Federica ID: 52778 · Reply Quote
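To answer a question like the one above across many failed tasks, a volunteer could scan the saved stderr output for the error lines quoted earlier in this thread. The Python sketch below does only that. The labels are made up for illustration; the search strings are copied from the logs posted here, and a file matching none of them simply comes back with an empty list.

```python
import sys
from pathlib import Path

# Label (illustrative) -> text copied from the logs posted in this thread.
SIGNATURES = {
    "x509_credential_failed": "Could not get an x509 credential",
    "x509_proxy_failed": "The x509 proxy creation failed",
    "proxy_403_on_connect": "Received HTTP code 403 from proxy after CONNECT",
    "heartbeat_missing": "VM Heartbeat file specified, but missing",
}


def classify(stderr_text):
    """Return the labels of all known failure signatures found in the text."""
    return [label for label, needle in SIGNATURES.items() if needle in stderr_text]


if __name__ == "__main__":
    # Usage: python classify_stderr.py stderr1.txt stderr2.txt ...
    for name in sys.argv[1:]:
        text = Path(name).read_text(errors="replace")
        found = classify(text)
        print(f"{name}: {', '.join(found) if found else 'no known signature'}")
```

Run against the two failing logs in this thread, it would separate the x509 failures (with or without the proxy 403) from the task that stopped on the missing heartbeat file.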