Message boards : CMS Application : CMS tasks failing
Joined: 29 Aug 05 Posts: 1110 Credit: 9,381,594 RAC: 5,065
"for the past few hours, all CMS tasks on all of my hosts are failing after about 1/2 hour."
Thanks for reporting that. I've noticed that the running jobs were falling off but don't yet see any reason why -- the WMAgent seems to be in good shape, so I'm surmising a network error somewhere.
Joined: 24 Oct 04 Posts: 1234 Credit: 79,650,549 RAC: 96,588
I got about 100 of these, since it was during the couple of hours when I was asleep:
<core_client_version>8.2.4</core_client_version>
<![CDATA[
<message>
The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified.
 (0xd0) - exit code 208 (0xd0)</message>
<stderr_txt>
Joined: 5 Apr 25 Posts: 51 Credit: 945,151 RAC: 22,944
I had 2 tasks fail; luckily I saw the earlier post and blocked further CMS tasks. Both look like:
Exit status 208 (0x000000D0) EXIT_SUB_TASK_FAILURE
and
2025-10-07 21:51:09 (2841802): Guest Log: [INFO] CMS application starting. Check log files.
Joined: 5 May 10 Posts: 8 Credit: 5,738,284 RAC: 36,646
I just noticed this also. Have over 300 tasks in a row that failed the same way. Just suspended all tasks that haven't started yet until they fix it.
Joined: 18 Dec 15 Posts: 1908 Credit: 144,550,824 RAC: 76,453
Ivan wrote: Thanks for reporting that. I've noticed that the running jobs were falling off but don't yet see any reason why -- the WMAgent seems to be in good shape so I'm surmising a network error somewhere.
Good morning, Ivan - wouldn't it make sense to stop task distribution until the problem gets solved?
Joined: 29 Aug 05 Posts: 1110 Credit: 9,381,594 RAC: 5,065
Hmm, the problem there is that we can't debug if there are no tasks asking for jobs... I've alerted our HTCondor specialist to the problem but haven't heard back today; last night she couldn't see anything obvious. As far as I can tell we must be having some mismatch between the requirements specified by the tasks (i.e. the VM that wants jobs to run) and the requirements of the jobs that condor has available for distribution, but I'm no expert at querying the condor server. I'm now wondering if something has changed in the job submission infrastructure that we haven't been told about. I'll ask Laurence if he can limit the size of the task queue in the BOINC server until we have an answer.
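As a purely illustrative sketch of what querying for such a requirements mismatch could look like, the HTCondor Python bindings can list the ClassAds on both sides. This assumes recent htcondor bindings and a machine that can reach the pool's collector and schedd via its condor_config; it is not the project's actual debugging procedure.

```python
# Sketch only: dump the Requirements of the advertised slots (the volunteer
# VMs / glideins) and of the idle jobs, so a human can spot an obvious mismatch.
import htcondor

collector = htcondor.Collector()   # pool collector from the local condor_config
schedd = htcondor.Schedd()         # schedd holding the CMS jobs (assumed local)

# Slots advertised by the volunteer VMs (startd ClassAds)
slots = collector.query(
    htcondor.AdTypes.Startd,
    projection=["Name", "Cpus", "Memory", "Requirements"],
)

# Jobs still waiting for a match (JobStatus == 1 means Idle)
idle_jobs = schedd.query(
    constraint="JobStatus == 1",
    projection=["ClusterId", "ProcId", "RequestCpus", "RequestMemory", "Requirements"],
)

for ad in slots[:5]:
    print(f"SLOT {ad.get('Name')} | {ad.get('Requirements')}")
for ad in idle_jobs[:5]:
    print(f"JOB  {ad.get('ClusterId')}.{ad.get('ProcId')} | {ad.get('Requirements')}")
```

On the command line, `condor_q -better-analyze <cluster>.<proc>` similarly asks the schedd to explain why an idle job does or does not match the advertised slots.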
Joined: 29 Aug 05 Posts: 1110 Credit: 9,381,594 RAC: 5,065
The current problem seems to be that "new" VMs (i.e. tasks from a volunteer's point of view) are unable to join the HTCondor "pool" of available machines and thus they don't acquire jobs to run. The reason is still unclear...
Joined: 29 Aug 05 Posts: 1110 Credit: 9,381,594 RAC: 5,065
The task queue has been cut back to 25 instead of 200, but it will take some time for failures to whittle it down.
Joined: 18 Dec 15 Posts: 1908 Credit: 144,550,824 RAC: 76,453
Ivan wrote: The current problem seems to be that "new" VMs (i.e. tasks from a volunteer's point of view) are unable to join the HTCondor "pool" of available machines and thus they don't acquire jobs to run. The reason is still unclear...
Ivan, thanks for the information. So let's keep our fingers crossed that the problem will be solved soon.
P.S. Tasks are still available for download; I think it would make sense to stop this.
Joined: 7 May 08 Posts: 248 Credit: 1,879,226 RAC: 11,328
Strange new message after 20/25 minutes of calculation: <message> |
Joined: 12 Jul 08 Posts: 21 Credit: 647,310 RAC: 9,406
"The task queue has been cut back to 25 instead of 200, but it will take some time for failures to whittle it down."
I'm still getting [ERROR] glidein exited with return value 1. Should we keep CMS enabled to help process this queue, or pause it for now? It takes about 20 minutes before a task fails on my computer.
I crunch for Ukraine
Joined: 18 Dec 15 Posts: 1908 Credit: 144,550,824 RAC: 76,453
Ivan, any idea when CMS will be working again?
Joined: 15 Jun 08 Posts: 2679 Credit: 286,806,599 RAC: 72,396
"Strange new message after 20/25 minutes of calculation:"
This error text is caused by a mismatch between Windows and BOINC. BOINC reports exit code 208 as "EXIT_SUB_TASK_FAILURE", while Windows expands the same number into the "wildcard" text. Not sure whether Windows or BOINC was first to use that error number. The text can safely be ignored, since the error number together with BOINC's final status clearly shows what happened.
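A minimal sketch of that clash, assuming a Windows machine with Python; the BOINC constant value is the one shown in the failed-task logs above, and the Windows text is what FormatMessage returns for system error 208:

```python
# Sketch: the same number 208 carries two unrelated meanings.
# BOINC treats 208 as the exit status EXIT_SUB_TASK_FAILURE (as seen in the
# task logs above), while Windows expands system error 208 into the
# "global filename characters, * or ?" wildcard message.
import ctypes  # ctypes.FormatError is available on Windows only

EXIT_SUB_TASK_FAILURE = 208  # BOINC exit status reported for the failed tasks

if __name__ == "__main__":
    print(f"BOINC meaning of {EXIT_SUB_TASK_FAILURE}: EXIT_SUB_TASK_FAILURE")
    print(f"Windows meaning of 208: {ctypes.FormatError(208)}")
```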
Joined: 29 Aug 05 Posts: 1110 Credit: 9,381,594 RAC: 5,065
"Ivan, any idea when CMS will be working again?"
Well, we fixed one problem (someone decided to clean out a file-store that was filling up, and deleted some files that we still used), so now tasks are joining the pool again, but for some reason condor isn't matching them to available jobs. Debugging continues...
Joined: 18 Dec 15 Posts: 1908 Credit: 144,550,824 RAC: 76,453
"... so now tasks are joining the pool again, but for some reason condor isn't matching them to available jobs. Debugging continues..."
Good morning Ivan, what's the status? Is the debugging still unsuccessful?
Joined: 14 Jan 10 Posts: 1461 Credit: 9,852,993 RAC: 3,041
"... so now tasks are joining the pool again, but for some reason condor isn't matching them to available jobs. Debugging continues..."
"Good morning Ivan, what's the status? Is the debugging still unsuccessful?"
The VM tasks are not failing, but they do not process real internal jobs: https://lhcathome.cern.ch/lhcathome/result.php?resultid=429041036
Joined: 18 Dec 15 Posts: 1908 Credit: 144,550,824 RAC: 76,453
"VM tasks are not failing, but they do not process real internal jobs: https://lhcathome.cern.ch/lhcathome/result.php?resultid=429041036"
Yes - that's the problem; and I am afraid that it won't be solved until next week :-(