Thread 'CMS@Home -- jobs update'

Author	Message
ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Joined: 29 Aug 05 Posts: 1161 Credit: 11,906,771 RAC: 6,954	Message 51300 - Posted: 18 Dec 2024, 14:48:05 UTC Sorry for the long delay, I'd been hoping for good news... As you've no doubt noticed, we haven't had any CMS@Home jobs for some time -- tasks sit idle until they time out. It looks like we have finally tracked down the reason. Somehow, when the Condor server starts up after its update to Alma9 Linux, the firewall rules get corrupted so that port 546 is blocked; the connection to the DHCPv6 server is therefore disabled and IPv6 communication stops working. Now that the problem is identified, the responsible experts are working on a fix. I don't have an estimate of when it will be running again, but hopefully before Christmas! :-)

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1559 Credit: 10,101,976 RAC: 760	Message 51301 - Posted: 18 Dec 2024, 15:40:51 UTC - in response to Message 51300. Thanks Ivan for letting us know!

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1318 Credit: 98,165,279 RAC: 95,913	Message 51302 - Posted: 18 Dec 2024, 20:33:08 UTC - in response to Message 51300. Ivan Claus

Harri Liljeroos Joined: 28 Sep 04 Posts: 809 Credit: 66,153,660 RAC: 23,088	Message 51310 - Posted: 20 Dec 2024, 12:47:27 UTC New Boinc tasks available but they don't get any new jobs. stderr gives:
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified.
(0xd0) - exit code 208 (0xd0)</message>
<stderr_txt>
And further down:
2024-12-20 14:15:15 (200056): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2024-12-20 14:15:15 (200056): Guest Log: [INFO] Requesting an idtoken from LHC@home
2024-12-20 14:15:16 (200056): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-20 14:34:24 (200056): Guest Log: [ERROR] glidein exited with return value 1.

Erich56 Joined: 18 Dec 15 Posts: 1989 Credit: 162,628,564 RAC: 83,596	Message 51311 - Posted: 20 Dec 2024, 12:49:50 UTC - in response to Message 51310. same here :-( when I saw new tasks being available late this morning, I was confident that the problem of no jobs was finally solved (for what other reason would tasks have been made available again?). However, no luck - same problem as before :-(

Erich56 Joined: 18 Dec 15 Posts: 1989 Credit: 162,628,564 RAC: 83,596	Message 51313 - Posted: 20 Dec 2024, 13:34:57 UTC - in response to Message 51311. "same problem as before :-(" Well, not quite "same problem as before": whereas BEFORE the tasks ran for about 30 minutes, then finished and yielded a small amount of credit, NOW they stop after about 22 minutes with "computation error" and no credit. So there is a slight difference from the earlier situation.

Harri Liljeroos Joined: 28 Sep 04 Posts: 809 Credit: 66,153,660 RAC: 23,088	Message 51314 - Posted: 20 Dec 2024, 13:37:10 UTC - in response to Message 51311. "same here :-( when I saw new tasks being available late morning, I was confident the problem of no jobs was finally solved (for what other reasons would tasks have been made available again?). However, no luck - same problem as before :-(" Not quite the same as before; now it also gives an error in Boinc, so not even minimal credit gets awarded.

Matthias Lehmkuhl Joined: 15 Jul 05 Posts: 27 Credit: 2,675,621 RAC: 0	Message 51315 - Posted: 20 Dec 2024, 14:20:40 UTC looks like I've the same error
2024-12-20 14:24:46 (12684): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2024-12-20 14:24:49 (12684): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2024-12-20 14:24:49 (12684): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2024-12-20 14:24:50 (12684): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2024-12-20 14:24:51 (12684): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2024-12-20 14:24:52 (12684): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2024-12-20 14:24:52 (12684): Guest Log: [INFO] 2.7.2.0 http://s1fnal-cvmfs.openhtc.io:8080 DIRECT
2024-12-20 14:24:52 (12684): Guest Log: [INFO] Environment HTTP proxy: not set
2024-12-20 14:24:53 (12684): Guest Log: [INFO] Reading volunteer information
2024-12-20 14:25:25 (12684): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2024-12-20 14:25:26 (12684): Guest Log: [INFO] Requesting an idtoken from LHC@home
2024-12-20 14:25:27 (12684): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-20 14:45:34 (12684): Guest Log: [ERROR] glidein exited with return value 1.
Matthias
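For anyone who wants to check their own stderr output for this failure, the following is a minimal Python sketch, not a project tool. It scans the text for the "CMS application starting" line and the later "[ERROR] glidein exited" line and reports the time between them. The function name `glidein_runtime` and the regular expression are assumptions based only on the log lines quoted in this thread; the sample text is the excerpt from Message 51310.

```python
import re
from datetime import datetime

# Pattern assumed from the log lines quoted in this thread:
# "<date> <time> (<pid>): Guest Log: [<LEVEL>] <message>"
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\): Guest Log: \[(\w+)\] (.*)"
)


def glidein_runtime(stderr_text):
    """Return the time between 'CMS application starting' and the
    glidein error as a timedelta, or None if the pair is not found."""
    start = None
    for line in stderr_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip XML headers and other non-guest-log lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        level, message = match.group(2), match.group(3)
        if message.startswith("CMS application starting"):
            start = stamp
        elif level == "ERROR" and "glidein exited" in message and start:
            return stamp - start
    return None


SAMPLE = """\
2024-12-20 14:15:15 (200056): Guest Log: [INFO] Requesting an idtoken from LHC@home
2024-12-20 14:15:16 (200056): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-20 14:34:24 (200056): Guest Log: [ERROR] glidein exited with return value 1.
"""

if __name__ == "__main__":
    print(glidein_runtime(SAMPLE))  # prints 0:19:08
```

A result of `None` only means the start/error pair was not found in the text supplied; it does not by itself show that the task succeeded.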

rilian Joined: 12 Jul 08 Posts: 23 Credit: 941,384 RAC: 18	Message 51317 - Posted: 21 Dec 2024, 3:34:20 UTC - in response to Message 51315. Same error here ... I crunch for Ukraine

Erich56 Joined: 18 Dec 15 Posts: 1989 Credit: 162,628,564 RAC: 83,596	Message 51318 - Posted: 21 Dec 2024, 7:55:43 UTC Ivan - why do you send out tasks as long as no jobs are coming in?

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2760 Credit: 304,854,117 RAC: 108,769	Message 51319 - Posted: 21 Dec 2024, 8:56:06 UTC It is obvious that tasks must be sent out to locate the error(s) in the process chain. Unfortunately, this can't be restricted to computers run by the developers. Hence, until the issues are solved:
- best would be to uncheck CMS in the prefs and wait for a go in the forum
- do not run a full buffer of envelope tasks (the short ones running around 0.5 h with very little CPU usage, even if they claim to be valid)
- if you want to do some tests, run only a handful of tasks spread over the whole day

PekkaH Joined: 23 Dec 19 Posts: 18 Credit: 59,897,058 RAC: 19,940	Message 51320 - Posted: 21 Dec 2024, 9:36:12 UTC - in response to Message 51319. I've investigated a bit how to limit CMS jobs while they are failing. Unticking CMS in the BOINC prefs had no effect in my case. I run Win11, Win10, Ubuntu 24.04 and Ubuntu 22.04 machines, with VirtualBox on each. I am now trying an alternative: I have created an app_config for CMS and set max_concurrent to 1 there, which seems to have an effect. Hence, most of the CPU is used for something purposeful while still contributing to the project (at least by piling up error logs). Br Pekka
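The app_config approach described above could look roughly like the output of this minimal Python sketch. It builds an `app_config.xml` that limits one application to a single concurrent task. The application name `CMS` is an assumption (check the name your client reports for the application), and the helper `build_app_config` is purely illustrative; `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name="CMS", max_concurrent=1):
    """Return app_config.xml content limiting one app's concurrent tasks.

    app_name is an assumption here; it must match the application's
    short name as known to the BOINC client.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; available from Python 3.9
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; save it by hand as app_config.xml in the
    # LHC@home project folder of the BOINC data directory.
    print(build_app_config())
```

After saving the file, the client has to re-read its config files (or be restarted) before the limit takes effect.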

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2760 Credit: 304,854,117 RAC: 108,769	Message 51321 - Posted: 21 Dec 2024, 9:51:16 UTC - in response to Message 51320. "The boinc prefs, unticking CMS, had no effect in my case." Most likely you have enabled this on your prefs page: "If no work for selected applications is available, accept work from other applications?" Disable this and disable all apps you don't want to run.

PekkaH Joined: 23 Dec 19 Posts: 18 Credit: 59,897,058 RAC: 19,940	Message 51322 - Posted: 21 Dec 2024, 14:09:50 UTC - in response to Message 51321. "Most likely you have enabled this at your prefs page: 'If no work for selected applications is available, accept work from other applications?' Disable this and disable all apps you don't want to run." Correct, I missed that tab.

Pascal Joined: 13 May 20 Posts: 64 Credit: 3,170,844 RAC: 2,450	Message 51323 - Posted: 22 Dec 2024, 12:59:03 UTC Hello, close to 200 CMS tasks with error code 208.

JZD Joined: 31 Dec 11 Posts: 2 Credit: 10,430,793 RAC: 2,866	Message 51330 - Posted: 26 Dec 2024, 11:59:22 UTC Aim failed task code 208 https://lhcathome.cern.ch/lhcathome/result.php?resultid=418488241. :-(

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2760 Credit: 304,854,117 RAC: 108,769	Message 51356 - Posted: 6 Jan 2025, 21:00:15 UTC Looks like CMS has been sending out jobs again since this afternoon. Cheers and happy crunching.

Magic Quantum Mechanic Joined: 24 Oct 04 Posts: 1318 Credit: 98,165,279 RAC: 95,913	Message 51357 - Posted: 6 Jan 2025, 22:45:47 UTC I have some running at -dev but that site needs to be poked with a stick since it is just blank pages for everything so we can't see what is going on.

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 951 Credit: 784,825,744 RAC: 121,239	Message 51358 - Posted: 7 Jan 2025, 8:56:35 UTC Seems like they might be fixed too. https://lhcathome.cern.ch/lhcathome/result.php?resultid=418692627 https://lhcathome.cern.ch/lhcathome/result.php?resultid=418692466 https://lhcathome.cern.ch/lhcathome/result.php?resultid=418691951

ivan Volunteer moderator Project tester Volunteer developer Volunteer tester Project scientist Joined: 29 Aug 05 Posts: 1161 Credit: 11,906,771 RAC: 6,954	Message 51359 - Posted: 7 Jan 2025, 10:43:54 UTC - in response to Message 51356. "Looks like CMS sends out jobs again since this afternoon. Cheers and happy crunching." Yes, it took a while to get all our ducks in a row, what with the long holiday break (my uni was closed for 16 days!). However, it looks like Laurence got the right glide-in wrappers and id-tokens installed yesterday, with a bit of help from Federica, and things are starting to take off again. I notice there's one user with a high number of failures to access the Frontier servers (conditions database) -- that's usually a sign of a network misconfiguration on the client side.