Message boards : CMS Application : CMS@Home -- jobs update
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,438,578
RAC: 8,367
Message 51300 - Posted: 18 Dec 2024, 14:48:05 UTC

Sorry for the long delay, I'd been hoping for good news...
As you've no doubt noticed, we haven't had any CMS@Home jobs for some time -- tasks sit idle until they time out.
It looks like we have finally tracked down the reason. Somehow, when the Condor server starts up after its update to Alma9 Linux, the firewall rules get corrupted so that port 546 is blocked; the connection to the DHCPv6 server is then disabled and IPv6 communication stops working.
Now that the problem is identified, the responsible experts are working on a fix. I don't have an estimate on when it will be running again, but hopefully before Christmas! :-)
ID: 51300 · Report as offensive     Reply Quote
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1461
Credit: 9,858,270
RAC: 2,555
Message 51301 - Posted: 18 Dec 2024, 15:40:51 UTC - in response to Message 51300.  

Thanks Ivan for letting us know!
ID: 51301 · Report as offensive     Reply Quote
Profile Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1234
Credit: 79,764,930
RAC: 78,388
Message 51302 - Posted: 18 Dec 2024, 20:33:08 UTC - in response to Message 51300.  


Ivan Claus
ID: 51302 · Report as offensive     Reply Quote
Harri Liljeroos
Joined: 28 Sep 04
Posts: 780
Credit: 59,969,791
RAC: 47,598
Message 51310 - Posted: 20 Dec 2024, 12:47:27 UTC

New Boinc tasks available but they don't get any new jobs.
stderr gives:
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The global filename characters, * or ?, are entered incorrectly or too many global filename characters are specified.
 (0xd0) - exit code 208 (0xd0)</message>
<stderr_txt>

And further down:
2024-12-20 14:15:15 (200056): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2024-12-20 14:15:15 (200056): Guest Log: [INFO] Requesting an idtoken from LHC@home
2024-12-20 14:15:16 (200056): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-20 14:34:24 (200056): Guest Log: [ERROR] glidein exited with return value 1.

ID: 51310 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1908
Credit: 144,905,236
RAC: 83,880
Message 51311 - Posted: 20 Dec 2024, 12:49:50 UTC - in response to Message 51310.  

same here :-(

when I saw new tasks being available late morning, I was confident that the problem of no jobs was finally solved (for what other reason would tasks have been made available again?).

However, no luck - same problem as before :-(
ID: 51311 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1908
Credit: 144,905,236
RAC: 83,880
Message 51313 - Posted: 20 Dec 2024, 13:34:57 UTC - in response to Message 51311.  

- same problem as before :-(
well, not quite "same problem as before" - whereas BEFORE, the tasks ran for about 30 minutes, then got finished and yielded a small amount of credit, NOW they stop after about 22 minutes with "computation error", and no credit. So there is a slight difference to what the situation was before.
ID: 51313 · Report as offensive     Reply Quote
Harri Liljeroos
Joined: 28 Sep 04
Posts: 780
Credit: 59,969,791
RAC: 47,598
Message 51314 - Posted: 20 Dec 2024, 13:37:10 UTC - in response to Message 51311.  

same here :-(

when I saw new tasks being available late morning, I was confident that the problem of no jobs was finally solved (for what other reason would tasks have been made available again?).

However, no luck - same problem as before :-(

Not quite the same as before: now it also gives an error in BOINC, so not even minimal credit gets awarded.
ID: 51314 · Report as offensive     Reply Quote
Matthias Lehmkuhl

Joined: 15 Jul 05
Posts: 27
Credit: 2,624,744
RAC: 2,869
Message 51315 - Posted: 20 Dec 2024, 14:20:40 UTC

Looks like I have the same error:
2024-12-20 14:24:46 (12684): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2024-12-20 14:24:49 (12684): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2024-12-20 14:24:49 (12684): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2024-12-20 14:24:50 (12684): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2024-12-20 14:24:51 (12684): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2024-12-20 14:24:52 (12684): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2024-12-20 14:24:52 (12684): Guest Log: [INFO] 2.7.2.0 http://s1fnal-cvmfs.openhtc.io:8080 DIRECT
2024-12-20 14:24:52 (12684): Guest Log: [INFO] Environment HTTP proxy: not set
2024-12-20 14:24:53 (12684): Guest Log: [INFO] Reading volunteer information
2024-12-20 14:25:25 (12684): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2024-12-20 14:25:26 (12684): Guest Log: [INFO] Requesting an idtoken from LHC@home
2024-12-20 14:25:27 (12684): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-20 14:45:34 (12684): Guest Log: [ERROR] glidein exited with return value 1.
Matthias

ID: 51315 · Report as offensive     Reply Quote
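For anyone comparing failures across hosts, the interval between "CMS application starting" and the glidein error can be read straight from the log lines. The following is only a rough sketch, assuming the timestamp layout seen in the logs quoted in this thread; the helper names `parse_line` and `glidein_runtime` are made up for illustration and are not part of BOINC or the CMS wrapper.

```python
from datetime import datetime

# Two lines copied from the log posted above.
LOG = """\
2024-12-20 14:25:27 (12684): Guest Log: [INFO] CMS application starting. Check log files.
2024-12-20 14:45:34 (12684): Guest Log: [ERROR] glidein exited with return value 1.
"""


def parse_line(line):
    """Split a 'Guest Log' line into (timestamp, level, message)."""
    # The first 19 characters hold the timestamp, e.g. '2024-12-20 14:25:27'.
    stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    _, _, rest = line.partition("Guest Log: ")
    level, _, message = rest.partition("] ")
    return stamp, level.lstrip("["), message.strip()


def glidein_runtime(log_text):
    """Return seconds between application start and glidein failure, or None."""
    started = failed = None
    for line in log_text.splitlines():
        if "Guest Log: " not in line:
            continue  # skip anything that is not a guest log line
        stamp, level, message = parse_line(line)
        if message.startswith("CMS application starting"):
            started = stamp
        elif level == "ERROR" and "glidein exited" in message:
            failed = stamp
    if started is None or failed is None:
        return None
    return (failed - started).total_seconds()


seconds = glidein_runtime(LOG)
print(f"glidein ran for {seconds / 60:.1f} minutes before failing")
print(hex(208))  # the BOINC exit code 208 quoted earlier, in hexadecimal
```

On the lines above this gives roughly 20 minutes, in line with the "about 22 minutes" of total task runtime reported earlier in the thread.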
Profile rilian
Joined: 12 Jul 08
Posts: 21
Credit: 654,505
RAC: 7,387
Message 51317 - Posted: 21 Dec 2024, 3:34:20 UTC - in response to Message 51315.  

Same error here ...
I crunch for Ukraine
ID: 51317 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1908
Credit: 144,905,236
RAC: 83,880
Message 51318 - Posted: 21 Dec 2024, 7:55:43 UTC

Ivan - why do you send out tasks while no jobs are coming in?
ID: 51318 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2683
Credit: 286,878,464
RAC: 57,467
Message 51319 - Posted: 21 Dec 2024, 8:56:06 UTC

It is obvious that tasks must be sent out to get the error(s) in the process chain located.
Unfortunately it can't be restricted to computers run by the developers.

Hence, until the issues are solved
- best would be to uncheck CMS in the prefs and wait for a go in the forum
- do not run a full buffer of envelope tasks (the short ones running around 0.5 h with very little CPU usage, even if they claim to be valid)
- if you want to do some tests, run only a handful of tasks spread over the whole day
ID: 51319 · Report as offensive     Reply Quote
PekkaH

Joined: 23 Dec 19
Posts: 18
Credit: 52,955,566
RAC: 33,900
Message 51320 - Posted: 21 Dec 2024, 9:36:12 UTC - in response to Message 51319.  

I've investigated a bit on how to limit CMS jobs as they are failing.

Unticking CMS in the BOINC prefs had no effect in my case. I run Win11, Win10, Ubuntu 24.04 and Ubuntu 22.04 machines, with VirtualBox on each.
I am now trying an alternative. I have created an app_config for CMS and set max_concurrent to "1" there, which seems to have an effect.

Hence, most of the CPU is used for something purposeful while still contributing to the project (at least by piling up error logs).

Br Pekka
ID: 51320 · Report as offensive     Reply Quote
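For reference, the app_config approach described above could look roughly like the sketch below. BOINC's app_config.xml does support a `max_concurrent` element inside an `app` block; the app name "CMS" used here is an assumption (check the `<name>` entries in your client_state.xml), and `render_app_config` is just an illustrative helper, not a BOINC tool.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Layout of a minimal app_config.xml that limits one application.
APP_CONFIG_TEMPLATE = """\
<app_config>
  <app>
    <name>{name}</name>
    <max_concurrent>{limit}</max_concurrent>
  </app>
</app_config>
"""


def render_app_config(app_name, max_concurrent):
    """Return app_config.xml text limiting app_name to max_concurrent tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    text = APP_CONFIG_TEMPLATE.format(name=escape(app_name), limit=int(max_concurrent))
    ET.fromstring(text)  # raises ParseError if the result is not well-formed XML
    return text


# "CMS" is assumed to be the application's short name; adjust if yours differs.
config_text = render_app_config("CMS", 1)
print(config_text)
```

The printed text would be saved as app_config.xml in the LHC@home project folder inside the BOINC data directory and picked up after "Read config files" in the BOINC Manager or a client restart. Note that this only limits how many CMS tasks run at once; it does not stop the client from downloading them.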
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2683
Credit: 286,878,464
RAC: 57,467
Message 51321 - Posted: 21 Dec 2024, 9:51:16 UTC - in response to Message 51320.  

The boinc prefs, unticking CMS, had no effect in my case.

Most likely you have enabled this at your prefs page:
"If no work for selected applications is available, accept work from other applications?"
Disable this and disable all apps you don't want to run.
ID: 51321 · Report as offensive     Reply Quote
PekkaH

Joined: 23 Dec 19
Posts: 18
Credit: 52,955,566
RAC: 33,900
Message 51322 - Posted: 21 Dec 2024, 14:09:50 UTC - in response to Message 51321.  

The boinc prefs, unticking CMS, had no effect in my case.

Most likely you have enabled this at your prefs page:
"If no work for selected applications is available, accept work from other applications?"
Disable this and disable all apps you don't want to run.


Correct, I missed that tab.
ID: 51322 · Report as offensive     Reply Quote
Pascal

Joined: 13 May 20
Posts: 51
Credit: 2,554,989
RAC: 7,300
Message 51323 - Posted: 22 Dec 2024, 12:59:03 UTC

Hello,
nearly 200 CMS tasks with error code 208.
ID: 51323 · Report as offensive     Reply Quote
JZD

Joined: 31 Dec 11
Posts: 2
Credit: 10,291,956
RAC: 15,758
Message 51330 - Posted: 26 Dec 2024, 11:59:22 UTC

ID: 51330 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2683
Credit: 286,878,464
RAC: 57,467
Message 51356 - Posted: 6 Jan 2025, 21:00:15 UTC

Looks like CMS sends out jobs again since this afternoon.
Cheers and happy crunching.
ID: 51356 · Report as offensive     Reply Quote
Profile Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1234
Credit: 79,764,930
RAC: 78,388
Message 51357 - Posted: 6 Jan 2025, 22:45:47 UTC

I have some running at -dev but that site needs to be poked with a stick since it is just blank pages for everything so we can't see what is going on.
ID: 51357 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 880
Credit: 746,708,115
RAC: 323,059
Message 51358 - Posted: 7 Jan 2025, 8:56:35 UTC

ID: 51358 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,438,578
RAC: 8,367
Message 51359 - Posted: 7 Jan 2025, 10:43:54 UTC - in response to Message 51356.  

Looks like CMS sends out jobs again since this afternoon.
Cheers and happy crunching.

Yes, it took a while to get all our ducks in a row, what with the long holiday break (my uni was closed for 16 days!). However, it looks like Laurence got the right glide-in wrappers and id-tokens installed yesterday, with a bit of help from Federica, and things are starting to take off again.
I notice there's one user with a high number of failures to access the Frontier servers (conditions database) -- that's usually a sign of a network misconfiguration on the client site.
ID: 51359 · Report as offensive     Reply Quote


©2025 CERN