Message boards : CMS Application : Probable job interruption

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,381,594
RAC: 5,065
Message 52095 - Posted: 18 Aug 2025, 10:38:09 UTC

WMCore want to update our WMAgent, but the current workflow won't end for over a week, so we'll have to force-stop the workflow, which means running tasks will be lost. I'll try to give at least a day's notice of this, so be prepared to set No New Tasks when I have a deadline to work with.
ID: 52095
ivan
Message 52099 - Posted: 20 Aug 2025, 13:14:18 UTC - in response to Message 52095.  

OK, I've told WMCore that they can shut down our workflow for the upgrade anytime after 1200 tomorrow (CERN time). Please set No New Tasks ASAP.
I'll let you know when it's safe to go back in the water again.
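
For anyone who prefers to set No New Tasks from a script rather than the BOINC Manager, here is a minimal sketch. It assumes the `boinccmd` tool is installed and on the PATH, and it assumes the project URL below matches the one your client is attached to; check both before relying on it. The helper names are made up for illustration.

```python
import shutil
import subprocess

# Assumed project URL; use the exact URL your own BOINC client shows.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"


def no_new_tasks_command(project_url, allow=False):
    """Build the boinccmd argument list that stops (or re-allows) new work."""
    operation = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, operation]


def set_no_new_tasks(project_url=PROJECT_URL, allow=False):
    """Run boinccmd if it is available; return True if it exited cleanly."""
    if shutil.which("boinccmd") is None:
        # boinccmd not found on PATH, nothing was changed.
        return False
    result = subprocess.run(no_new_tasks_command(project_url, allow), check=False)
    return result.returncode == 0
```

Calling `set_no_new_tasks()` requests No New Tasks, and `set_no_new_tasks(allow=True)` re-enables work fetch once the all-clear is given.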
ID: 52099
ivan
Message 52105 - Posted: 22 Aug 2025, 14:51:04 UTC - in response to Message 52099.  

No action yet from WMCore, and many people are still running tasks, it seems. This workflow will probably finish over the weekend. I'll start injecting smaller workflows then, so as not to have too much interruption when the intervention does finally occur.
ID: 52105
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1234
Credit: 79,650,549
RAC: 96,588
Message 52139 - Posted: 29 Aug 2025, 9:11:36 UTC
Last modified: 29 Aug 2025, 9:49:24 UTC

x
ID: 52139
ivan
Message 52176 - Posted: 3 Sep 2025, 14:39:56 UTC - in response to Message 52105.  

ivan wrote: "No action yet from WMCore, and many people are still running tasks, it seems. This workflow will probably finish over the weekend. I'll start injecting smaller workflows then, so as not to have too much interruption when the intervention does finally occur."

It looks like this will finally happen tomorrow night (CET). I'm letting the queues drain.
ID: 52176
Erich56
Joined: 18 Dec 15
Posts: 1908
Credit: 144,550,824
RAC: 76,453
Message 52188 - Posted: 5 Sep 2025, 5:32:26 UTC - in response to Message 52176.  

CMS tasks can be downloaded again, but they fail after a few minutes with this error:

2025-09-05 07:23:30 (11428): Guest Log: ERROR: Couldn't read proxy from: /tmp/x509up_u0
2025-09-05 07:23:30 (11428): Guest Log: globus_credential: Error reading proxy credential
2025-09-05 07:23:30 (11428): Guest Log: globus_credential: Error reading proxy credential: Couldn't read PEM from bio
2025-09-05 07:23:30 (11428): Guest Log: OpenSSL Error: pem_lib.c:707: in library: PEM routines, function PEM_read_bio: no start line
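
The OpenSSL "no start line" message means the PEM reader found no "-----BEGIN ..." line in the data it was given, which is what you would see if the proxy file were missing, empty, or not PEM at all. Below is a minimal sketch of that check. It assumes you can run Python where the file lives (the path in the log is inside the VM guest, not on the host), and the function names are made up for illustration. It does not validate the credential itself.

```python
from pathlib import Path

# Every PEM block opens with a line that starts like this.
PEM_BEGIN = b"-----BEGIN "


def has_pem_start_line(data: bytes) -> bool:
    """Return True if any line of the data starts a PEM block."""
    return any(line.strip().startswith(PEM_BEGIN) for line in data.splitlines())


def diagnose_proxy(path="/tmp/x509up_u0"):
    """Give a rough reason why a proxy file might be unreadable as PEM."""
    proxy = Path(path)
    if not proxy.exists():
        return "missing"
    data = proxy.read_bytes()
    if not data:
        return "empty"
    if not has_pem_start_line(data):
        return "no PEM start line"
    return "looks like PEM"
```

A result of "missing", "empty" or "no PEM start line" would be consistent with the log above; "looks like PEM" only says a start line is present.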
ID: 52188
Magic Quantum Mechanic
Message 52190 - Posted: 5 Sep 2025, 6:52:11 UTC - in response to Message 52188.  
Last modified: 5 Sep 2025, 6:52:21 UTC

Mine have all been working for the last 3 hours, Erich.
ID: 52190
Erich56
Message 52191 - Posted: 5 Sep 2025, 7:28:12 UTC
Last modified: 5 Sep 2025, 7:35:58 UTC

That's really strange; I tried again just now, with the same negative result:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=426268354

"Error reading proxy credential"

What has gone wrong since they were fiddling around with the WMAgent?
(BTW: just to be on the safe side, for another test I switched off my own squid proxy, so the task connected directly to the internet. Same negative result.)

Edit: I just tried a Theory task, and it seems to work well.
ID: 52191
Klaus
Joined: 27 Aug 15
Posts: 28
Credit: 23,520,829
RAC: 20,789
Message 52192 - Posted: 5 Sep 2025, 8:21:49 UTC

Same problem as Erich56: all CMS tasks are failing, but Theory works.
ID: 52192
Erich56
Message 52193 - Posted: 5 Sep 2025, 8:41:56 UTC - in response to Message 52192.  

Klaus wrote: "Same problem as Erich56: all CMS tasks are failing, but Theory works."

Interesting that CMS works for some volunteers (like Magic Quantum Mechanic) and does NOT work for others :-(
ID: 52193
ivan
Message 52195 - Posted: 5 Sep 2025, 9:26:18 UTC - in response to Message 52193.  

Klaus wrote: "Same problem as Erich56: all CMS tasks are failing, but Theory works."
Erich56 wrote: "Interesting that CMS works for some volunteers (like Magic Quantum Mechanic) and does NOT work for others :-("

Yes, a bit of a head-scratcher. I'm reaching out to People Who Should Know(®).
ID: 52195
Erich56
Message 52196 - Posted: 5 Sep 2025, 9:38:19 UTC - in response to Message 52195.  

ivan wrote: "I'm reaching out to People Who Should Know(®)."

Many thanks, Ivan :-)
ID: 52196
Magic Quantum Mechanic
Message 52198 - Posted: 5 Sep 2025, 23:41:33 UTC
Last modified: 5 Sep 2025, 23:43:17 UTC

Well, all of mine were working, with almost 100 Valids, BUT as soon as I stopped watching I started getting that same error again (same thing with the -dev version):

https://lhcathome.cern.ch/lhcathome/result.php?resultid=426291526
ID: 52198