Message boards : CMS Application : tasks now running unusual long time without CPU usage

Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1294
Credit: 8,545,773
RAC: 3,728
Message 49538 - Posted: 13 Feb 2024, 7:50:04 UTC

Third CMS-job inside current BOINC task: ======== WMAgent Run the job starting at Tue Feb 13 07:32:15 GMT 2024 ========
ID: 49538
maeax

Joined: 2 May 07
Posts: 2125
Credit: 159,966,858
RAC: 38,727
Message 49540 - Posted: 13 Feb 2024, 11:07:06 UTC - in response to Message 49537.  
Last modified: 13 Feb 2024, 11:10:51 UTC

Since 4:20 UTC, no new job has started inside the CMS task.

At that time my ISP reconnected the VDSL line.
After this, no new job was started.
CMS is now running toward the 18-hour hard stop.
ScratchDirFileCount = 1261
STARTER_DISCONNECTED_FROM_SUBMIT_NODE = true
StarterIpAddr = "<10.0.2.15:33111?CCBID=137.138.156.85:9618%3faddrs%3d[2001-1458-d00-14--b3]-9618+137.138.156.85-9618%26alias%3dvocms0840.cern.ch#12840713&PrivNet=75468-10795955-8340&addrs=10.0.2.15-33111&alias=75468-10795955-8340&noUDP>"
StatsLifetime = 5603
02/13/24 04:38:01 (pid:16912) CONFIGURATION PROBLEM: Failed to insert ClassAd attribute STARTD_JOB_ATTRS = x509userproxysubject x509UserProxyFQAN x509UserProxyVOName x509UserProxyEmail x509UserProxyExpiration,MemoryUsage,ResidentSetSize,ProportionalSetSizeKb. The most common reason for this is that you forgot to quote a string value in the list of attributes being added to the STARTD ad.
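
For anyone who wants to check an excerpt like this without reading it line by line, here is a minimal sketch. It assumes the attribute lines have the plain `Name = value` shape shown above; `parse_classad_lines` is a hypothetical helper written for this post, not part of HTCondor or BOINC, and it does not evaluate real ClassAd expressions.

```python
def parse_classad_lines(text):
    """Collect plain 'Name = value' lines into a dict of raw strings."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        name = name.strip()
        # Skip anything that is not a simple assignment, e.g. timestamped
        # log messages, whose text before " = " contains spaces.
        if not sep or not name or " " in name:
            continue
        attrs[name] = value.strip()
    return attrs


# Excerpt copied from the lines above (StarterIpAddr left out for brevity).
excerpt = """ScratchDirFileCount = 1261
STARTER_DISCONNECTED_FROM_SUBMIT_NODE = true
StatsLifetime = 5603"""

attrs = parse_classad_lines(excerpt)
disconnected = (
    attrs.get("STARTER_DISCONNECTED_FROM_SUBMIT_NODE", "false").lower() == "true"
)
print("starter disconnected from submit node:", disconnected)
```

The value is kept as a raw string on purpose, so nothing is guessed about types beyond the one flag that is compared here.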
ID: 49540
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2435
Credit: 228,215,706
RAC: 122,717
Message 49541 - Posted: 13 Feb 2024, 11:48:42 UTC - in response to Message 49540.  

Log entries like this have been reported for more than 2 years.

There's no need to swamp the forum with posts about them now since
- they obviously never affected the scientific calculation
- they are obviously not related to the recent x509 issue which is solved
- they are only visible inside the VM when someone actively looks through the logs there (in this case "StartdLog")
- Ivan as part of the CMS team already mentioned he is aware of it and forwarded it to the developers


During a reconnect by your ISP, your internet router usually (at least in Germany) gets a fresh public(!) IPv4 address (and a fresh IPv6 prefix).
This does not affect the lifetime of any x509 cert created by CERN nor does it affect your LAN IPs nor CERN's IPs which are mentioned in the log message.
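
As a rough illustration of the point about addresses, the sketch below classifies the two IPv4 addresses that appear in the StarterIpAddr line quoted earlier in the thread. It uses only Python's standard `ipaddress` module; the role descriptions and the `scope` helper are my own labels for this example, not something taken from the log.

```python
import ipaddress

# IPv4 addresses copied from the StarterIpAddr log line; the descriptions
# are illustrative labels, not values read from the log.
addresses = {
    "10.0.2.15": "starter address inside the VM",
    "137.138.156.85": "CERN side (alias vocms0840.cern.ch)",
}


def scope(addr):
    """Return 'private' for private-range addresses, else 'public'."""
    return "private" if ipaddress.ip_address(addr).is_private else "public"


for addr, role in addresses.items():
    print(f"{addr:15} {scope(addr):8} {role}")
```

Neither address is the public address your router receives from the ISP, which is why a reconnect does not change what is shown in that log line.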
ID: 49541
maeax

Joined: 2 May 07
Posts: 2125
Credit: 159,966,858
RAC: 38,727
Message 49542 - Posted: 13 Feb 2024, 11:55:51 UTC - in response to Message 49540.  

On my side, nothing was changed.
I really have seen no new job since that time.
This is only for information and has nothing to do with the good work
of the CMS team, including of course Ivan.
ID: 49542
Erich56

Joined: 18 Dec 15
Posts: 1693
Credit: 104,863,238
RAC: 71,692
Message 49543 - Posted: 13 Feb 2024, 12:18:43 UTC
Last modified: 13 Feb 2024, 12:36:53 UTC

Within the past hour, my hosts have started producing tasks which finish after some 25 minutes.
It seems that no jobs are available again :-(

Here is an example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=406083580

Edit: the automatic stop of task distribution worked this time, with the server status page showing "0" unsent tasks :-)
ID: 49543
FanzaFede
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 19 Jul 18
Posts: 3
Credit: 129,941
RAC: 73
Message 49546 - Posted: 13 Feb 2024, 17:50:33 UTC - in response to Message 49543.  

Hi Erich56, all,
We are changing the scripts of the CMS factory/frontend services to be able to run multicore jobs, and we are running into some problems.
We are sorry about that and are trying to fix it as soon as possible.
Cheers,
Federica
ID: 49546
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 29 Aug 05
Posts: 1009
Credit: 6,291,692
RAC: 1,385
Message 49549 - Posted: 14 Feb 2024, 9:57:41 UTC - in response to Message 49543.  

within the past hour, my hosts have started producing tasks which finish after some 25 minutes.
Seems that no jobs are available again :-(

here an example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=406083580

Edit: the automatic stop of tasks distribution worked this time, with the server status page showing "0" unsent tasks :-)

Yes, I caught up on that last night; I missed it yesterday.
ID: 49549


©2024 CERN