Message boards : CMS Application : CMS jobs are becoming available again
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 38352 - Posted: 20 Mar 2019, 21:38:02 UTC - in response to Message 38340.  

The CMS jobs graphs are failing both here and at dev.

Yes, I noticed that (they are in essence the same graphs, just presenting the data in different categories). Can't see why yet.

Back up again; perhaps a major contributor had some down-time. I'm supplying nearly 40 job slots myself at the moment, so if my Uni gets cut off (one of our two redundant feeds has been back-hoed this week) you would see a similar dip in the graph.
ID: 38352
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1865
Credit: 128,173,766
RAC: 81,015
Message 38354 - Posted: 20 Mar 2019, 22:26:14 UTC

This is mainly caused by CMS


Cache information for squid:
Today, 0:00 - 22:40
Statistics include uncacheable traffic like uploads and via special ports, e.g. 443 or 9618 (WMAgent).


Requests served: 1,546,134
Bytes served: 104.48 GB

and there's still 1 h to go.

Average HTTP requests per minute since start: 800.6 (restarted last Sunday evening)

Hits as % of all requests: 5min: 90.8%, 60min: 94.2%
Hits as % of bytes sent: 5min: 72.3%, 60min: 40.2%
Memory hits as % of hit requests: 5min: 98.7%, 60min: 94.9%
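
For scale, today's totals can be converted into per-minute rates and set against the 800.6 requests per minute average since the restart. This is only a rough sketch: it assumes the reporting window is exactly 0:00 to 22:40 and that "GB" means 1000 MB, and the variable names are purely illustrative.

```python
# Figures copied from the squid report above.
requests_served = 1_546_134
gb_served = 104.48

# Assumption: the window runs exactly from 0:00 to 22:40.
window_minutes = 22 * 60 + 40

requests_per_minute = requests_served / window_minutes
mb_per_minute = gb_served * 1000 / window_minutes  # assumes 1 GB = 1000 MB

print(f"{requests_per_minute:.1f} requests/min, {mb_per_minute:.1f} MB/min")
```

Under those assumptions today's request rate comes out well above the 800.6 average since the restart.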
ID: 38354
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 38375 - Posted: 22 Mar 2019, 11:16:26 UTC

I've just heard from Laurence; expect fixes for many of our latest problems Real Soon Now.
Thanks for your patience.
ID: 38375
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 38378 - Posted: 22 Mar 2019, 16:56:00 UTC - in response to Message 38375.  
Last modified: 22 Mar 2019, 19:19:02 UTC

Well, the output consoles are working again, but there are a couple of buglets. The system console (Alt-F1) is displaying a grep error from time to time, which has the hallmarks of a typo. Also, the Finished_nnn.Log in the Web interface is writing a file with a new nnn several times per minute rather than overwriting the old file -- I guess the index increment got put inside the wrong loop. The responsibles have been informed...

[Edit] Hmm, I might have just been unlucky and had the task start when the configuration was not fully changed. Two others that have started since don't show the symptoms. [/Edit]

[Edit2] It's been confirmed that there was a bad version of the patches "in the wild" for about 30 minutes, so I was unlucky enough to catch it. [/Edit2]
ID: 38378
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1055
Credit: 6,653,060
RAC: 1,904
Message 38385 - Posted: 23 Mar 2019, 7:08:52 UTC - in response to Message 38378.  

Well, the output consoles are working again, but there are a couple of buglets.

The output to Console ALT-F2 (events processing) was working at first, but now the output is killed by a typo in the script's path to the output directory:

/vr/lib/condor/execute/dir_nnnn etcetera, which should have been /var/lib/condor/execute/dir_nnnnn etcetera
ID: 38385
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 38387 - Posted: 23 Mar 2019, 11:21:11 UTC - in response to Message 38385.  

Fat_Fingers-R-Us :-(
ID: 38387
Erich56

Joined: 18 Dec 15
Posts: 1491
Credit: 37,805,249
RAC: 42,062
Message 38753 - Posted: 9 May 2019, 10:05:13 UTC - in response to Message 38208.  

However, something seems to be strange (not to say "wrong") with the credit points: CMS tasks earn only about a third of what is earned for Theory tasks. How come?
I am still curious what the reason for this discrepancy is. Any logical explanation(s)?
ID: 38753
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 38760 - Posted: 10 May 2019, 8:09:28 UTC - in response to Message 38753.  

However, something seems to be strange (not to say "wrong") with the credit points: CMS tasks earn only about a third of what is earned for Theory tasks. How come?
I am still curious what the reason for this discrepancy is. Any logical explanation(s)?

My understanding is that LHC@Home jobs award credit based on the task CPU time, but it's been a long time since I asked about it so I may be misremembering.
ID: 38760
Erich56

Joined: 18 Dec 15
Posts: 1491
Credit: 37,805,249
RAC: 42,062
Message 38761 - Posted: 10 May 2019, 10:51:36 UTC - in response to Message 38760.  

My understanding is that LHC@Home jobs award credit based on the task CPU time
I don't think so; here are a few examples:

Application   Total time   CPU time     Points
CMS           44,884.65    36,639.44     423.30
CMS           46,714.42    37,825.50     438.76
Theory        47,088.19    45,886.61   1,849.16
Theory        44,955.45    43,741.28   1,739.19

so there seem to be other criteria in place.
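
To make the comparison concrete, the four examples can be reduced to credit per CPU hour. This is a rough sketch only: it assumes the time columns are in seconds (the post does not state the unit), and the helper name `credit_per_cpu_hour` is made up for illustration. It says nothing about how the project actually computes credit.

```python
# (total time, CPU time, points) copied from the examples above.
# Assumption: both time columns are in seconds.
tasks = {
    "CMS": [(44884.65, 36639.44, 423.30), (46714.42, 37825.50, 438.76)],
    "Theory": [(47088.19, 45886.61, 1849.16), (44955.45, 43741.28, 1739.19)],
}


def credit_per_cpu_hour(rows):
    """Total points divided by total CPU hours for one application."""
    total_points = sum(points for _, _, points in rows)
    total_cpu_hours = sum(cpu for _, cpu, _ in rows) / 3600
    return total_points / total_cpu_hours


rates = {app: credit_per_cpu_hour(rows) for app, rows in tasks.items()}
ratio = rates["CMS"] / rates["Theory"]

for app, rate in rates.items():
    print(f"{app}: {rate:.1f} points per CPU hour")
print(f"CMS/Theory ratio: {ratio:.2f}")
```

If credit depended on CPU time alone, the two rates would be about equal; with these four samples the CMS rate is only around 0.3 of the Theory rate.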
ID: 38761
maeax

Joined: 2 May 07
Posts: 1362
Credit: 41,170,196
RAC: 26,943
Message 38762 - Posted: 10 May 2019, 13:28:01 UTC

ID: 38762
tullio

Joined: 19 Feb 08
Posts: 699
Credit: 4,103,430
RAC: 484
Message 38763 - Posted: 10 May 2019, 14:28:22 UTC

I am getting huge amounts of credits in Milkyway@home and Asteroids@home in Science United. But I believe they are cumulative credits, that is, they belong to all users of those projects seen as a single user.
Tullio
ID: 38763
Erich56

Joined: 18 Dec 15
Posts: 1491
Credit: 37,805,249
RAC: 42,062
Message 38764 - Posted: 10 May 2019, 16:28:14 UTC - in response to Message 38762.  

A good answer from Tullio some time ago:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=3400&postid=23814#23814
Tullio said: "Credits are like money during an inflation. The more they are the less they are worth"

okay, so that's the secret behind it :-)
ID: 38764
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 38998 - Posted: 29 May 2019, 20:04:33 UTC
Last modified: 29 May 2019, 20:06:08 UTC

There's a potential problem brewing with our WMAgent: status: Agent Data is not updated: AgentStatusWatcher is Down
It doesn't appear to be affecting operations (yet...). The responsibles have been e-mailed but I don't expect a resolution tonight.
ID: 38998
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 38999 - Posted: 29 May 2019, 21:08:09 UTC - in response to Message 38998.  

I'm starting to see some changes in the Dashboard graphs; this may be due to incomplete data being sent from WMAgent to Dashboard.
ID: 38999
Erich56

Joined: 18 Dec 15
Posts: 1491
Credit: 37,805,249
RAC: 42,062
Message 39001 - Posted: 30 May 2019, 5:45:55 UTC - in response to Message 38999.  

no new tasks available right now :-(
ID: 39001
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 39003 - Posted: 30 May 2019, 8:45:43 UTC - in response to Message 39001.  

no new tasks available right now :-(

OK, the automatic task-killer seems to have kicked in as desired. I'm busy emailing everyone -- I wanted to take today off to do some gardening...
ID: 39003
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 39004 - Posted: 30 May 2019, 10:26:23 UTC - in response to Message 39003.  

no new tasks available right now :-(

OK, the automatic task-killer seems to have kicked in as desired. I'm busy emailing everyone -- I wanted to take today off to do some gardening...

Oh, blast! It's a holiday today at CERN (Ascension), so not much chance of getting a response -- possibly not until Monday if everyone takes tomorrow off as well for a long weekend.
ID: 39004
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 39023 - Posted: 2 Jun 2019, 20:18:18 UTC - in response to Message 39004.  

OK, people are trickling back from the holiday weekend. The WMAgent has been restarted, but I'm getting errors in tasks on my home PC. Probably best to defer re-starting tasks until tomorrow to let errors propagate out of the system. I'll keep an eye on it for the next hour or two before closing myself down for the night. https://www.youtube.com/watch?v=XQ6fbsFiwWQ
ID: 39023
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1865
Credit: 128,173,766
RAC: 81,015
Message 39024 - Posted: 2 Jun 2019, 21:10:05 UTC - in response to Message 39023.  

Thanks.

Got a fresh task. So far no issues.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=231035328
ID: 39024
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 821
Credit: 5,717,880
RAC: 201
Message 39025 - Posted: 2 Jun 2019, 21:34:51 UTC - in response to Message 39024.  
Last modified: 2 Jun 2019, 21:36:11 UTC

Thanks.

Got a fresh task. So far no issues.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=231035328

Great. I'm still having issues with my home PC but at least one work server picked up new tasks seamlessly. Hitting the sack now, see y'all tomorrow...
ID: 39025


©2022 CERN