Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs

maeax

Joined: 2 May 07
Posts: 2126
Credit: 159,977,413
RAC: 38,115
Message 50061 - Posted: 28 Apr 2024, 5:52:23 UTC - in response to Message 50057.  

Run time 5 hours 58 min 38 sec
CPU time 14 hours 40 min 18 sec
Validate state Valid
Credit 3.56

Run time 3 hours 9 min 48 sec
CPU time 5 hours 46 min 57 sec
Validate state Valid
Credit 1.88

Run time 5 hours 50 min 56 sec
CPU time 14 hours 38 min 39 sec
Validate state Valid
Credit 3.63

Excuse me, but what??

Seeing the same. The CMS tasks are more important.
BOINC has always had this system error (is it an error?).
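
One way to put the quoted figures side by side is to work out credit per CPU hour and the average number of busy cores (CPU time divided by run time). A minimal Python sketch, using only the three results quoted above; the helper names `to_hours` and `summarize` are made up for illustration and are not part of BOINC:

```python
def to_hours(days=0, hours=0, minutes=0, seconds=0):
    """Convert a duration to fractional hours."""
    return days * 24 + hours + minutes / 60 + seconds / 3600

# (run time, CPU time, credit) copied from the three results quoted above
results = [
    (to_hours(hours=5, minutes=58, seconds=38),
     to_hours(hours=14, minutes=40, seconds=18), 3.56),
    (to_hours(hours=3, minutes=9, seconds=48),
     to_hours(hours=5, minutes=46, seconds=57), 1.88),
    (to_hours(hours=5, minutes=50, seconds=56),
     to_hours(hours=14, minutes=38, seconds=39), 3.63),
]

def summarize(run_h, cpu_h, credit):
    """Return (average busy cores, credit per CPU hour) for one result."""
    return cpu_h / run_h, credit / cpu_h

for run_h, cpu_h, credit in results:
    cores, rate = summarize(run_h, cpu_h, credit)
    print(f"avg cores {cores:.2f}, credit per CPU hour {rate:.3f}")
```

For these three results the rate comes out at roughly a quarter to a third of a credit per CPU hour.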
ID: 50061
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50063 - Posted: 28 Apr 2024, 6:24:55 UTC

Run time 13 hours 22 min 46 sec
CPU time 1 days 22 hours 56 min 57 sec
Validate state Valid
Credit 7.75

Excuse me CMS admin? A moment of your time?

What the hell??
ID: 50063
maeax
Joined: 2 May 07
Posts: 2126
Credit: 159,977,413
RAC: 38,115
Message 50065 - Posted: 28 Apr 2024, 6:35:10 UTC - in response to Message 50063.  

The CMS team is not responsible for these credit points.
This is the BOINC system. You can search the message boards;
there are a lot of posts about it.
ID: 50065
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50067 - Posted: 28 Apr 2024, 7:05:46 UTC - in response to Message 50065.  

That's a cop-out. Every other project is awarding points correctly. If it were not CMS-related, then ATLAS and Theory would be showing the same issues. If it were VBox-related, then other VBox projects would show the same issues. They don't.
ID: 50067
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2435
Credit: 228,297,758
RAC: 122,867
Message 50068 - Posted: 28 Apr 2024, 7:40:25 UTC - in response to Message 50067.  

Every now and then somebody complains about low credit (weird, nobody ever complains about too much credit, lol).

The answer is always the same:
Credit calculation is built into BOINC.
LHC@home does not change the relevant code parts as they also affect other things, e.g. work fetch calculation.

To understand how it works, see:
https://boinc.berkeley.edu/trac/wiki/CreditNew

Change requests have to be made here:
https://github.com/BOINC/boinc
ID: 50068
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50069 - Posted: 28 Apr 2024, 8:49:23 UTC
Last modified: 28 Apr 2024, 8:50:25 UTC

So people are getting punished (host punishment, discussed on the linked page) for having to abort huge numbers of single-core tasks that were never going to do actual work because the CMS team couldn't be arsed to clear out the work caches.
People punished because you lot didn't do your jobs.
And you're sneering down on people who don't appreciate being screwed around.
Real nice.

And the credit system used is a CHOICE made by each project. There are options. Bleating that the users need to complain elsewhere is another cop-out.
ID: 50069
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2435
Credit: 228,297,758
RAC: 122,867
Message 50070 - Posted: 28 Apr 2024, 9:21:57 UTC - in response to Message 50069.  

Stop that kind of comment!

You have been told the facts.
Blaming people here for things you don't understand or accept is not respectful, nor does it solve your complaint.
ID: 50070
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1294
Credit: 8,546,215
RAC: 3,497
Message 50105 - Posted: 2 May 2024, 12:29:15 UTC - in response to Message 50025.  

Since this morning we've been getting tasks from a new batch created by Ivan.
To me it seems that these are single-core jobs.
At least, the first job this morning in that virtual machine was using 4 threads, and this afternoon the second job uses only 1 thread (cmsRun at 100%).
ID: 50105
maeax
Joined: 2 May 07
Posts: 2126
Credit: 159,977,413
RAC: 38,115
Message 50107 - Posted: 3 May 2024, 8:16:23 UTC - in response to Message 50105.  
Last modified: 3 May 2024, 8:41:33 UTC

2024-05-03 10:06:33 (21200): Setting CPU Count for VM. (4)
For me it is 4-core.
Differencing .vdi: 2 MByte used of 20 GByte.
After half an hour the console still shows:
Running job output should appear here.
No job is visible inside the task.
The properties of the BOINC task show this:
CPU time
01:01:30
CPU time since the last checkpoint
00:52:12
Seems to work.
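
To check the core count on several hosts, the value can be pulled out of that log line with a short script. A minimal Python sketch, assuming the line always has the form shown in this post; the function name `vm_cpu_count` is made up for illustration:

```python
import re

# Sample line copied from this post
log_line = "2024-05-03 10:06:33 (21200): Setting CPU Count for VM. (4)"

# Assumption: the count always appears in parentheses right after the message text
CPU_COUNT = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")

def vm_cpu_count(line):
    """Return the CPU count announced in the line, or None for any other line."""
    match = CPU_COUNT.search(line)
    return int(match.group(1)) if match else None

print(vm_cpu_count(log_line))
```

Lines that do not match, such as the "Running job output should appear here." placeholder, return `None`.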
ID: 50107
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2435
Credit: 228,297,758
RAC: 122,867
Message 50136 - Posted: 6 May 2024, 17:51:19 UTC

@Ivan
Please explain what's going on.

The last CMS batch before the WMAgent update was a 4-core batch.
Lots of volunteer machines are now configured to run 4-core jobs.

The first batch after the upgrade is a 1-core batch, but 4-core VMs get only 2 jobs per VM.
This results in 2 idle cores per VM that can't be given back to BOINC for other work.
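
The arithmetic behind that, as a minimal Python sketch. The numbers are the ones from this post; the function name `idle_cores` is made up for illustration:

```python
def idle_cores(vm_cores, jobs_in_vm, cores_per_job=1):
    """Cores allocated to the VM that no job is using."""
    used = min(vm_cores, jobs_in_vm * cores_per_job)
    return vm_cores - used

vm_cores = 4      # VM size left over from the 4-core batch
jobs_in_vm = 2    # single-core jobs per VM, as observed after the upgrade

idle = idle_cores(vm_cores, jobs_in_vm)
print(f"{idle} of {vm_cores} cores idle ({idle / vm_cores:.0%})")
```

With four single-core jobs per VM, or with one 4-core job, the same function returns 0.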
ID: 50136
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2435
Credit: 228,297,758
RAC: 122,867
Message 50139 - Posted: 6 May 2024, 18:47:48 UTC

Although BOINC shows valid tasks, CERN Grafana shows a 100% failure rate for the current CMS batch.
ID: 50139
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1009
Credit: 6,293,485
RAC: 1,481
Message 50142 - Posted: 7 May 2024, 13:30:44 UTC - in response to Message 50139.  

Although BOINC shows valid tasks, CERN Grafana shows a 100% failure rate for the current CMS batch.

Unfortunately, there has been an incompatibility (maybe several) introduced with the WMAgent upgrade. We're trying to understand it/them, so there may not be many jobs submitted until we get a handle on the changes.
We've also been trying to understand whether it's possible to have single- and quad-core jobs in the queue simultaneously -- hence the number of small single-core workflows a couple of days ago. I think it's going to be hard to come to a consensus on this, but we are racking our brains...
ID: 50142
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2435
Credit: 228,297,758
RAC: 122,867
Message 50151 - Posted: 9 May 2024, 7:53:54 UTC - in response to Message 50139.  

Although BOINC shows valid tasks, CERN Grafana shows a 100% failure rate for the current CMS batch.

Still 100% failure rate.
Even with the recent 4-core batch.
ID: 50151

©2024 CERN