CMS@Home difficulties in attempts to prepare for multi-core jobs
Author | Message |
---|---|
Send message Joined: 2 May 07 Posts: 2126 Credit: 159,977,413 RAC: 38,115 |
Run time: 5 hours 58 min 38 sec. Seeing the same. The CMS tasks are more important. BOINC has always had this system error (is it an error?). |
Send message Joined: 7 Aug 11 Posts: 93 Credit: 21,875,393 RAC: 8,087 |
Run time: 13 hours 22 min 46 sec; CPU time: 1 days 22 hours 56 min 57 sec; Validate state: Valid; Credit: 7.75. Excuse me, CMS admin? A moment of your time? What the hell?? |
Send message Joined: 2 May 07 Posts: 2126 Credit: 159,977,413 RAC: 38,115 |
The CMS team is not responsible for these credit points; that is the BOINC system. You can search the message boards, there are a lot of posts about it. |
Send message Joined: 7 Aug 11 Posts: 93 Credit: 21,875,393 RAC: 8,087 |
That's a cop-out. Every other project is awarding points correctly. If it were not CMS related then Atlas and Theory would be showing the same issues. If it were VBox related then other VBox projects would show the same issues. They don't. |
Send message Joined: 15 Jun 08 Posts: 2435 Credit: 228,297,758 RAC: 122,867 |
Every now and then somebody complains about low credits (weird, nobody ever complains about too many credits, lol). The answer is always the same: credit calculation is built into BOINC. LHC@home does not change the relevant code parts as they also affect other things, e.g. work fetch calculation. To understand how it works, see: https://boinc.berkeley.edu/trac/wiki/CreditNew Change requests have to be made here: https://github.com/BOINC/boinc |
Send message Joined: 7 Aug 11 Posts: 93 Credit: 21,875,393 RAC: 8,087 |
So people are getting punished (host punishment, discussed on the linked page) for having to abort huge numbers of single core tasks that were never going to do actual work because the CMS team couldn't be arsed clearing out the work caches. People punished because you lot didn't do your jobs. And you're sneering down on people who don't appreciate being screwed around. Real nice. And the credit system used is a CHOICE made by each project. There are options. Bleating that the users need to complain elsewhere is another cop-out. |
Send message Joined: 15 Jun 08 Posts: 2435 Credit: 228,297,758 RAC: 122,867 |
Stop that kind of comment! You have been told the facts. Blaming people here for things you don't understand or accept is not respectful, nor does it solve your complaint. |
Send message Joined: 14 Jan 10 Posts: 1294 Credit: 8,546,215 RAC: 3,497 |
Since this morning we've been getting tasks from a new batch created by Ivan. To me it seems that these are single-core jobs. At least, the first job this morning in that virtual machine was using 4 threads, and this afternoon the second job uses only 1 thread (cmsRun at 100%). |
Send message Joined: 2 May 07 Posts: 2126 Credit: 159,977,413 RAC: 38,115 |
2024-05-03 10:06:33 (21200): Setting CPU Count for VM. (4) For me it is 4-core. Differential .vdi: 2 MByte used of 20 GByte. After half an hour: "Running job output should appear here." No job visible inside the task. The properties of the BOINC task show this: CPU time 01:01:30, CPU time since last checkpoint 00:52:12. Seems to work. |
Send message Joined: 15 Jun 08 Posts: 2435 Credit: 228,297,758 RAC: 122,867 |
@Ivan Please explain what's going on. The last CMS batch before the WMAgent update was a 4-core batch. Lots of volunteer machines are now configured to run 4-core jobs. First batch after the upgrade is a 1-core batch but 4-core VMs get only 2 jobs per VM. This results in 2 idle cores per VM that can't be given back to BOINC for other work. |
Send message Joined: 15 Jun 08 Posts: 2435 Credit: 228,297,758 RAC: 122,867 |
Although BOINC shows valid tasks CERN Grafana shows 100% failure rate for the current CMS batch. |
Send message Joined: 29 Aug 05 Posts: 1009 Credit: 6,293,485 RAC: 1,481 |
"Although BOINC shows valid tasks CERN Grafana shows 100% failure rate for the current CMS batch." Unfortunately, there has been an incompatibility (maybe several) introduced with the WMAgent upgrade. We're trying to understand it/them, so there may not be many jobs submitted until we get a handle on the changes. We've also been trying to understand whether it's possible to have single- and quad-core jobs in the queue simultaneously -- hence the number of small single-core workflows a couple of days ago. I think it's going to be hard to come to a consensus on this, but we are racking our brains... |
Send message Joined: 15 Jun 08 Posts: 2435 Credit: 228,297,758 RAC: 122,867 |
"Although BOINC shows valid tasks CERN Grafana shows 100% failure rate for the current CMS batch." Still 100% failure rate. Even with the recent 4-core batch. |
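
The idle-core complaint raised earlier in the thread (a 4-core VM that receives only two single-core jobs) comes down to simple arithmetic. A minimal sketch of it follows; the function name `idle_cores` and its parameters are hypothetical and are not part of BOINC, VirtualBox, or any CMS tooling.

```python
def idle_cores(vm_cores: int, jobs_per_vm: int, threads_per_job: int) -> int:
    """Return how many cores allocated to a VM sit unused.

    Hypothetical helper for illustration only: it assumes each job keeps
    exactly `threads_per_job` cores busy and that a VM cannot use more
    cores than it was allocated.
    """
    busy = min(vm_cores, jobs_per_vm * threads_per_job)
    return vm_cores - busy


# Case described in the thread: 4-core VM, two single-core jobs.
print(idle_cores(vm_cores=4, jobs_per_vm=2, threads_per_job=1))  # 2

# Earlier 4-core batch: one job using all four threads.
print(idle_cores(vm_cores=4, jobs_per_vm=1, threads_per_job=4))  # 0
```

Under these assumptions the two idle cores stay reserved by the VM for as long as the task runs, which matches the point that they cannot be handed back to BOINC for other work.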
©2024 CERN