Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs

AuthorMessage
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,874,936
RAC: 14,598
Message 49963 - Posted: 18 Apr 2024, 2:50:32 UTC

I'm still only getting single-core work units at this stage, though my profile is set for four cores (originally for Atlas jobs).
I have a few to get through, so I'll just watch and see what pops up.
ID: 49963
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1286
Credit: 8,515,990
RAC: 2,442
Message 49965 - Posted: 18 Apr 2024, 9:33:14 UTC
Last modified: 18 Apr 2024, 9:34:27 UTC

I'll give multi-core on the production server a try.
The first three tasks failed because the newly downloaded CMS_2022_09_07.vdi had the same UUID as the one from the dev system.
I reset the dev project on my PC and removed the hard disks from VirtualBox media.
I also removed my app_config.xml to see what comes from the server without intervention.
I had set 1 task and no limit on CPUs in my project preferences.
Now the task started OK and after a while began processing internal jobs.
A 24-core VM was created (no limit) and I see 2 processes cmsRun (each ~14% CPU) and 8 processes cmsExternalGene each consuming ~96% CPU.
ID: 49965
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 810
Credit: 654,488,485
RAC: 260,903
Message 49968 - Posted: 18 Apr 2024, 16:11:01 UTC

I see that one WU allocates 32 cores, and inside it there are 6 cmsExternalGene processes using 1 core each.

What is the expected maximum number inside? As CP says, it seems like maybe 8.
ID: 49968
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,279,340
RAC: 691
Message 49969 - Posted: 18 Apr 2024, 17:42:38 UTC - in response to Message 49968.  

I see that one WU allocates 32 cores, and inside it there are 6 cmsExternalGene processes using 1 core each.

What is the expected maximum number inside? As CP says, it seems like maybe 8.

As far as I know, the tasks that run the new 4-core jobs should run on 4 cores no matter how many above that number you have allowed in your locale preferences. My experience is that the main process, cmsRun, spawns four threads, each running cmsExternalGenerator, so in your "top" display (Alt-F3) you should see four cmsExternalGenerator processes running at nearly 100% each, with the occasional appearance of the cmsRun master process as it gets its share of the resources.
ID: 49969
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 810
Credit: 654,488,485
RAC: 260,903
Message 49970 - Posted: 18 Apr 2024, 18:04:40 UTC - in response to Message 49969.  

OK, I lock it down to 4 cores and 4.5 GB of memory
ID: 49970
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1286
Credit: 8,515,990
RAC: 2,442
Message 49971 - Posted: 18 Apr 2024, 18:52:11 UTC - in response to Message 49969.  
Last modified: 18 Apr 2024, 18:56:36 UTC

My experience is that the main process, cmsRun, spawns four threads, each running cmsExternalGenerator, so in your "top" display (Alt-F3) you should see four cmsExternalGenerator processes running at nearly 100% each, with the occasional appearance of the cmsRun master process as it gets its share of the resources.
Did you read my post? Especially the last sentence:
A 24-core VM was created (no limit) and I see 2 processes cmsRun (each ~14% CPU) and 8 processes cmsExternalGene each consuming ~96% CPU.

The 2 cmsRun processes run constantly at ~13-15% CPU during the whole run of the 8 cmsExternalGenerator processes.
ID: 49971
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 810
Credit: 654,488,485
RAC: 260,903
Message 49972 - Posted: 19 Apr 2024, 5:39:21 UTC - in response to Message 49971.  

@CP yes, I'm not sure why mine had 6 processes. 8 at 100% seems like it would need 8 cores, which is what I set it to initially; Ivan seemed to say that 4 was good.

I additionally see that each WU allocates 30 GB of working set, so I have to think about how to get the scheduler to be OK with that.
ID: 49972
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1129
Credit: 49,767,714
RAC: 5,291
Message 49973 - Posted: 19 Apr 2024, 5:55:04 UTC
Last modified: 19 Apr 2024, 6:25:57 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=409980539

https://lhcathome.cern.ch/lhcathome/result.php?resultid=409923306

https://lhcathome.cern.ch/lhcathome/result.php?resultid=409860352
(all same host)

I keep trying a clean install of the CMS multi-core app here on another host that is exactly the same as the one that works, and it keeps giving me:

Application: CMS Simulation 70.20 (vbox64)
Name: CMS_607434_1713506482.423690
State: Downloading
Received: 4/18/2024 11:16:04 PM
Report deadline: 5/18/2024 11:16:02 PM
Estimated computation size: 1,000,000 GFLOPs
Executable: vboxwrapper_26206_windows_x86_64.exe
ID: 49973
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1286
Credit: 8,515,990
RAC: 2,442
Message 49974 - Posted: 19 Apr 2024, 7:46:33 UTC
Last modified: 19 Apr 2024, 8:26:29 UTC

For the non-believers:



ID: 49974
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1129
Credit: 49,767,714
RAC: 5,291
Message 49975 - Posted: 19 Apr 2024, 10:21:00 UTC - in response to Message 49974.  

We have non-believers, CP?
ID: 49975
Erich56

Joined: 18 Dec 15
Posts: 1691
Credit: 104,607,024
RAC: 101,191
Message 49976 - Posted: 19 Apr 2024, 16:09:39 UTC

No jobs for several hours, but the automatic stop of task distribution does not seem to work :-(
This results in thousands of useless tasks being uploaded after about half an hour of runtime, with no results for the science :-(
ID: 49976
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 810
Credit: 654,488,485
RAC: 260,903
Message 49977 - Posted: 19 Apr 2024, 18:36:52 UTC - in response to Message 49974.  
Last modified: 20 Apr 2024, 7:30:23 UTC

I believe you; my observation was different. My question is: since the WUs don't actually use 24 or 32 cores, what is a good number to correct the misconfiguration?

ID: 49977
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1286
Credit: 8,515,990
RAC: 2,442
Message 49979 - Posted: 20 Apr 2024, 9:02:11 UTC - in response to Message 49977.  

@Toby:
Thanks for your image. I think you have 8 cmsExternalGenerator processes too. I've also sometimes seen fewer than 8, but always when other processes are eating a lot of CPU, like cvmfs2 in your image.
I've seen cvmfs2 processes using up to 500% CPU. Under that circumstance the job processes are pushed lower down the 'top' list.

Maybe we should set the number of CPUs to 4 in preferences and, to be sure, use app_config.xml.
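
A minimal sketch of what such an app_config.xml could look like, generated with Python so the values are easy to change. The app name "CMS", the plan class "vbox64" and the 4-core figure are assumptions taken from this thread, not confirmed project settings; check client_state.xml for the names your client actually reports. avg_ncpus is a standard app_config element and --nthreads is the vboxwrapper option for the VM core count.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name="CMS", plan_class="vbox64", ncpus=4):
    """Return app_config.xml text limiting one app version to `ncpus` cores.

    app_name and plan_class are assumptions; adjust them to what your
    BOINC client reports for the CMS application.
    """
    root = ET.Element("app_config")
    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = app_name
    ET.SubElement(app_version, "plan_class").text = plan_class
    # Number of cores BOINC budgets for each task of this app version.
    ET.SubElement(app_version, "avg_ncpus").text = str(ncpus)
    # Ask vboxwrapper to create the VM with the same number of cores.
    ET.SubElement(app_version, "cmdline").text = f"--nthreads {ncpus}"
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config())
```

The output would go into the project directory as app_config.xml, after which the client needs to re-read its config files. Whether the server honours this for CMS multi-core tasks is not confirmed here.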
ID: 49979
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1286
Credit: 8,515,990
RAC: 2,442
Message 49980 - Posted: 20 Apr 2024, 9:54:12 UTC

The single-core tasks run for half an hour and then stop without having done anything useful:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=410051723
https://lhcathome.cern.ch/lhcathome/result.php?resultid=410055559
ID: 49980
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,476,965
RAC: 129,664
Message 49981 - Posted: 20 Apr 2024, 10:20:31 UTC - in response to Message 49980.  

ATM there are only 4-core jobs in the backend queue.
The single-core backend queue is empty.
ID: 49981
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,874,936
RAC: 14,598
Message 49982 - Posted: 20 Apr 2024, 11:46:24 UTC

The single core back end that's cached now at CERN
Is completely gone they said
The single core back end that's cached now at CERN
Is completely gone ...
And still
They come!

<to the tune of The Eve of the War - Jeff Wayne's War of the Worlds>
ID: 49982
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 810
Credit: 654,488,485
RAC: 260,903
Message 49983 - Posted: 20 Apr 2024, 13:57:53 UTC - in response to Message 49979.  

Makes sense. Maybe the other processes are getting the next batch of work, and then it will go to 8. I set it to 8 cores; that seems to load up the CPU OK.
ID: 49983
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1129
Credit: 49,767,714
RAC: 5,291
Message 49984 - Posted: 21 Apr 2024, 0:04:39 UTC
Last modified: 21 Apr 2024, 0:06:09 UTC

It sure would be nice if single-core and multi-core were kept separate from each other in the settings.
I had one set to run 8 cores and it does this: https://lhcathome.cern.ch/lhcathome/result.php?resultid=409980514

And if I try 4 cores it switches back to CMS Simulation v70.20 (vbox64)
windows_x86_64
(again just now)

And then over at -dev they run what I want them to run with CMS Simulation v60.70 (vbox64_mt_mcore_cms)
windows_x86_64

(Another problem: I have three matching 8-core hosts, and some will run here but not at -dev, and others the exact opposite. I have tried complete clean reinstalls of everything; they download the vdi and then the tasks just crash. So I have to keep track of which Theory or CMS tasks will run on which host, here and at -dev.)
ID: 49984
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,874,936
RAC: 14,598
Message 49987 - Posted: 21 Apr 2024, 6:16:24 UTC

I reset the project and made sure it's set to use 4 cores. Atlas native is running OK on four cores (I've been playing with HDDs after I had a failure, so there are some errored and aborted tasks in my records), Theory is as reliable as ever <sarcasm>, but CMS just won't grab any of the multi-core work and keeps getting single-core jobs that supposedly aren't even in the queue.
ID: 49987
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,476,965
RAC: 129,664
Message 49988 - Posted: 21 Apr 2024, 7:09:57 UTC - in response to Message 49987.  

This is (as of now) your most recently returned CMS task.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=410078311

The VM was a 1-core VM:
2024-04-21 16:17:27 (3178473): Setting CPU Count for VM. (1)
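
A small sketch of how this line could be picked out of a task's stderr output automatically. It assumes the log wording is exactly as in the line quoted above; the function name is made up for illustration.

```python
import re

# Matches the vboxwrapper log line quoted above and captures the core count.
CPU_COUNT_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")


def vm_cpu_count(log_text):
    """Return the VM core count found in the log text, or None if absent."""
    match = CPU_COUNT_RE.search(log_text)
    return int(match.group(1)) if match else None


sample = "2024-04-21 16:17:27 (3178473): Setting CPU Count for VM. (1)"
print(vm_cpu_count(sample))  # prints 1
```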


The task ran the envelope but didn't get a CMS job since the 1-core job queue is still dry.
Be aware that the envelope queue and the job queue are different.
The latter is much deeper in the process and has no direct connection to BOINC.

A good indicator is to compare runtime with CPU time.
Here: 33 min 40 sec vs. 2 min 9 sec
This means the VM tried a couple of times without success to get a job and finally gave up.
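
As a rough sketch, the comparison can be expressed as a ratio of CPU time to runtime. The two times are the ones quoted above; the 20% cut-off is an arbitrary assumption for illustration, not a project rule.

```python
def to_seconds(minutes, seconds):
    """Convert a 'min sec' pair to seconds."""
    return minutes * 60 + seconds


def cpu_efficiency(runtime_s, cpu_time_s):
    """Fraction of the wall-clock runtime that was spent on the CPU."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return cpu_time_s / runtime_s


# Hypothetical threshold: below this a task is treated as "probably got no job".
IDLE_THRESHOLD = 0.2

runtime = to_seconds(33, 40)  # 2020 s
cpu_time = to_seconds(2, 9)   # 129 s
efficiency = cpu_efficiency(runtime, cpu_time)
looks_idle = efficiency < IDLE_THRESHOLD

print(f"CPU efficiency: {efficiency:.1%}, looks idle: {looks_idle}")
```

For a multi-core VM that is busy, CPU time can exceed runtime, so the ratio may be above 1; only a very low ratio hints at an envelope that never received a job.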

Since the short runtimes confuse BOINC's work fetch algorithm, you will now get (in connection with a large work buffer) far too many CMS envelopes.
Once the job queue starts sending jobs again, this may lead to a situation where your computer can't return all envelopes before the deadline.
Hence, keep your work buffer as small as possible.
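
One way to do this locally is a global_prefs_override.xml in the BOINC data directory. The sketch below only generates the file content; the values 0.1 and 0 days are assumptions chosen as an example of a "small" buffer, and the same settings can be made in the BOINC Manager or the website preferences instead.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(min_days=0.1, additional_days=0.0):
    """Return global_prefs_override.xml text with a small work buffer.

    The default values are illustrative assumptions, not recommendations
    from the project.
    """
    root = ET.Element("global_preferences")
    # "Store at least N days of work"
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    # "Store up to an additional N days of work"
    ET.SubElement(root, "work_buf_additional_days").text = str(additional_days)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_prefs_override())
```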
ID: 49988



©2024 CERN