Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49842 - Posted: 25 Mar 2024, 17:17:20 UTC - in response to Message 49837.  

Yes, we're working on it, but it takes time. Currently we have some two-core and some 4-core jobs in the queue. These will only run in -dev. Let us know how you get on. I'll put some single-core jobs up as well, so people not in -dev can get some work too.
ID: 49842
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49845 - Posted: 25 Mar 2024, 21:10:21 UTC - in response to Message 49842.  

Yes, we're working on it, but it takes time. Currently we have some two-core and some 4-core jobs in the queue. These will only run in -dev. Let us know how you get on. I'll put some single-core jobs up as well, so people not in -dev can get some work too.

Hmm, there is a little problem with that -- The workflow manager is holding that batch in acquired status while the 2- and 4-core batches run and probably won't start it running until those queues start running dry, which could take some time!
ID: 49845
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49851 - Posted: 27 Mar 2024, 15:53:54 UTC - in response to Message 49845.  

Yes, we're working on it, but it takes time. Currently we have some two-core and some 4-core jobs in the queue. These will only run in -dev. Let us know how you get on. I'll put some single-core jobs up as well, so people not in -dev can get some work too.

Hmm, there is a little problem with that -- The workflow manager is holding that batch in acquired status while the 2- and 4-core batches run and probably won't start it running until those queues start running dry, which could take some time!

We still don't have any 1-core jobs available for CMS@Home "production". Federica is going to kill her 2-core workflow, so then hopefully the workflow-manager will notice that there aren't many jobs in the queue, and will move my workflow into the "running" state.
ID: 49851
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49852 - Posted: 27 Mar 2024, 19:10:15 UTC - in response to Message 49851.  
Last modified: 27 Mar 2024, 19:12:53 UTC

Yes, we're working on it, but it takes time. Currently we have some two-core and some 4-core jobs in the queue. These will only run in -dev. Let us know how you get on. I'll put some single-core jobs up as well, so people not in -dev can get some work too.

Hmm, there is a little problem with that -- The workflow manager is holding that batch in acquired status while the 2- and 4-core batches run and probably won't start it running until those queues start running dry, which could take some time!

We still don't have any 1-core jobs available for CMS@Home "production". Federica is going to kill her 2-core workflow, so then hopefully the workflow-manager will notice that there aren't many jobs in the queue, and will move my workflow into the "running" state.

OK, aborting the rather large two-core workflow has allowed my latest single-core batch to get its foot in the door. Currently we are running a 4-core w/f, accessible only to users of CMS@Home-dev who have set their MaxCPUsPerTask to >=4, and a single-core w/f that will run on "production" CMS@Home VMs and, at reduced efficiency (for N>1), for -dev Volunteers running N-core VMs.
I'll try to keep the mix tuned over the holidays. We plan to allow multicore on production tasks after the break; I'll let you know when you can start experimenting with the number of cores as that happens. I think we'll concentrate mainly on 4-core, as that's where CMS seems to be focussing its efforts.
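
As a rough illustration of the "reduced efficiency" point: a single-core job in an N-core VM keeps one core busy and leaves the others idle. The Python sketch below only does that arithmetic. The function name is made up for illustration and is not part of any CMS or BOINC tool, and the model is idealised (it ignores all overheads).

```python
def core_utilisation(job_cores: int, vm_cores: int) -> float:
    """Fraction of the VM's cores a job can keep busy (idealised model)."""
    if job_cores < 1 or vm_cores < 1:
        raise ValueError("core counts must be positive integers")
    # A job cannot use more cores than the VM exposes.
    return min(job_cores, vm_cores) / vm_cores


# Single-core job on VMs of increasing size.
for vm_cores in (1, 2, 4):
    share = core_utilisation(1, vm_cores)
    print(f"1-core job on a {vm_cores}-core VM: {share:.0%} of cores busy")
```

Under this simple model, a -dev volunteer who wants full use of a 4-core VM would need to pick up the 4-core workflow rather than the single-core one.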
ID: 49852
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49854 - Posted: 28 Mar 2024, 21:08:07 UTC - in response to Message 49828.  

It needs to be clarified whether

1. a workflow batch at the backend must be configured to run on n cores before any work is sent out

2. a task on a volunteer VM can forward its own #cores to the CMS app and CMS uses this #cores.
Like:
2-core VM -> 2-core CMS
4-core VM -> 4-core CMS


Sending out something like fixed n-core CMS tasks to a VM not running n cores makes no sense.

Tja, the permutations become exponentially weird. Currently
o "Normal" single core jobs are available. These will run on "standard" LHC@Home machines, which cannot specify multicore VMs, and on LHC@Home-dev machines which specify NCPUs >=1 -- only using one core in the VM of course.
o A workflow specifying 4-core jobs is also available. These can run on LHC@Home-dev machines specifying NCPUs >=4.
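
The two cases above can be written down as a small matching rule. This is only a Python sketch of the situation as described in this post, not how the real job matching is implemented; the class names, field names and the "dev_only" flag are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    cores: int      # cores each job in this workflow expects
    dev_only: bool  # True if the workflow is only offered via -dev


@dataclass(frozen=True)
class Host:
    name: str
    project: str    # "production" or "dev"
    ncpus: int      # cores the volunteer allows per VM


def can_run(host: Host, wf: Workflow) -> bool:
    """True if this host could pick up jobs from this workflow."""
    if wf.dev_only and host.project != "dev":
        return False
    return host.ncpus >= wf.cores


SINGLE = Workflow("single-core", cores=1, dev_only=False)
QUAD = Workflow("4-core", cores=4, dev_only=True)

hosts = [
    Host("standard machine", "production", 1),
    Host("-dev machine, NCPUs=2", "dev", 2),
    Host("-dev machine, NCPUs=4", "dev", 4),
]

for host in hosts:
    runnable = [wf.name for wf in (SINGLE, QUAD) if can_run(host, wf)]
    print(f"{host.name}: {', '.join(runnable)}")
```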

I don't think we can specify workflows to "run on however many cores are available". The relevant parameter in the .json config file is "Multicore", which as far as I know takes an integer parameter at submission time.
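
To make that concrete, here is a hedged Python sketch of setting the value before submission. Only the "Multicore" key comes from the text above; the helper name and the "ExampleField" key are placeholders invented for this example, and the real config file has many other fields that are not shown.

```python
import json


def set_multicore(config_text: str, cores: int) -> str:
    """Return the JSON config with "Multicore" set to a fixed integer."""
    # The value has to be a concrete positive integer; something like
    # "however many cores are available" cannot be expressed here.
    if isinstance(cores, bool) or not isinstance(cores, int) or cores < 1:
        raise ValueError("Multicore must be a positive integer")
    config = json.loads(config_text)
    config["Multicore"] = cores
    return json.dumps(config, indent=2)


# "ExampleField" is a placeholder, not a real workflow parameter.
example = '{"ExampleField": "placeholder"}'
print(set_multicore(example, 4))
```

Because the number is baked in at submission time, every job in that workflow expects the same core count, whatever the volunteer's VM offers.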
ID: 49854
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,874,936
RAC: 17,795
Message 49856 - Posted: 29 Mar 2024, 0:49:33 UTC

So do the multicore work units actually get 4 times the work done using four cores?
I'd check myself but when I asked to join -dev I was told I wasn't required.
ID: 49856
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49877 - Posted: 2 Apr 2024, 10:22:51 UTC - in response to Message 49856.  

So do the multicore work units actually get 4 times the work done using four cores?
I'd check myself but when I asked to join -dev I was told I wasn't required.

Sorry about that, the decision was made by LHC@Home management.
Yes, since our jobs are mostly "embarrassingly parallel" the trend is to run multithreaded jobs so that each core runs over events individually. There are memory savings because of all the shared resources which only need to be loaded once.
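
A back-of-envelope Python sketch of the memory argument: with N separate single-core jobs the shared resources are loaded N times, while one N-thread job loads them once. The figures below (1500 MB shared, 300 MB per worker) are made-up illustrative numbers, not measurements from CMS jobs, and the function names are hypothetical.

```python
def separate_jobs_mb(n_jobs: int, shared_mb: int, per_worker_mb: int) -> int:
    """Total memory if every job loads its own copy of the shared resources."""
    return n_jobs * (shared_mb + per_worker_mb)


def multithreaded_job_mb(n_threads: int, shared_mb: int, per_worker_mb: int) -> int:
    """Total memory if the shared resources are loaded once for all threads."""
    return shared_mb + n_threads * per_worker_mb


# Illustrative numbers only (assumptions, not CMS measurements).
SHARED_MB = 1500
PER_WORKER_MB = 300

for n in (1, 2, 4):
    apart = separate_jobs_mb(n, SHARED_MB, PER_WORKER_MB)
    together = multithreaded_job_mb(n, SHARED_MB, PER_WORKER_MB)
    print(f"{n} core(s): {apart} MB as separate jobs, {together} MB multithreaded")
```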
ID: 49877
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,874,936
RAC: 17,795
Message 49878 - Posted: 2 Apr 2024, 10:30:56 UTC - in response to Message 49877.  

Yes, I am *clearly* a trouble maker after all. Utterly incorrigible. ;)

Thanks for that, I'm looking forward to seeing how these run in production.
ID: 49878
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49880 - Posted: 3 Apr 2024, 0:05:00 UTC - in response to Message 49878.  
Last modified: 3 Apr 2024, 0:06:53 UTC

Yes, I am *clearly* a trouble maker after all. Utterly incorrigible. ;)
I'll take your word for it...

Thanks for that, I'm looking forward to seeing how these run in production.

You can see my poor underpowered machine's tasks here.
I think Laurence still has some holidays in his pocket, so we may not turn on multicore in production until next week.
ID: 49880
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,874,936
RAC: 17,795
Message 49881 - Posted: 3 Apr 2024, 1:02:08 UTC - in response to Message 49880.  

Yes, I am *clearly* a trouble maker after all. Utterly incorrigible. ;)
I'll take your word for it...

Thanks for that, I'm looking forward to seeing how these run in production.

You can see my poor underpowered machine's tasks here.
I think Laurence still has some holidays in his pocket, so we may not turn on multicore in production until next week.


Well ... I could if I had an account on the -dev project. Thanks anyway.
ID: 49881
Erich56

Joined: 18 Dec 15
Posts: 1691
Credit: 104,583,689
RAC: 118,008
Message 49892 - Posted: 4 Apr 2024, 7:23:58 UTC

the queue ran dry :-(
ID: 49892
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,874,936
RAC: 17,795
Message 49906 - Posted: 8 Apr 2024, 1:52:12 UTC

Any word on release to production yet?
ID: 49906
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49952 - Posted: 16 Apr 2024, 16:22:30 UTC - in response to Message 49906.  

Any word on release to production yet?

Yes, in fact. We activated multi-core in production this afternoon, and there are some 4-core jobs queued up ready to run. You can try setting your preferences for your favourite CMS@Home locale to use 4-core VMs, and see if you pick up a task. There will still be single-core jobs hanging around, so the current choices are just single- or quad-core tasks. Note that we haven't tuned the 4-core jobs yet, so you might run into bandwidth, memory or time-out problems.
ID: 49952
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2421
Credit: 227,253,743
RAC: 130,208
Message 49953 - Posted: 16 Apr 2024, 20:39:21 UTC - in response to Message 49952.  

If the #cores must be configured at batch creation time, then please make a decision.

Either keep only the singlecore app
or drop the singlecore app and send out a multicore with a fixed #cores that is in sync with the backend.

Do not mix batches having different core settings as this would break BOINC's work fetch, runtime estimation, credit system ...
ID: 49953
Ben

Joined: 30 Mar 20
Posts: 3
Credit: 9,272,353
RAC: 16,862
Message 49955 - Posted: 16 Apr 2024, 20:59:26 UTC - in response to Message 49952.  

I have one of the four-core tasks running now, but there is a bug. It is using four cores but claims to be using five cores, so one core isn't being used by BOINC at all. I have five set as my maximum in my preferences, which I can change, but this should be fixed.

Thank you.
ID: 49955
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49956 - Posted: 16 Apr 2024, 23:02:39 UTC - in response to Message 49953.  

If the #cores must be configured at batch creation time, then please make a decision.

Either keep only the singlecore app
or drop the singlecore app and send out a multicore with a fixed #cores that is in sync with the backend.

Do not mix batches having different core settings as this would break BOINC's work fetch, runtime estimation, credit system ...

A fair point, I'll keep it in mind.
ID: 49956
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49957 - Posted: 16 Apr 2024, 23:09:05 UTC

I'm wondering if we got the multi-core config right on the production CMS@Home. Of the machines I'm running in a locale where I've selected 4-CPU tasks, all are running just a single-core VM. A machine running a 4-core locale at CMS@Home-dev has started a 4-core VM, though.
ID: 49957
Jonathan

Joined: 25 Sep 17
Posts: 99
Credit: 3,273,395
RAC: 3,436
Message 49958 - Posted: 17 Apr 2024, 1:14:44 UTC - in response to Message 49957.  

When you look at your computer's tasks on the website, are they vbox64 or vbox64_mt_mcore_cms? Ben's computer was showing one of each outstanding / in progress.
ID: 49958
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 378
Credit: 238,712
RAC: 0
Message 49960 - Posted: 17 Apr 2024, 13:57:30 UTC - in response to Message 49958.  

vbox64 and vbox64_mt_mcore_cms should be identical when the number of threads/cpus for vbox64_mt_mcore_cms is equal to 1. We can probably deprecate vbox64 if there are no issues with vbox64_mt_mcore_cms.
ID: 49960
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1007
Credit: 6,277,924
RAC: 669
Message 49961 - Posted: 17 Apr 2024, 15:09:12 UTC - in response to Message 49958.  

When you look at your computer's tasks on the website, are they vbox64 or vbox64_mt_mcore_cms? Ben's computer was showing one of each outstanding / in progress.

Just vbox64, I'm afraid.
ID: 49961


©2024 CERN