Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs

maeax
Joined: 2 May 07
Posts: 2244
Credit: 173,902,375
RAC: 456
Message 49816 - Posted: 22 Mar 2024, 16:01:20 UTC - in response to Message 49814.  
Last modified: 22 Mar 2024, 16:35:49 UTC

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3310245
VirtualBox Manager shows two disk images:
1. CMS_2022_09_07.vdi and
2. CMS_2022_09_07_prod.vdi
Why two? Can we delete the first .vdi?
Both are 3.76 GByte.
ID: 49816
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 49817 - Posted: 22 Mar 2024, 17:13:04 UTC - in response to Message 49816.  
Last modified: 22 Mar 2024, 17:13:50 UTC

> Why two? Can we delete the first .vdi?
No. The *_prod vdi belongs to LHC@home and the other one to -dev.
In principle they are identical, but each vdi needs its own UUID for VirtualBox to work properly.
ID: 49817
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1176
Credit: 54,887,670
RAC: 5,761
Message 49818 - Posted: 22 Mar 2024, 20:31:57 UTC - in response to Message 49817.  

Yes, and I have both versions running right now.
ID: 49818
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1061
Credit: 7,737,455
RAC: 298
Message 49823 - Posted: 24 Mar 2024, 15:03:57 UTC

We may have a breakthrough. A previous workflow that was supposed to run on 4-core(+) machines didn't start because it was assigned to the wrong "team". Federica has recently submitted a workflow that calls for two cores; my home machine (running a 4-core -dev VM) has picked up one of her jobs and is currently running it on two cores. Please let us know if you have a multi-core -dev VM that is running jobs on more than one core. If my understanding is correct, single-core mainstream VMs won't run the two-core jobs, but I have single-core jobs in the queues.
ID: 49823
captainjack
Joined: 21 Jun 10
Posts: 41
Credit: 11,451,358
RAC: 4,851
Message 49824 - Posted: 24 Mar 2024, 15:49:44 UTC - in response to Message 49823.  

Yes, I have two CMS test tasks running right now on two cores. The task names say they are running on 4 cores, but they are really running on 2 cores. Let me know if you have more questions.
ID: 49824
Erich56
Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 49825 - Posted: 24 Mar 2024, 15:57:25 UTC - in response to Message 49823.  

> ... but I have single-core jobs in the queues.
Ivan, either they have already been used up, or something else is going wrong.
My hosts downloaded new tasks, but they don't work :-(
ID: 49825
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 34,609
Message 49826 - Posted: 24 Mar 2024, 15:57:53 UTC - in response to Message 49823.  

Got a 2-core CMS on a 4-core VM connected to -dev.

After a long setup phase (~18 min) with CPU usage between idle and 100 % (= 1 core), cmsRun switched to ~200 %.
This indicates that it uses 2 cores inside the VM.
Monitoring data on the host confirms this.
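For readers less used to top: it reports per-process %CPU where 100 % corresponds to one fully busy core, so ~200 % for cmsRun means roughly two cores. The Python sketch below turns a few top readings into an effective core count. The function names and the sample readings are made up for illustration; they are not part of any CMS or BOINC tool.

```python
from statistics import mean


def effective_cores(cpu_percent_samples):
    """Convert top-style %CPU readings (100 % = one busy core) into cores used.

    Hypothetical helper for illustration; readings are averaged first so that
    a single spike does not dominate the estimate.
    """
    if not cpu_percent_samples:
        raise ValueError("need at least one %CPU reading")
    return mean(cpu_percent_samples) / 100.0


def classify(cpu_percent):
    """Label one reading the way the setup and run phases are described above."""
    return "multi-core" if cpu_percent > 100.0 else "at most one core"


if __name__ == "__main__":
    # Illustrative values only, not measurements from this thread.
    setup_phase = [5.0, 100.0, 45.0]
    run_phase = [198.0, 200.0, 202.0]
    print(effective_cores(setup_phase))  # 0.5
    print(effective_cores(run_phase))    # 2.0
    print(classify(200.0))               # multi-core
```

Averaging over several readings matters during the setup phase, where usage jumps between idle and one core.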


The long setup phase is not an error, since
- the box runs another BOINC client with lots of Theory tasks
- the CMS task itself made lots of internet requests to update CVMFS/Frontier data


Unfortunately, apart from console 1 and console 3 (top), none of the monitoring consoles on the VM work.
ID: 49826
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 49827 - Posted: 24 Mar 2024, 16:52:21 UTC
Last modified: 24 Mar 2024, 17:10:54 UTC

My laptop is configured as a host with 8 cores; in the project preferences I set no limit on CPUs and asked for 1 task.
The CMS task takes all 8 cores and creates an 8-core VM (8168 MB base memory). I'm not using an app_config.xml.
All other running BOINC tasks now have the status "Waiting to run".
After an init phase of 13 minutes, cmsRun started and is using 200% CPU.
Is this cmsRun twice as fast as a single-core run, or is it running two jobs in the background?
There is no console output to check what's going on.
ID: 49827
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 34,609
Message 49828 - Posted: 24 Mar 2024, 17:42:14 UTC

It needs to be clarified whether

1. a workflow batch at the backend must be configured to run on n cores before any work is sent out, or

2. a task on a volunteer VM can forward its own number of cores to the CMS app, and CMS then uses that number of cores.
Like:
2-core VM -> 2-core CMS
4-core VM -> 4-core CMS


Sending out fixed n-core CMS tasks to a VM that is not running n cores makes no sense.
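The two alternatives can be contrasted in a few lines of Python. This is a hypothetical model, not the CMS backend: `Job`, `cores_for_job` and the rule that a bigger VM may still take a smaller fixed-core job are assumptions. The latter rule matches the 2-core job reported on a 4-core VM earlier in this thread, which leaves the remaining VM cores idle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    # Option 1: a core count fixed by the workflow before work is sent out.
    # Option 2: None, meaning "use whatever the VM offers".
    fixed_cores: Optional[int]


def cores_for_job(job: Job, vm_cores: int) -> Optional[int]:
    """Return how many cores the job would use on a VM, or None if it cannot run there."""
    if job.fixed_cores is None:
        # Option 2: the task forwards the VM's own core count to the CMS app.
        return vm_cores
    if job.fixed_cores <= vm_cores:
        # Option 1: the job fits, but any extra VM cores stay idle.
        return job.fixed_cores
    # Option 1 mismatch: an n-core job on a VM with fewer than n cores.
    return None


if __name__ == "__main__":
    fixed_two = Job("fixed 2-core workflow", 2)
    adaptive = Job("adaptive workflow", None)
    for vm_cores in (1, 2, 4):
        print(vm_cores, cores_for_job(fixed_two, vm_cores), cores_for_job(adaptive, vm_cores))
    # 1 None 1
    # 2 2 2
    # 4 2 4
```

In this model option 2 never wastes or lacks cores, while option 1 needs the scheduler to know each VM's core count before handing out work.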
ID: 49828
Erich56
Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 49829 - Posted: 24 Mar 2024, 18:18:33 UTC - in response to Message 49825.  

>> ... but I have single-core jobs in the queues.
> Ivan, either they are used up already, or something else is going wrong.
> My hosts downloaded new tasks, but they don't work :-(
Ivan, what's the current status of single-core jobs? Obviously none are available at this point :-(
ID: 49829
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 49830 - Posted: 24 Mar 2024, 18:49:06 UTC - in response to Message 49827.  

myself wrote:
> Is this cmsRun twice as fast as a single run or is it running two jobs in the background?

A CMS job running inside the VM is evidently running twice as fast.
Normally a job needs about 4 hours on my laptop, depending on other BOINC tasks. The BOINC task has now been running for 2.75 hours and is already on the second cmsRun (65 minutes into it).
ID: 49830
m
Joined: 6 Sep 08
Posts: 118
Credit: 12,588,679
RAC: 899
Message 49831 - Posted: 24 Mar 2024, 19:01:44 UTC
Last modified: 24 Mar 2024, 19:04:17 UTC

This host has 4 cores.
Preferences set to run 1 job and 4 cores.
Three other (non-LHC) jobs are "Waiting to run".
No app_config.xml.
top (F3) shows cmsRun at 200%.
The CMS job is running about twice as fast as usual.
ID: 49831
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 49832 - Posted: 24 Mar 2024, 20:48:16 UTC

I changed my prefs to 1 task and max 2 CPUs.
The task created a dual-core VM with 2792 MB memory.
After about 5 minutes, a cmsRun appeared using up to 100% CPU, and after another 2 minutes cmsRun started using up to 200% CPU.

First test task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3310818
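The two VM sizes reported by Crystal Pellet (8 cores with 8168 MB in message 49827, 2 cores with 2792 MB here) happen to fit a straight line of 1000 MB plus 896 MB per core. That is only an inference from two data points, not a documented formula of the CMS app. The Python sketch below just shows the arithmetic; the 4-core value it prints is an extrapolation that nobody in this thread has confirmed.

```python
# (cores, VM base memory in MB) as reported in messages 49827 and 49832.
OBSERVED = [(8, 8168), (2, 2792)]


def fit_line(p1, p2):
    """Slope and intercept of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


PER_CORE_MB, BASE_MB = fit_line(*OBSERVED)


def predicted_memory_mb(cores):
    """Extrapolation only: assumes VM memory really grows linearly with cores."""
    return BASE_MB + PER_CORE_MB * cores


if __name__ == "__main__":
    print(PER_CORE_MB, BASE_MB)    # 896.0 1000.0
    print(predicted_memory_mb(4))  # 4584.0
```

A third observation (e.g. a 4-core VM) would be needed to tell whether the rule is actually linear.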
ID: 49832
Erich56
Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 49833 - Posted: 25 Mar 2024, 5:42:49 UTC - in response to Message 49832.  

> I changed my prefs to 1 task and max 2 CPUs.
> The task created a dual core VM with 2792 MB memory.
> After about 5 minutes a cmsRun appeared using up to 100% CPU and after another 2 minutes cmsRun started using up to 200% CPU.
>
> First test task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3310818
I tried the same preferences that CP describes above. However, without success: no dual-core VM was created :-(
See here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=408217329

What's going wrong?
ID: 49833
maeax
Joined: 2 May 07
Posts: 2244
Credit: 173,902,375
RAC: 456
Message 49834 - Posted: 25 Mar 2024, 6:06:46 UTC - in response to Message 49833.  

> what's going wrong?

There are no jobs inside the task.
I'm seeing the same.
ID: 49834
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1422
Credit: 9,484,585
RAC: 1,266
Message 49835 - Posted: 25 Mar 2024, 6:37:09 UTC - in response to Message 49833.  
Last modified: 25 Mar 2024, 6:41:26 UTC

>> I changed my prefs to 1 task and max 2 CPUs.
>> The task created a dual core VM with 2792 MB memory.
>> After about 5 minutes a cmsRun appeared using up to 100% CPU and after another 2 minutes cmsRun started using up to 200% CPU.
>>
>> First test task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3310818
> I tried the same prefs which CP is describing above. However, without success - no dual core VM created :-(
> see here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=408217329
>
> what's going wrong?

The multi-core CMS is being tested only on the development system, afaik ...
... and don't use app_config.xml for CMS.
ID: 49835
Erich56
Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 49836 - Posted: 25 Mar 2024, 10:34:43 UTC - in response to Message 49835.  

>> what's going wrong?
> The multi core CMS is tested on the development system only, afaik ...
> ... and don't use app_config.xml for CMS.
No, I did not use app_config.xml; I set 2 cores on the web preferences page. So the multi-core tasks seem to be offered only on the -dev system, okay.

But even the usual 1-core tasks are (still) not working at this point; obviously no jobs are available. But then why does the project status page show the usual number of "unsent" tasks (close to 200)? Could it be that this includes the multi-core test tasks from the -dev system? Or does the automatic stop of task distribution when no jobs are available not work?

To me, everything seems a little weird at the moment :-(
ID: 49836
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2541
Credit: 254,608,838
RAC: 34,609
Message 49837 - Posted: 25 Mar 2024, 13:35:39 UTC - in response to Message 49836.  

As always, the BOINC tasks are only envelopes created by a server script (or an independent backend system).
Prod and dev each run their own script, independently of each other.

CMS jobs are created and administered by a rather complex backend workflow.
There is 1 active workflow feeding both prod and dev.
Maintaining separate workflows would be a huge effort, and there are not enough volunteers on dev to guarantee a steady return rate.

Although the CMS vdi is the same for prod and dev (hence it can run as 1-core or n-core), the BOINC app plus the job startup scripts (partly hard-wired) on prod configure the VM to accept only single-core jobs.
On dev the BOINC app is a full multicore app (which can also run 1-core jobs).

ATM the workflow queue contains jobs that are configured to run on 2-core systems.
Hence, they can run on dev but fail on prod.
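A simplified model of this split, written as a Python sketch: `Envelope`, `make_envelopes`, `pull_job` and the queue contents are hypothetical stand-ins for the real BOINC server scripts and the CMS backend, which are far more complex. It illustrates two points made here: envelopes are generated regardless of what the shared job queue holds, and a prod envelope hard-wired to 1-core jobs finds nothing to run when only 2-core jobs are queued.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    cores: int


@dataclass
class Envelope:
    """A BOINC task: only a wrapper that boots the VM and then asks for a job."""
    system: str          # "prod" or "dev"
    max_job_cores: int   # prod: hard-wired to 1; dev: the VM's core count


def make_envelopes(system: str, count: int, max_job_cores: int) -> List[Envelope]:
    # Each server script creates envelopes on its own, without looking at the job queue.
    return [Envelope(system, max_job_cores) for _ in range(count)]


def pull_job(env: Envelope, queue: List[Job]) -> Optional[Job]:
    """Take the first queued job that the envelope's VM is configured to accept."""
    for i, job in enumerate(queue):
        if job.cores <= env.max_job_cores:
            return queue.pop(i)
    return None  # the task runs, but the VM gets no job


if __name__ == "__main__":
    shared_queue = [Job("job-a", 2), Job("job-b", 2)]  # one workflow feeds both systems
    prod = make_envelopes("prod", 3, max_job_cores=1)
    dev = make_envelopes("dev", 1, max_job_cores=4)
    print(pull_job(prod[0], shared_queue))      # None
    print(pull_job(dev[0], shared_queue).name)  # job-a
    print(len(shared_queue))                    # 1
```

Because envelope creation never consults the queue in this model, a server can show plenty of "unsent" tasks while none of them can obtain a job.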


I'm sure Ivan and the CMS team are working to determine the parameters needed to run stable multicore jobs.
Once this is done I expect the singlecore CMS on prod will be replaced by a multicore app.

Be patient.
Give them the time it needs.
ID: 49837
maeax
Joined: 2 May 07
Posts: 2244
Credit: 173,902,375
RAC: 456
Message 49838 - Posted: 25 Mar 2024, 13:55:46 UTC - in response to Message 49837.  

ID: 49838
Erich56
Joined: 18 Dec 15
Posts: 1821
Credit: 118,943,683
RAC: 21,125
Message 49841 - Posted: 25 Mar 2024, 17:08:26 UTC - in response to Message 49837.  

Thanks, computezrmle, for your thorough explanation :-)

When you say
"I expect the singlecore CMS on prod will be replaced by a multicore app"

I hope this will mean the same as for ATLAS, i.e. that we volunteers can choose between 1 and n cores per task.
ID: 49841

©2024 CERN