CMS@Home difficulties in attempts to prepare for multi-core jobs
Author | Message |
---|---|
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3310245 The VirtualBox manager shows two disk images: 1. CMS_2022_09_07.vdi and 2. CMS_2022_09_07_prod.vdi. Why two? Can we delete the first .vdi? Both are 3.76 GByte. |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,266 |
"Why two? Can we delete the first .vdi?" No. The *_prod vdi belongs to LHC@home and the other vdi to -dev. In principle they are identical, but each vdi needs its own UUID for VirtualBox to work properly. |
Send message Joined: 24 Oct 04 Posts: 1176 Credit: 54,887,670 RAC: 5,761 |
Yes and I have both versions running right now. |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
We may have a breakthrough. A previous workflow that was supposed to run on 4-core(+) machines didn't start because it was assigned to the wrong "team". Federica has recently submitted a w/f that calls for two cores -- my home machine (running a 4-core -dev VM) has picked up one of her jobs and is currently running it on two cores. Please let us know if you have a multicore -dev VM that is running jobs on more than one core. If my understanding is correct, single-core mainstream VMs won't run the two-core jobs, but I have single-core jobs in the queues. |
Send message Joined: 21 Jun 10 Posts: 41 Credit: 11,451,284 RAC: 5,419 |
Yes, I have two CMS test tasks running right now on two cores. The task names say that they are running on 4 cores, but they are really running on 2 cores. Let me know if you have more questions. |
Send message Joined: 18 Dec 15 Posts: 1821 Credit: 118,941,023 RAC: 22,106 |
"... but I have single-core jobs in the queues." Ivan, either they are used up already, or something else is going wrong. My hosts downloaded new tasks, but they don't work :-( |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
Got a 2-core CMS on a 4-core VM connected to -dev. After a long setup phase (~18 min) with CPU usage between idle and 100 % (= 1 core), cmsRun switched to ~200 %. This indicates that it uses 2 cores inside the VM. Monitoring data on the host confirms this. The long setup phase is not an error, as (a) the box runs another BOINC client running lots of Theory tasks and (b) the CMS task itself made lots of internet requests to update CVMFS/Frontier data. Unfortunately, except for console 1 and console 3 (top), none of the other monitoring consoles on the VM work. |
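The reading of the ~200 % figure in the post above can be written out as a tiny sketch. It assumes that top reports per-process CPU usage with 100 % meaning one fully busy core, which is how the post interprets it; the function name `cores_in_use` is made up for illustration.

```python
def cores_in_use(cpu_percent: float) -> float:
    """Convert a per-process CPU percentage to busy cores.

    Assumption: 100 % corresponds to one fully used core,
    as the post above reads the output of top.
    """
    if cpu_percent < 0:
        raise ValueError("cpu_percent must be non-negative")
    return cpu_percent / 100.0


# Readings reported in the thread: setup phase up to 100 %, then ~200 %.
for reading in (100.0, 200.0):
    print(f"{reading:.0f} % -> {cores_in_use(reading):.1f} core(s)")
```

Under that assumption a cmsRun process sitting at about 200 % is keeping roughly two cores busy, regardless of how many cores the VM was created with.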
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,266 |
My laptop is configured as a host with 8 cores; in the project preferences I set no limit on CPUs and asked for 1 task. The CMS task takes all 8 cores and creates an 8-core VM (8168 MB Base Memory). I'm not using an app_config.xml. All other running BOINC tasks get the status "waiting to run". After an init phase of 13 minutes cmsRun started and is using 200% CPU. Is this cmsRun twice as fast as a single run, or is it running two jobs in the background? There is no console output to check what's going on. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
It needs to be clarified whether 1. a workflow batch at the backend must be configured to run on n cores before any work is sent out; 2. a task on a volunteer VM can forward its own #cores to the CMS app so that CMS uses this #cores, like: 2-core VM -> 2-core CMS, 4-core VM -> 4-core CMS. Sending out fixed n-core CMS tasks to a VM not running n cores makes no sense. |
Send message Joined: 18 Dec 15 Posts: 1821 Credit: 118,941,023 RAC: 22,106 |
Ivan, what's the current status on single-core jobs? Obviously, none are available at this point :-( "... but I have single-core jobs in the queues." "Ivan, either they are used up already, or something else is going wrong." |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,266 |
myself wrote: "Is this cmsRun twice as fast as a single run or is it running two jobs in the background?" A CMS job running inside the VM is obviously running twice as fast. Normally a job needs about 4 hours on my laptop, depending on other BOINC tasks. The BOINC task has now been running for 2.75 hours and is on at least its second cmsRun (65 minutes into that). |
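The "twice as fast" estimate can be checked with the numbers given in the post above. This is only a rough sketch: it assumes exactly one job finished before the current cmsRun started, and it counts VM startup time as part of that first job, so the result is a lower bound on the speedup. The function name `min_speedup` is made up for illustration.

```python
def min_speedup(usual_job_min: float,
                task_elapsed_min: float,
                current_job_elapsed_min: float) -> float:
    """Lower bound on the speedup of the first job in a task.

    The first job can have taken at most the task's elapsed time
    minus the time already spent in the current job (this upper
    bound still includes the VM setup phase).
    """
    first_job_max_min = task_elapsed_min - current_job_elapsed_min
    if first_job_max_min <= 0:
        raise ValueError("current job cannot be older than the task")
    return usual_job_min / first_job_max_min


# Figures from the post: ~4 h per job normally, task running 2.75 h,
# second cmsRun 65 minutes in.
speedup = min_speedup(4 * 60, 2.75 * 60, 65)
print(f"first job finished in at most 100 min, speedup >= {speedup:.1f}x")
```

The "about 4 hours" baseline was measured with other BOINC tasks competing for the CPU, so part of the gain may come from the laptop being otherwise idle rather than from the second core alone.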
Send message Joined: 6 Sep 08 Posts: 118 Credit: 12,588,679 RAC: 899 |
This host has 4 cores. Preferences set to run 1 job and 4 cores. Three other jobs (non LHC) are "Waiting to run". No app_config. top (f3) shows cmsRun @ 200%. CMS job running about twice as fast as usual. |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,266 |
I changed my prefs to 1 task and max 2 CPUs. The task created a dual core VM with 2792 MB memory. After about 5 minutes a cmsRun appeared using up to 100% CPU and after another 2 minutes cmsRun started using up to 200% CPU. First test task: https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3310818 |
Send message Joined: 18 Dec 15 Posts: 1821 Credit: 118,941,023 RAC: 22,106 |
"I changed my prefs to 1 task and max 2 CPUs." I tried the same prefs that CP describes above, but without success - no dual-core VM was created :-( See here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=408217329 What's going wrong? |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
"what's going wrong?" There are no jobs inside the task. I'm seeing the same. |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,266 |
"I changed my prefs to 1 task and max 2 CPUs." "I tried the same prefs which CP is describing above. However, without success - no dual core VM created :-(" The multi-core CMS is being tested on the development system only, afaik ... and don't use app_config.xml for CMS. |
Send message Joined: 18 Dec 15 Posts: 1821 Credit: 118,941,023 RAC: 22,106 |
"what's going wrong?" "The multi core CMS is tested on the development system only, afaik ..." No, I did not use app_config.xml; I made the setting for 2 cores on the web page. So the multicore tasks seem to be offered in the -dev system only, okay. But even the usual 1-core tasks are (still) not working at this point; obviously no jobs are available. But why does the project status page then show the usual number of "unsent" tasks (close to 200) - could it be that this includes the test multicore tasks from the -dev system? Or does the "automatic task distribution stop function in case of no jobs available" not work? To me, everything seems to be a little weird at the moment :-( |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
As always, the BOINC tasks are only envelopes created by a server script (or an independent backend system). Prod and dev each run their own script, independent of the other. CMS jobs are created and administered by a rather complex backend workflow. There is 1 active workflow feeding both prod and dev. Maintaining separate workflows would be a huge effort, and there are not enough volunteers on dev to guarantee a steady return rate. Although the CMS vdi is the same for prod and dev (hence can be run as 1-core/n-core), the BOINC app plus the job startup scripts (partly hardwired) on prod configure the VM to accept only singlecore jobs. On dev the BOINC app is a full multicore app (which can also run 1-core jobs). ATM the workflow queue contains jobs that are configured to run on 2-core systems. Hence, they can run on dev but fail on prod. I'm sure Ivan and the CMS team are working on a solution to work out the parameters that are necessary to run stable multicore jobs. Once this is done I expect the singlecore CMS on prod will be replaced by a multicore app. Be patient. Give them the time they need. |
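The situation described in the post above can be summarised as a small toy model. It is only a sketch of the explanation given there, not of the real BOINC or CMS backend code: the names `Platform`, `PROD`, `DEV` and `can_run` are made up, and the rule that a multicore VM accepts any job needing no more cores than the VM has is an assumption drawn from the reports in this thread (2-core jobs running on 2-, 4- and 8-core -dev VMs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical stand-in for one BOINC project instance."""
    name: str
    multicore_app: bool  # False: VM is configured for single-core jobs only


PROD = Platform("prod", multicore_app=False)
DEV = Platform("dev", multicore_app=True)


def can_run(platform: Platform, vm_cores: int, job_cores: int) -> bool:
    """Return True if a job needing job_cores can run in this VM."""
    if vm_cores < 1 or job_cores < 1:
        raise ValueError("core counts must be positive")
    if not platform.multicore_app:
        # prod: app and startup scripts accept single-core jobs only
        return job_cores == 1
    # dev (assumed rule): job must fit into the cores the VM provides
    return job_cores <= vm_cores


# One shared workflow queue currently holding 2-core jobs:
for platform, vm_cores in ((PROD, 1), (DEV, 2), (DEV, 4)):
    ok = can_run(platform, vm_cores, job_cores=2)
    print(f"{platform.name}, {vm_cores}-core VM: 2-core job runs = {ok}")
```

With a single workflow feeding both systems, a queue that holds only 2-core jobs leaves prod tasks with nothing they can run, which matches the empty tasks reported earlier in the thread.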
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
|
Send message Joined: 18 Dec 15 Posts: 1821 Credit: 118,941,023 RAC: 22,106 |
Thanks, computezrmle, for your thorough explanation :-) When you say "I expect the singlecore CMS on prod will be replaced by a multicore app", I hope this will mean the same as for ATLAS, i.e. we volunteers can choose between 1 and n cores per task. |
©2024 CERN