CMS@Home difficulties in attempts to prepare for multi-core jobs
Author | Message |
---|---|
Send message Joined: 7 Aug 11 Posts: 104 Credit: 24,785,073 RAC: 12,459 |
I'm only running with my queue set to 1:1. At this rate I'll halve that. |
Send message Joined: 14 Jan 10 Posts: 1418 Credit: 9,460,759 RAC: 2,399 |
".... and I have tried complete clean reinstalls of everything and they will d/l the vdi and then the tasks just crash.....so I have to keep track of which Theory or CMS will run on them from here and -dev)" I had the same a few days ago. See my posts in this thread. First I got the CMS_2022_09_07.vdi, which was an exact copy of the one from the dev-project with the same UUID, causing the crashes. I reset the dev-project and removed the vdi from Media Manager. Thereafter I had a valid result with CMS_2022_09_07.vdi. Suddenly a new task needed the prod-version of the vdi (CMS_2022_09_07_prod.vdi), which wasn't there. I reset the project again. I suppose the CMS settings on the production server weren't set up very well. |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
First: in your LHC@home preferences, deselect all projects except CMS. |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,817,946 RAC: 15,859 |
".... and I have tried complete clean reinstalls of everything and they will d/l the vdi and then the tasks just crash.....so I have to keep track of which Theory or CMS will run on them from here and -dev)" "I had the same a few days ago. See my posts in this thread. First I got the CMS_2022_09_07.vdi what was an exact copy of the dev- project with the same UUID causing the crashes." Yes, that is the same thing for me, and I thought you might see that since you were over there too, CP. Not a big deal to me, since the whole point is to get them running right for members here without problems. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,832,269 RAC: 37,309 |
Right. A BOINC client that is attached to -dev and -prod has to deal with up to 4 CMS vdi files: vdi-dev-old, vdi-dev-new, vdi-prod-old and vdi-prod-new. VirtualBox holds all virtual disks from the same user in 1 media registry, which causes conflicts if they have the same name and/or the same UUID. Hint_1: VirtualBox adds the UUID to the vdi file. Hint_2: Using VBoxManage to clone the vdi file creates a new UUID, while copying it does not. To avoid those conflicts the process should be: create a fresh vdi file for each app_version, or use VBoxManage to clone the original one. That way the new set of vdi files should look like: vdi-dev-old -> CMS_2022_09_07.vdi (unchanged); vdi-dev-new -> CMS_<release date>_mt_dev.vdi (for future mt releases); vdi-prod-old -> CMS_2022_09_07_prod.vdi (unchanged); vdi-prod-new -> CMS_<release date>_mt_prod.vdi (for future mt releases) (it's currently CMS_2022_09_07.vdi). Hence, an updated vbox64_mt_mcore_cms app should be published. ATLAS follows this scheme, but it somehow got lost for CMS. |
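The naming scheme and the clone step described in this post can be sketched in a few lines of Python. This is only an illustration: the helper names `mt_vdi_name` and `clone_command` are hypothetical, the release date is left as a placeholder because the thread does not give one, and the command is built but deliberately not executed.

```python
def mt_vdi_name(release_date: str, flavour: str) -> str:
    """Return a file name following the CMS_<release date>_mt_<flavour>.vdi pattern.

    `flavour` is "dev" or "prod"; `release_date` is whatever date string the
    project chooses (hypothetical input, not taken from the thread).
    """
    if flavour not in ("dev", "prod"):
        raise ValueError("flavour must be 'dev' or 'prod'")
    return f"CMS_{release_date}_mt_{flavour}.vdi"


def clone_command(source_vdi: str, target_vdi: str) -> list[str]:
    """Build (but do not run) a VBoxManage call that clones a disk image.

    Cloning, unlike a plain file copy, gives the target image a new UUID.
    """
    return ["VBoxManage", "clonemedium", "disk", source_vdi, target_vdi]


# Example: derive a separate multi-core prod image from the current one.
target = mt_vdi_name("YYYY_MM_DD", "prod")  # placeholder date
print(" ".join(clone_command("CMS_2022_09_07.vdi", target)))
```

The resulting list could be passed to `subprocess.run` on a machine where VirtualBox is installed; whether the project would generate its images this way is not stated in the thread.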
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,817,946 RAC: 15,859 |
......as I said, I just wish that here we could run a separate vdi for CMS singles and CMS multi, because I have watched the same host switch back and forth, without being asked to and without any change to the settings, from CMS Simulation v70.20 (vbox64) windows_x86_64 to CMS Simulation v60.70 (vbox64_mt_mcore_cms) windows_x86_64. At -dev that never happens; it just stays running what we ask it to as far as number of cores per task. When I changed that host from 8-core multi here to a 4-core multi, it instead just gave me a single-core task when it is set to get 2 four-core tasks (which is what I said in my post). It changed that vdi on its own to CMS Simulation v70.20 (vbox64) windows_x86_64. I just love typing in the dark at 2am (but I still hate linux) |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,817,946 RAC: 15,859 |
"Right. Hence, an updated vbox64_mt_mcore_cms app should be published." AH HA.....now see, that is what I was talking about, Steff....I didn't have that problem with the few Atlas I ran, but CMS doesn't like to behave (maybe Windows too). OK, that does look cleaner. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,832,269 RAC: 37,309 |
Magic Quantum Mechanic wrote: ((Lots of text)) ... Lots of text to comment on a post that wasn't for you. It would have been better to type with the light switched on. |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
Steve only likes openSUSE. We talk about Windows here. CMS needs a correctly functioning TCP/IP connection. After 15 minutes without a connection, the tasks are cancelled. |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,817,946 RAC: 15,859 |
Magic Quantum Mechanic wrote: ((Lots of text)) ... But then you mysteriously sort of answered my question, and I am used to you, even at github. And I suppose I shouldn't have quoted you since you tend to........ |
Send message Joined: 14 Jan 10 Posts: 1418 Credit: 9,460,759 RAC: 2,399 |
I'm only getting single core at the moment. For me time to stop testing . . . |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,817,946 RAC: 15,859 |
"I'm only getting single core at the moment. For me time to stop testing . . ." Same here......Good night |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,832,269 RAC: 37,309 |
"We talk about Windows here." For many years you haven't got the point: CMS runs inside a Linux VM. That Linux VM is the very same on a Windows host, on a (any) Linux host and even on Apple. "CMS needs a correctly functioning TCP/IP connection." Feel free to try to explain anything you want, but nobody forces you to explain things you obviously don't understand. |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
Steve, we see NO Windows PC on your side. Yes, Windows runs CMS. So, more respect for the Windows side. |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,817,946 RAC: 15,859 |
"We talk about Windows here." Everyone knows that, and now you even talk to yourself.......the only good thing is it magically made my microwave communications run full speed for some late-night reason (another thing you faked like you were an expert at). We all know the reason we even started using Oracle VirtualBox back in 2011 was to work together with the CERN linux servers, not just for fun and to repeatedly say VM or Linux every chance we get. |
Send message Joined: 14 Jan 10 Posts: 1418 Credit: 9,460,759 RAC: 2,399 |
Couldn't resist and started a test on a laptop. The first task started a 4-core VM and used CMS_2022_09_07_prod.vdi as harddisk (CMS Simulation v70.20 (vbox64)). I requested a second task, got one, but a new CMS_2022_09_07.vdi was downloaded too (CMS Simulation v70.20 (vbox64_mt_mcore_cms)). I checked the info of this 'fresh' HD with: vboxmanage.exe showhdinfo d:\boinc1\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07.vdi The result: VBoxManage.exe: error: Cannot register the hard disk 'D:\boinc1\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07.vdi' {8fb925ef-3497-4bfb-88e3-bbab2930787f} because a hard disk 'D:\Boinc1\projects\lhcathomedev.cern.ch_lhcathome-dev\CMS_2022_09_07.vdi' with UUID {8fb925ef-3497-4bfb-88e3-bbab2930787f} already exists. So this task will crash if the UUID is not changed. For the moment I'll do that for me locally with: vboxmanage internalcommands sethduuid "D:/boinc1/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07.vdi" The result: UUID changed to: 40a2c82d-7fc7-4ed0-a7c6-163a7d3df252 |
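For anyone who wants to spot this conflict in a script rather than by eye, here is a rough Python sketch that pulls the two paths and the shared UUID out of the error text quoted in this post. The function names are hypothetical, the regular expression is only fitted to that one message, and the `sethduuid` call is built as a list without being run.

```python
import re
from typing import Optional

# Error text copied from the post above (a single line in the real output).
SAMPLE_ERROR = (
    r"VBoxManage.exe: error: Cannot register the hard disk "
    r"'D:\boinc1\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07.vdi' "
    r"{8fb925ef-3497-4bfb-88e3-bbab2930787f} because a hard disk "
    r"'D:\Boinc1\projects\lhcathomedev.cern.ch_lhcathome-dev\CMS_2022_09_07.vdi' "
    r"with UUID {8fb925ef-3497-4bfb-88e3-bbab2930787f} already exists"
)

# Pattern fitted to the message above; other VirtualBox versions may word it differently.
CONFLICT_RE = re.compile(
    r"Cannot register the hard disk '(?P<new>[^']+)' "
    r"\{(?P<uuid>[0-9a-fA-F-]{36})\} because a hard disk "
    r"'(?P<existing>[^']+)' with UUID \{(?P=uuid)\} already exists"
)


def parse_uuid_conflict(message: str) -> Optional[dict]:
    """Return the new path, existing path and shared UUID, or None if no conflict is found."""
    match = CONFLICT_RE.search(message)
    return match.groupdict() if match else None


def sethduuid_command(vdi_path: str) -> list:
    """Build (but do not run) the local workaround used in the post."""
    return ["VBoxManage", "internalcommands", "sethduuid", vdi_path]


conflict = parse_uuid_conflict(SAMPLE_ERROR)
if conflict is not None:
    print("Duplicate UUID", conflict["uuid"])
    print("Possible local fix:", " ".join(sethduuid_command(conflict["new"])))
```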
Send message Joined: 30 Mar 20 Posts: 4 Credit: 12,920,601 RAC: 9,809 |
Could the new multi-core app please get a different name? I use an app_config.xml file and having them both named CMS makes it useless. Thank you. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,832,269 RAC: 37,309 |
It has a different plan class. Create an extra section in your app_config.xml and use <plan_class>vbox64</plan_class> for singlecore and <plan_class>vbox64_mt_mcore_cms</plan_class> for multicore. See: https://lhcathome.cern.ch/lhcathome/apps.php and https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration |
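As a rough illustration of the suggestion above, the following Python sketch generates an app_config.xml with one `<app_version>` section per plan class. The app name `CMS` and the core counts are assumptions made for the example; the real short app name should be checked on the apps page or in client_state.xml, and `build_app_config` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, cores_by_plan_class: dict) -> str:
    """Return app_config.xml text with one <app_version> block per plan class."""
    root = ET.Element("app_config")
    for plan_class, ncpus in cores_by_plan_class.items():
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = app_name
        ET.SubElement(version, "plan_class").text = plan_class
        ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Assumed values: app name "CMS", 1 core for singlecore, 4 cores for multicore.
print(build_app_config("CMS", {"vbox64": 1, "vbox64_mt_mcore_cms": 4}))
```

The printed text would go into app_config.xml in the project's folder, after which the client has to re-read its config files or be restarted.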
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,832,269 RAC: 37,309 |
According to CERN Grafana, CMS has been distributing new singlecore tasks since late yesterday afternoon (UTC). |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,832,269 RAC: 37,309 |
"Couldn't resist and started a test on a laptop." This happens (usually after a BOINC restart) when the existing files are checked against their md5 hash. Since the new UUID is written to the vdi file, the md5 hash no longer matches the one sent by the project server. A new UUID works as long as you don't shut down BOINC. You could also use "<dont_check_file_sizes>1</dont_check_file_sizes>" (sic!) in cc_config.xml to bypass the md5 check, but this affects all projects this client is connected to and could have other (unwanted) side effects. The most reliable and permanent solution would be to get a correctly prepared vdi file from the project. |
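The effect described here can be shown with a small Python sketch: overwriting the UUID bytes inside a file changes its md5 hash even though the size stays the same. The byte string below is only a stand-in and does not follow the real VDI layout; the two UUIDs are the ones from the earlier post.

```python
import hashlib
import uuid


def md5_hex(data: bytes) -> str:
    """Return the md5 digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()


def with_new_uuid(image: bytes, old: uuid.UUID, new: uuid.UUID) -> bytes:
    """Swap the 16 UUID bytes in a stand-in image (not the real VDI header format)."""
    return image.replace(old.bytes, new.bytes)


old_uuid = uuid.UUID("8fb925ef-3497-4bfb-88e3-bbab2930787f")
new_uuid = uuid.UUID("40a2c82d-7fc7-4ed0-a7c6-163a7d3df252")

# Stand-in "disk image": some header bytes, the UUID, then payload.
image = b"fake-vdi-header" + old_uuid.bytes + b"payload" * 4
patched = with_new_uuid(image, old_uuid, new_uuid)

print("same size:", len(image) == len(patched))
print("md5 before:", md5_hex(image))
print("md5 after: ", md5_hex(patched))
```

Because the size is unchanged and only the hash differs, a check based purely on file size would not notice the edit, while an md5 comparison against the server's value would.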
©2024 CERN