Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs

Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,874,936
RAC: 14,598
Message 49989 - Posted: 21 Apr 2024, 7:51:45 UTC - in response to Message 49988.  

I'm only running my queue set to 1:1
At this rate I'll halve that.
ID: 49989
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1287
Credit: 8,516,022
RAC: 2,344
Message 49990 - Posted: 21 Apr 2024, 8:04:31 UTC - in response to Message 49984.  
Last modified: 21 Apr 2024, 8:06:06 UTC

.... and I have tried complete clean reinstalls of everything and they will d/l the vdi and then the tasks just crash.....so I have to keep track of which Theory or CMS will run on them from here and -dev)
I had the same a few days ago. See my posts in this thread. First I got CMS_2022_09_07.vdi, which was an exact copy of the one from the -dev project, with the same UUID, causing the crashes.
I reset the -dev project and removed the vdi from Media Manager. After that I had a valid result with CMS_2022_09_07.vdi. Suddenly a new task needed the prod version of the vdi (CMS_2022_09_07_prod.vdi), which wasn't there. I reset the project again.
I suppose the CMS settings on the production server weren't set up very well.
ID: 49990
maeax

Joined: 2 May 07
Posts: 2121
Credit: 159,926,969
RAC: 70,085
Message 49991 - Posted: 21 Apr 2024, 8:05:39 UTC - in response to Message 49989.  
Last modified: 21 Apr 2024, 8:05:58 UTC

First: in the LHC@home preferences, deselect all projects except CMS.
ID: 49991
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1129
Credit: 49,767,714
RAC: 5,291
Message 49992 - Posted: 21 Apr 2024, 8:20:04 UTC - in response to Message 49990.  

.... and I have tried complete clean reinstalls of everything and they will d/l the vdi and then the tasks just crash.....so I have to keep track of which Theory or CMS will run on them from here and -dev)
I had the same a few days ago. See my posts in this thread. First I got CMS_2022_09_07.vdi, which was an exact copy of the one from the -dev project, with the same UUID, causing the crashes.
I reset the -dev project and removed the vdi from Media Manager. After that I had a valid result with CMS_2022_09_07.vdi. Suddenly a new task needed the prod version of the vdi (CMS_2022_09_07_prod.vdi), which wasn't there. I reset the project again.
I suppose the CMS settings on the production server weren't set up very well.


Yes, that is the same thing for me, and I thought you might see that since you were over there too, CP.
Not a big deal to me, since the whole point is that they run right for members here without problems.
ID: 49992
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,525,460
RAC: 129,564
Message 49993 - Posted: 21 Apr 2024, 8:53:26 UTC - in response to Message 49990.  

Right.
A BOINC client that is attached to -dev and -prod has to deal with up to 4 CMS vdi files:
vdi-dev-old
vdi-dev-new
vdi-prod-old
vdi-prod-new

VirtualBox holds all virtual disks of the same user in one media registry, and disks conflict there if they have the same name and/or the same UUID.
Hint_1: VirtualBox adds the UUID to the vdi file.
Hint_2: Using VBoxManage to clone the vdi file creates a new UUID, while copying it does not.

To avoid those conflicts the process should be:
Create a fresh vdi file for each app_version or use VBoxManage to clone the original one.

That way the new set of vdi files should look like:
vdi-dev-old -> CMS_2022_09_07.vdi (unchanged)
vdi-dev-new -> CMS_<release date>_mt_dev.vdi (for future mt releases)
vdi-prod-old -> CMS_2022_09_07_prod.vdi (unchanged)
vdi-prod-new -> CMS_<release date>_mt_prod.vdi (for future mt releases) (it's currently CMS_2022_09_07.vdi)

Hence, an updated vbox64_mt_mcore_cms app should be published.
ATLAS follows this scheme, but it somehow got lost for CMS.
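
To illustrate the UUID point from the hints above, here is a minimal sketch in Python. It is a toy model only: the class and function names are made up for this example and have nothing to do with VirtualBox's real registry code, and it models only the UUID conflict, not the name conflict. The paths are shortened placeholders.

```python
import uuid


class MediaConflict(Exception):
    """A disk could not be added to the toy registry."""


class ToyMediaRegistry:
    """Illustrative model only: one registry per user, keyed by disk UUID."""

    def __init__(self):
        self._path_by_uuid = {}

    def register(self, path, disk_uuid):
        owner = self._path_by_uuid.get(disk_uuid)
        if owner is not None and owner != path:
            raise MediaConflict(
                f"{path}: UUID {disk_uuid} already registered for {owner}"
            )
        self._path_by_uuid[disk_uuid] = path


def copied_uuid(source_uuid):
    # A plain file copy duplicates every byte, including the embedded UUID.
    return source_uuid


def cloned_uuid():
    # A clone gets a fresh UUID (modelled here with a random uuid4).
    return str(uuid.uuid4())


registry = ToyMediaRegistry()
dev_uuid = str(uuid.uuid4())  # hypothetical UUID of the -dev disk
registry.register("lhcathome-dev/CMS_2022_09_07.vdi", dev_uuid)

# Registering a byte-for-byte copy from the other project fails.
try:
    registry.register("lhcathome/CMS_2022_09_07.vdi", copied_uuid(dev_uuid))
    copy_conflicts = False
except MediaConflict:
    copy_conflicts = True

# A clone with its own UUID registers without trouble.
registry.register("lhcathome/CMS_2022_09_07.vdi", cloned_uuid())
```

In this model the copy is rejected and the clone is accepted, which is the behaviour the proposed per-app_version vdi files are meant to guarantee.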
ID: 49993
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1129
Credit: 49,767,714
RAC: 5,291
Message 49994 - Posted: 21 Apr 2024, 8:56:05 UTC - in response to Message 49988.  
Last modified: 21 Apr 2024, 9:24:16 UTC

......as I said, I just wish we could run a separate vdi for CMS single-core and CMS multi-core here, because I have watched the same host switch back and forth, without being asked to and without any change to the settings, from
CMS Simulation v70.20 (vbox64)
windows_x86_64 to
CMS Simulation v60.70 (vbox64_mt_mcore_cms)
windows_x86_64
At -dev that never happens; it just keeps running what we ask it to, as far as the number of cores per task goes.
When I changed that host from 8-core multi here to 4-core multi, it just gave me a single-core task instead, even though it is set to get two 4-core tasks (which is what I said in my post). It changed that vdi on its own to CMS Simulation v70.20 (vbox64)
windows_x86_64

I just love typing in the dark at 2am, but I still hate Linux)
ID: 49994
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1129
Credit: 49,767,714
RAC: 5,291
Message 49995 - Posted: 21 Apr 2024, 8:58:44 UTC - in response to Message 49993.  
Last modified: 21 Apr 2024, 9:21:30 UTC

Right. Hence, an updated vbox64_mt_mcore_cms app should be published.
ATLAS follows this scheme, but it somehow got lost for CMS.


AH HA.....now see, that is what I was talking about, Steff....I didn't have that problem with the few ATLAS tasks I ran, but CMS doesn't like to behave (maybe Windows too).
OK, that does look cleaner.
ID: 49995
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,525,460
RAC: 129,564
Message 49996 - Posted: 21 Apr 2024, 9:05:48 UTC - in response to Message 49994.  

Magic Quantum Mechanic wrote:
((Lots of text)) ...
I just love typing in the dark at 2am)

Lots of text to comment on a post that wasn't for you.
It would have been better to type with the light switched on.
ID: 49996
maeax

Joined: 2 May 07
Posts: 2121
Credit: 159,926,969
RAC: 70,085
Message 49997 - Posted: 21 Apr 2024, 9:17:30 UTC - in response to Message 49995.  

Steve likes only openSUSE.
We talk about Windows here.
CMS needs a correctly functioning TCP/IP connection.
After 15 min. without a connection, the tasks are canceled.
ID: 49997
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1129
Credit: 49,767,714
RAC: 5,291
Message 49998 - Posted: 21 Apr 2024, 9:20:12 UTC - in response to Message 49996.  

Magic Quantum Mechanic wrote:
((Lots of text)) ...
I just love typing in the dark at 2am)

Lots of text to comment on a post that wasn't for you.
It would have been better to type with the light switched on.


But then you mysteriously sort of answered my question, and I am used to you, even at GitHub.
And I suppose I shouldn't have quoted you, since you tend to........
ID: 49998
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1287
Credit: 8,516,022
RAC: 2,344
Message 49999 - Posted: 21 Apr 2024, 9:20:34 UTC

I'm only getting single-core tasks at the moment. For me, time to stop testing . . .
ID: 49999
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1129
Credit: 49,767,714
RAC: 5,291
Message 50000 - Posted: 21 Apr 2024, 9:25:00 UTC - in response to Message 49999.  
Last modified: 21 Apr 2024, 9:26:26 UTC

I'm only getting single-core tasks at the moment. For me, time to stop testing . . .

Same here......Good night
ID: 50000
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,525,460
RAC: 129,564
Message 50001 - Posted: 21 Apr 2024, 9:36:21 UTC - in response to Message 49997.  
Last modified: 21 Apr 2024, 9:37:31 UTC

We talk about Windows here.

For many years you haven't got the point:
CMS runs inside a Linux VM.
That Linux VM is the very same on a Windows host, on any Linux host and even on Apple.


CMS needs a correctly functioning TCP/IP connection.
After 15 min. without a connection, the tasks are canceled.

Feel free to try to explain anything you want, but nobody forces you to explain things you obviously don't understand.
ID: 50001
maeax

Joined: 2 May 07
Posts: 2121
Credit: 159,926,969
RAC: 70,085
Message 50002 - Posted: 21 Apr 2024, 9:58:39 UTC

Steve,
we see NO Windows PC on your side.
Yes, Windows runs CMS.
So, more respect for the Windows side.
ID: 50002
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1129
Credit: 49,767,714
RAC: 5,291
Message 50003 - Posted: 21 Apr 2024, 10:12:19 UTC - in response to Message 50001.  
Last modified: 21 Apr 2024, 10:18:12 UTC

We talk about Windows here.

For many years you haven't got the point:
CMS runs inside a Linux VM.
That Linux VM is the very same on a Windows host, on any Linux host and even on Apple.


CMS needs a correctly functioning TCP/IP connection.
After 15 min. without a connection, the tasks are canceled.

Feel free to try to explain anything you want, but nobody forces you to explain things you obviously don't understand.

Everyone knows that, and now you even talk to yourself.......The only good thing is that it magically made my microwave communications run at full speed for some late-night reason (another thing you pretended to be an expert at). We all know the reason we even started using Oracle VirtualBox back in 2011 was to work together with the CERN Linux servers, not just for fun and to repeatedly say VM or Linux every chance we get.
ID: 50003
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1287
Credit: 8,516,022
RAC: 2,344
Message 50004 - Posted: 21 Apr 2024, 14:31:31 UTC
Last modified: 21 Apr 2024, 14:42:45 UTC

Couldn't resist and started a test on a laptop.

The first task started a 4-core VM and used CMS_2022_09_07_prod.vdi as its hard disk (CMS Simulation v70.20 (vbox64)).
I requested a second task and got one, but a new CMS_2022_09_07.vdi was downloaded too (CMS Simulation v70.20 (vbox64_mt_mcore_cms)).
I checked the info of this 'fresh' HD with: vboxmanage.exe showhdinfo d:\boinc1\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07.vdi

The result:
VBoxManage.exe: error: Cannot register the hard disk 'D:\boinc1\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07.vdi' {8fb925ef-3497-4bfb-88e3-bbab2930787f} because a hard disk 'D:\Boinc1\projects\lhcathomedev.cern.ch_lhcathome-dev\CMS_2022_09_07.vdi' with UUID {8fb925ef-3497-4bfb-88e3-bbab2930787f} already exists

So this task will crash if the UUID is not changed.
For the moment I'll change it locally with: vboxmanage internalcommands sethduuid "D:/boinc1/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07.vdi"

UUID changed to: 40a2c82d-7fc7-4ed0-a7c6-163a7d3df252
ID: 50004
Ben

Joined: 30 Mar 20
Posts: 3
Credit: 9,299,792
RAC: 16,162
Message 50005 - Posted: 21 Apr 2024, 20:40:23 UTC
Last modified: 21 Apr 2024, 20:42:07 UTC

Could the new multi-core app please get a different name? I use an app_config.xml file and having them both named CMS makes it useless.

Thank you.
ID: 50005
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,525,460
RAC: 129,564
Message 50006 - Posted: 22 Apr 2024, 4:59:14 UTC - in response to Message 50005.  

It has a different plan class.

Create an extra section in your app_config.xml and use
- <plan_class>vbox64</plan_class> for singlecore
- <plan_class>vbox64_mt_mcore_cms</plan_class> for multicore

See:
https://lhcathome.cern.ch/lhcathome/apps.php
https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration
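
A minimal sketch of what such an app_config.xml could look like, generated with Python so the structure is easy to see. The app name "CMS" is taken from the post above; the avg_ncpus values (1 and 4) are only example numbers and the helper name build_app_config is made up for this illustration. Check the BOINC documentation linked above for the options your client version supports.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, ncpus_by_plan_class):
    """Return app_config.xml text with one <app_version> section per plan class."""
    root = ET.Element("app_config")
    for plan_class, avg_ncpus in ncpus_by_plan_class.items():
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = app_name
        ET.SubElement(version, "plan_class").text = plan_class
        ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Example values only: 1 core for singlecore, 4 cores for multicore.
xml_text = build_app_config("CMS", {"vbox64": 1, "vbox64_mt_mcore_cms": 4})
print(xml_text)
```

The printed text would go into app_config.xml in the project's directory, so that the two plan classes get separate settings even though both apps are named CMS.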
ID: 50006
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,525,460
RAC: 129,564
Message 50007 - Posted: 22 Apr 2024, 6:02:54 UTC

According to CERN Grafana, CMS has been distributing new single-core tasks since late yesterday afternoon (UTC).
ID: 50007
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,525,460
RAC: 129,564
Message 50008 - Posted: 22 Apr 2024, 6:55:19 UTC - in response to Message 50004.  

Couldn't resist and started a test on a laptop.

The first task started a 4-core VM and used CMS_2022_09_07_prod.vdi as its hard disk (CMS Simulation v70.20 (vbox64)).
I requested a second task and got one, but a new CMS_2022_09_07.vdi was downloaded too (CMS Simulation v70.20 (vbox64_mt_mcore_cms)).
I checked the info of this 'fresh' HD with: vboxmanage.exe showhdinfo d:\boinc1\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07.vdi

The result:
VBoxManage.exe: error: Cannot register the hard disk 'D:\boinc1\projects\lhcathome.cern.ch_lhcathome\CMS_2022_09_07.vdi' {8fb925ef-3497-4bfb-88e3-bbab2930787f} because a hard disk 'D:\Boinc1\projects\lhcathomedev.cern.ch_lhcathome-dev\CMS_2022_09_07.vdi' with UUID {8fb925ef-3497-4bfb-88e3-bbab2930787f} already exists

So this task will crash if the UUID is not changed.
For the moment I'll change it locally with: vboxmanage internalcommands sethduuid "D:/boinc1/projects/lhcathome.cern.ch_lhcathome/CMS_2022_09_07.vdi"

UUID changed to: 40a2c82d-7fc7-4ed0-a7c6-163a7d3df252

This happens (usually after a BOINC restart) when the existing files are checked against their md5 hash.
Since the new UUID is written to the vdi file, the md5 hash no longer matches the one sent by the project server.
A new UUID works as long as you don't shut down BOINC.

You could also use "<dont_check_file_sizes>1</dont_check_file_sizes>" (sic!) in cc_config.xml to bypass the md5 check, but this affects all projects this client is connected to and could have other (unwanted) side effects.

The most reliable and permanent solution would be to get a correctly prepared vdi file from the project.
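
The md5 point can be shown with a small Python sketch. The byte layout below is only a stand-in for a disk image and is NOT the real VDI header format; the two UUIDs are the ones from the quoted post, and md5_hex is a helper defined just for this example.

```python
import hashlib
import uuid


def md5_hex(data):
    """Hex md5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Stand-in image: some header bytes, an embedded 16-byte UUID, then payload.
old_uuid = uuid.UUID("8fb925ef-3497-4bfb-88e3-bbab2930787f").bytes
new_uuid = uuid.UUID("40a2c82d-7fc7-4ed0-a7c6-163a7d3df252").bytes
image = b"HEADER" + old_uuid + b"\x00" * 1024

# The hash the project server would announce for the unmodified file.
server_md5 = md5_hex(image)

# Rewriting the UUID changes only 16 bytes and leaves the size untouched ...
patched = image.replace(old_uuid, new_uuid, 1)
local_md5 = md5_hex(patched)

# ... but the hash of the local file no longer matches the server's value.
hash_still_matches = server_md5 == local_md5
same_size = len(patched) == len(image)
```

Here same_size stays true while hash_still_matches becomes false, which is why a size check alone would not notice the change but a hash check against the server's value does.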
ID: 50008


©2024 CERN