Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1009
Credit: 6,293,485
RAC: 1,481
Message 50025 - Posted: 23 Apr 2024, 18:58:45 UTC

OK, while we get our heads around various problems, we've decided just to send 4-core tasks for the rest of the week. This means you can suspend or set to NoNewTasks any machine that's not set up to run quad-core VMs.
To enable 4-core tasks, select the locale(s) (default, home, work or school) you want to run 4-core tasks in, and make sure the LHC@home preferences for that locale are set to run CMS with Max # CPUs set to 4. You can set Max # Jobs to however many tasks your CPUs can run at 4 cores/job. Remember that multicore tasks take proportionately more bandwidth, memory and other resources than single-core tasks. Check that the machines you want to use are actually assigned to the desired locale.
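Choosing Max # Jobs comes down to dividing the logical CPUs BOINC may use by the cores per task (Max # CPUs = 4). A minimal sketch, assuming you want a rough starting value: `max_concurrent_tasks` is a hypothetical helper, not part of BOINC or LHC@home, and the thread gives no memory figure for 4-core VMs, so the optional `ram_per_task_mb` must come from your own observations.

```python
import os
from typing import Optional


def max_concurrent_tasks(logical_cpus: int,
                         cores_per_task: int = 4,
                         ram_mb: Optional[int] = None,
                         ram_per_task_mb: Optional[int] = None) -> int:
    """Rough value for the 'Max # Jobs' preference (hypothetical helper).

    Integer-divides the logical CPUs by the cores per task ('Max # CPUs').
    If both RAM figures are given (your own observed per-VM memory is an
    assumption you must supply), the result is also capped by available RAM.
    """
    if cores_per_task < 1:
        raise ValueError("cores_per_task must be at least 1")
    by_cpu = logical_cpus // cores_per_task
    if ram_mb is None or ram_per_task_mb is None:
        return by_cpu
    return min(by_cpu, ram_mb // ram_per_task_mb)


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    print(f"{cpus} logical CPUs -> Max # Jobs = {max_concurrent_tasks(cpus)} at 4 cores/job")
```

For example, a 16-thread machine gives 4, while a 6-thread machine gives 1 and leaves two threads free for other work.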
At present we have two workflows running. One is set to run 503,000 events/job (as was the template it was derived from) and takes about 5-6 hours of wall-clock time. The other is set to 50,000 events/job and runs for about one hour of wall-clock time. If we run out of jobs before the weekend, I'll submit a batch with 100,000 events/job, to match the 2-hour average our previous tasks took. These jobs generate considerably less output per CPU-hour than our previous ones.
I've noticed a few curious things with VirtualBox. Some people have not been running the VirtualBox extension pack, so make sure you are running the same version of the extension pack as your VirtualBox executable. I've also seen some errors activating the "multiattach" feature we use (where more than one VM can use the same virtual-disk image), with the claim being that it only works for VDIs created with VirtualBox newer than 4.0.
ID: 50025
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2435
Credit: 228,254,515
RAC: 122,368
Message 50026 - Posted: 23 Apr 2024, 20:21:53 UTC - in response to Message 50025.  

ivan wrote:
some people have not been running the VirtualBox extension pack

Installing the extension pack is not required if you just want to run a headless VM.

ivan wrote:
I've also seen some errors activating the "multiattach" feature we use...

This will most likely be solved by the upcoming vboxwrapper version on GitHub.
I'll inform Laurence as soon as it is approved and merged over there.
ID: 50026
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50027 - Posted: 23 Apr 2024, 23:29:45 UTC

Cores per work unit set to four: check
Machine on correct profile: check
Guest Extensions correct version: check (VBox Version 7.0.16 r162802 (Qt5.15.3))
Work fetch enabled: check
Abort all existing single core work units: check
Request fresh work: check
Check stderr of the running CMS work unit: it only requests a single core and the VM only allocates a single core; VBox Extension Pack recognised
Check VBox manager: all VMs show Extension pack available

Something isn't right.
ID: 50027
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50028 - Posted: 23 Apr 2024, 23:34:40 UTC

Reset project: still not downloading the CMS multithread vdi
ID: 50028
maeax
Joined: 2 May 07
Posts: 2125
Credit: 159,968,505
RAC: 38,628
Message 50029 - Posted: 24 Apr 2024, 10:07:43 UTC - in response to Message 50028.  
Last modified: 24 Apr 2024, 10:18:21 UTC

Running job output should appear here.......
Now 20 minutes. https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3323207
Properties
Resources: 4 CPUs
Estimated computation size: 1,000,000 GFLOPs
CPU time: 01:08:41
CPU time since checkpoint: 00:52:06
Elapsed time: 00:33:36
Estimated time remaining: 07:52:54
Progress: 3.082%

It's working!
ID: 50029
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50030 - Posted: 24 Apr 2024, 13:06:38 UTC

2024-04-24 22:17:02 (1865718): Setting Memory Size for VM. (2048MB)
2024-04-24 22:17:02 (1865718): Setting CPU Count for VM. (1)

Still NOT working
https://lhcathome.cern.ch/lhcathome/result.php?resultid=410235598
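The two stderr lines above are what shows whether a task really got a multicore VM. A minimal sketch, assuming you paste the stderr text from the task's result page: `vm_settings` and `is_multicore` are hypothetical helpers, and the regular expressions only cover the two line formats quoted in this post, which may differ in other wrapper versions.

```python
import re

# Wrapper stderr lines as quoted in this thread, e.g.
#   2024-04-24 22:17:02 (1865718): Setting Memory Size for VM. (2048MB)
#   2024-04-24 22:17:02 (1865718): Setting CPU Count for VM. (1)
CPU_COUNT_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")
MEMORY_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")


def vm_settings(stderr_text: str) -> dict:
    """Return the CPU count and memory (MB) the wrapper set, or None if absent."""
    cpu = CPU_COUNT_RE.search(stderr_text)
    mem = MEMORY_RE.search(stderr_text)
    return {
        "cpus": int(cpu.group(1)) if cpu else None,
        "memory_mb": int(mem.group(1)) if mem else None,
    }


def is_multicore(stderr_text: str, expected_cpus: int = 4) -> bool:
    """True if the VM was given the expected number of CPUs."""
    return vm_settings(stderr_text)["cpus"] == expected_cpus


SAMPLE = """\
2024-04-24 22:17:02 (1865718): Setting Memory Size for VM. (2048MB)
2024-04-24 22:17:02 (1865718): Setting CPU Count for VM. (1)
"""

if __name__ == "__main__":
    print(vm_settings(SAMPLE), "multicore:", is_multicore(SAMPLE))
```

Run against the excerpt above, it reports one CPU and 2048 MB, i.e. a single-core VM.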
ID: 50030
maeax
Joined: 2 May 07
Posts: 2125
Credit: 159,968,505
RAC: 38,628
Message 50031 - Posted: 24 Apr 2024, 13:16:37 UTC

Without Proxy, the same?
ID: 50031
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1294
Credit: 8,546,168
RAC: 3,704
Message 50032 - Posted: 24 Apr 2024, 13:50:33 UTC - in response to Message 50030.  

Dark Angel wrote:
Still NOT working
https://lhcathome.cern.ch/lhcathome/result.php?resultid=410235598
The task was running, but there are no single-core CMS jobs at the moment: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6112&postid=50025
ID: 50032
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50033 - Posted: 24 Apr 2024, 22:11:36 UTC - in response to Message 50031.  

maeax wrote:
Without Proxy, the same?

I'll try disabling the proxy, but the new vdi has a different name and isn't being requested as far as I can tell.
ID: 50033
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50034 - Posted: 24 Apr 2024, 22:19:52 UTC

Turned off the proxy in BOINC Manager and reset the project. Still only downloading CMS_2022_09_07_prod.vdi and not the one for 70.20 (vbox64_mt_mcore_cms).
Turning proxy back on now.
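One way to confirm which disk images BOINC has actually downloaded is to list the project directory. A minimal sketch, assuming the Debian/Ubuntu/Mint package layout where the BOINC data directory is `/var/lib/boinc-client` and the LHC@home project folder is `lhcathome.cern.ch_lhcathome`; both paths are assumptions, so check your own installation. The thread does not give the file name of the multicore CMS image, so the script only lists whatever `.vdi` files are present; seeing `CMS_2022_09_07_prod.vdi` as the only CMS image would match the situation described above.

```python
from pathlib import Path

# Assumption: Debian/Ubuntu/Mint package layout; adjust for your installation.
DEFAULT_PROJECT_DIR = Path("/var/lib/boinc-client/projects/lhcathome.cern.ch_lhcathome")


def list_vdi_files(project_dir: Path = DEFAULT_PROJECT_DIR) -> list:
    """Return sorted (file name, size in bytes) pairs for every .vdi image found."""
    if not project_dir.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in project_dir.glob("*.vdi"))


if __name__ == "__main__":
    images = list_vdi_files()
    if not images:
        print(f"No .vdi files found in {DEFAULT_PROJECT_DIR}")
    for name, size in images:
        print(f"{name}: {size / 2**30:.2f} GiB")
```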
ID: 50034
tazzduke
Joined: 24 Jun 10
Posts: 43
Credit: 5,489,724
RAC: 10,930
Message 50043 - Posted: 26 Apr 2024, 8:13:28 UTC - in response to Message 50034.  
Last modified: 26 Apr 2024, 8:13:41 UTC

Greetings,

Grabbed a 4-core multi just now to test my setup.

It shows in BOINC as multi-core.

Cheers
ID: 50043
Erich56
Joined: 18 Dec 15
Posts: 1693
Credit: 104,865,319
RAC: 69,451
Message 50044 - Posted: 26 Apr 2024, 14:19:20 UTC - in response to Message 50043.  

tazzduke wrote:
Grabbed a 4 core multi, just now, to test my setup.
You were lucky: there was apparently only a short window around 8 a.m. when jobs were available.
From what I can see, the task is still running.
ID: 50044
tazzduke
Joined: 24 Jun 10
Posts: 43
Credit: 5,489,724
RAC: 10,930
Message 50046 - Posted: 27 Apr 2024, 4:52:58 UTC - in response to Message 50044.  

ID: 50046
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50048 - Posted: 27 Apr 2024, 7:31:15 UTC

Some of us REALLY need the option to disable single-core CMS work. I'm still getting nothing but the single-core wrappers. A project reset does nothing but force me to download hefty vdi files all over again, and I still get single-core CMS. Multi-threaded native ATLAS works fine.
ID: 50048
tazzduke
Joined: 24 Jun 10
Posts: 43
Credit: 5,489,724
RAC: 10,930
Message 50049 - Posted: 27 Apr 2024, 7:34:26 UTC - in response to Message 50046.  

Greetings,

I came across something a bit strange (I was helping out a team member at the same time), but here goes:

My Windows 11 machine will download and run 4 core multicore CMS workunits.

My Linux Mint 21.3 machine will only download and run single core CMS workunits.

Am I missing something here though?

Regards
ID: 50049
maeax
Joined: 2 May 07
Posts: 2125
Credit: 159,968,505
RAC: 38,628
Message 50050 - Posted: 27 Apr 2024, 8:21:43 UTC
Last modified: 27 Apr 2024, 8:26:10 UTC

Production:
Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU 70.20 (vbox64_mt_mcore_cms)
-dev:
Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU 61.01 (vbox64_mt_mcore_cms)

This -dev version is from yesterday.
I think Laurence and his team (including Ivan) will first see how it works in -dev.
First task in -dev finished:
Computer ID 4639
Run time 6 hours 59 min 28 sec
CPU time 1 day 0 hours 58 min 31 sec
Validate state Valid
Credit 1,101.22
ID: 50050
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50054 - Posted: 27 Apr 2024, 11:20:33 UTC

I finally got some Linux mt units.
Only had to reset the project several times and nuke eleven hundred plus empty single-core units to finally get there.
ID: 50054
Erich56
Joined: 18 Dec 15
Posts: 1693
Credit: 104,865,319
RAC: 69,451
Message 50055 - Posted: 27 Apr 2024, 12:05:42 UTC - in response to Message 50050.  
Last modified: 27 Apr 2024, 12:06:20 UTC

maeax wrote:
...
Run time 6 hours 59 min 28 sec
CPU time 1 day 0 hours 58 min 31 sec
Validate state Valid
Credit 1,101.22

Excerpt from the task colleague tazzduke (a few posts above) finished this morning:

Run time 14 hours 14 min 48 sec
CPU time 1 day 22 hours 41 min 55 sec
Validate state Valid
Credit 31.21
ID: 50055
maeax
Joined: 2 May 07
Posts: 2125
Credit: 159,968,505
RAC: 38,628
Message 50056 - Posted: 27 Apr 2024, 12:25:06 UTC - in response to Message 50055.  

Credit will need to be fixed at some point for this new app version.
ID: 50056
Dark Angel
Joined: 7 Aug 11
Posts: 93
Credit: 21,875,393
RAC: 8,087
Message 50057 - Posted: 27 Apr 2024, 20:48:47 UTC

Run time 5 hours 58 min 38 sec
CPU time 14 hours 40 min 18 sec
Validate state Valid
Credit 3.56

Run time 3 hours 9 min 48 sec
CPU time 5 hours 46 min 57 sec
Validate state Valid
Credit 1.88

Run time 5 hours 50 min 56 sec
CPU time 14 hours 38 min 39 sec
Validate state Valid
Credit 3.63

Excuse me, but what??
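The run times, CPU times and credits posted in the last few messages are easier to compare as credit per CPU-hour. A minimal sketch using only the figures quoted in this thread (the task labels are descriptive only); "cores busy" is CPU time divided by run time, so a 4-core VM keeping all its cores busy would show about 4. Worked through, the -dev task comes out at about 44 credit per CPU-hour, while the other four tasks are all below 1.

```python
from dataclasses import dataclass


def hms(days: int = 0, hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a days/hours/minutes/seconds duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


@dataclass
class Task:
    label: str
    run_s: int      # wall-clock run time in seconds
    cpu_s: int      # CPU time in seconds
    credit: float


# Figures as posted in this thread.
TASKS = [
    Task("maeax (-dev)", hms(0, 6, 59, 28), hms(1, 0, 58, 31), 1101.22),
    Task("tazzduke", hms(0, 14, 14, 48), hms(1, 22, 41, 55), 31.21),
    Task("Dark Angel #1", hms(0, 5, 58, 38), hms(0, 14, 40, 18), 3.56),
    Task("Dark Angel #2", hms(0, 3, 9, 48), hms(0, 5, 46, 57), 1.88),
    Task("Dark Angel #3", hms(0, 5, 50, 56), hms(0, 14, 38, 39), 3.63),
]


def credit_per_cpu_hour(t: Task) -> float:
    return t.credit / (t.cpu_s / 3600)


def cores_busy(t: Task) -> float:
    """Average number of cores kept busy: CPU time / run time."""
    return t.cpu_s / t.run_s


if __name__ == "__main__":
    for t in TASKS:
        print(f"{t.label:14s} {credit_per_cpu_hour(t):7.2f} credit/CPU-h  "
              f"{cores_busy(t):4.2f} cores busy")
```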
ID: 50057


©2024 CERN