Message boards : Theory Application : Theory Tasks on various hosts failing since last night
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1473
Credit: 9,932,820
RAC: 813
Message 50759 - Posted: 9 Oct 2024, 9:18:26 UTC - in response to Message 50758.  

I was not even aware that Theory can now run on more than 1 core again; I think this was the case a long time ago (if I remember right, but I might be mistaken), and then Theory was switched back to 1 core only.
Here you see a Theory task that ran on 2 cores: https://lhcathome.cern.ch/lhcathome/result.php?resultid=414709237
Run time 3 hours 37 min 42 sec
CPU time 4 hours 8 min 22 sec
No big advantage, and BOINC very often doesn't report the right CPU time used; instead it reports a CPU time equal to the elapsed time.
In the result:
2024-10-04 03:43:06 (1376): Setting Memory Size for VM. (768MB)
2024-10-04 03:43:07 (1376): Setting CPU Count for VM. (2)
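
As a rough check of "no big advantage", the run time and CPU time quoted above can be turned into an average core utilisation. This is only a sketch: the helper names `to_seconds` and `cpu_efficiency` are made up for illustration, and it assumes the reported CPU time is accurate, which (as noted above) is not always the case.

```python
def to_seconds(hours, minutes, seconds, days=0):
    """Convert a days/hours/minutes/seconds duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(run_s, cpu_s, cores):
    """Fraction of the allocated core time that was actually used."""
    return cpu_s / (run_s * cores)


# Values taken from the 2-core result quoted in this post.
run_s = to_seconds(3, 37, 42)
cpu_s = to_seconds(4, 8, 22)

print(f"average busy cores: {cpu_s / run_s:.2f}")
print(f"efficiency on 2 cores: {cpu_efficiency(run_s, cpu_s, cores=2):.2f}")
```

An average of only slightly more than one busy core means the second core was idle most of the time.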
ID: 50759
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2715
Credit: 294,036,377
RAC: 154,103
Message 50760 - Posted: 9 Oct 2024, 11:22:41 UTC - in response to Message 50757.  

Please check at console 2 if the VM uses lots of RAM/swap.
If so, you may try a higher RAM value via app_config.xml.
(Console 3:) I had already set 768 MB for the Theory VMs. A running Herwig7 shows within the VM:
KiB Mem : 744976 total, 65940 free, 387080 used, 291020 buff/cache
KiB Swap: 1048572 total, 871920 free, 176652 used. 203840 avail Mem

This doesn't look too bad.
A bit more RAM (e.g. up to a total of 1 GB) may result in a larger cache (good for CVMFS), but as long as the used swap remains stable, don't expect a significant increase in performance.
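
To watch whether the used swap stays stable, the two summary lines from the VM console can be parsed and compared over time. The following is a minimal sketch, assuming the lines look like the ones quoted above; `parse_summary` and `swap_used_fraction` are hypothetical helper names, not part of BOINC or the Theory VM.

```python
import re

# Matches "<number> <label>" pairs as printed in the top summary lines.
FIELD = re.compile(r"(\d+)\s+(total|free|used|buff/cache|avail Mem)")


def parse_summary(line):
    """Return {label: KiB} for one 'KiB Mem' or 'KiB Swap' line."""
    return {label: int(value) for value, label in FIELD.findall(line)}


def swap_used_fraction(swap_line):
    """Fraction of the swap space currently in use."""
    fields = parse_summary(swap_line)
    return fields["used"] / fields["total"]


# The two lines as quoted in this post.
mem_line = "KiB Mem: 744976 total, 65940 free, 387080 used 291020 buff/cache"
swap_line = "KiB Swap: 1048572 total, 871920 free. 176652 used. 203840 avail Mem"

print(parse_summary(mem_line))
print(f"swap in use: {swap_used_fraction(swap_line):.1%}")
```

Running this on a few samples taken some minutes apart shows whether the swap fraction is growing or flat.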
ID: 50760
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2715
Credit: 294,036,377
RAC: 154,103
Message 50761 - Posted: 9 Oct 2024, 11:27:52 UTC - in response to Message 50759.  

I was not even aware that Theory can now run on more than 1 core again; I think this was the case a long time ago (if I remember right, but I might be mistaken), and then Theory was switched back to 1 core only.
Here you see a Theory task that ran on 2 cores: https://lhcathome.cern.ch/lhcathome/result.php?resultid=414709237
Run time 3 hours 37 min 42 sec
CPU time 4 hours 8 min 22 sec
No big advantage, and BOINC very often doesn't report the right CPU time used; instead it reports a CPU time equal to the elapsed time.
In the result:
2024-10-04 03:43:06 (1376): Setting Memory Size for VM. (768MB)
2024-10-04 03:43:07 (1376): Setting CPU Count for VM. (2)

Theory is still a 1-core app at the project server (prod).
Hence, the VMs should also be configured as 1-core VMs.
Otherwise you may get unwanted side effects like the one already mentioned or even a performance decrease.
Yes, VirtualBox needs many more cycles to control multicore VMs than to control 1-core VMs.
ID: 50761
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1473
Credit: 9,932,820
RAC: 813
Message 50763 - Posted: 9 Oct 2024, 13:45:00 UTC

===> [runRivet] Wed Oct 9 07:34:01 UTC 2024 [boinc pp z1j 8000 30 - herwig7 7.2.1 nlo 100000 6]

EXIT_DISK_LIMIT_EXCEEDED ==> https://lhcathome.cern.ch/lhcathome/result.php?resultid=414826538

From BOINC's event log: LHC@home 09 Oct 15:22:51 Aborting task Theory_2794-3267759-6_2: exceeded disk limit: 7653.58MB > 7629.39MB
ID: 50763
Erich56

Joined: 18 Dec 15
Posts: 1924
Credit: 151,330,566
RAC: 144,990
Message 50764 - Posted: 9 Oct 2024, 14:50:37 UTC - in response to Message 50763.  

...
EXIT_DISK_LIMIT_EXCEEDED ==> https://lhcathome.cern.ch/lhcathome/result.php?resultid=414826538
As I mentioned earlier today: Herwig7 tasks should not be sent out; they seem to have some problems.
ID: 50764
PDW

Joined: 7 Aug 14
Posts: 27
Credit: 10,000,955
RAC: 0
Message 50765 - Posted: 9 Oct 2024, 15:21:02 UTC - in response to Message 50764.  

...
EXIT_DISK_LIMIT_EXCEEDED ==> https://lhcathome.cern.ch/lhcathome/result.php?resultid=414826538
As I mentioned earlier today: Herwig7 tasks should not be sent out; they seem to have some problems.
A tad long running but they seem okay on Native.
ID: 50765
Matthias Lehmkuhl

Joined: 15 Jul 05
Posts: 27
Credit: 2,675,621
RAC: 958
Message 50800 - Posted: 15 Oct 2024, 13:16:42 UTC
Last modified: 15 Oct 2024, 13:18:13 UTC

count me in
exceeded disk limit: 7641.76MB > 7629.39MB
https://lhcathome.cern.ch/lhcathome/result.php?resultid=414871107
Name: Theory_2794-3259275-214_0
Run time 4 days 1 hour 38 min 25 sec
CPU time 3 days 17 hours 36 min 11 sec
Matthias

ID: 50800
maeax

Joined: 2 May 07
Posts: 2278
Credit: 178,775,457
RAC: 575
Message 50801 - Posted: 15 Oct 2024, 14:18:35 UTC - in response to Message 50800.  
Last modified: 15 Oct 2024, 14:20:35 UTC

What is the disk setting in your BOINC Manager? I have 125 GB as the default. And yours?
15.10.2024 16:08:46 | | - max disk usage: 125.00 GB
ID: 50801
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1473
Credit: 9,932,820
RAC: 813
Message 50805 - Posted: 15 Oct 2024, 16:14:51 UTC - in response to Message 50801.  

What is the disk setting in your BOINC Manager? I have 125 GB as the default. And yours?
15.10.2024 16:08:46 | | - max disk usage: 125.00 GB
'exceeded disk limit' is not caused by reserving too little disk space for BOINC.

Each task comes with a server setting.
This setting (rsc_disk_bound, 8000000000 bytes for Theory) causes the task to be aborted when the total disk space used in that task's slot directory (including subdirectories) exceeds it.
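
The 8000000000-byte bound is where the 7629.39MB figure in the event log comes from: the log's "MB" is 1024 × 1024 bytes. The sketch below shows that conversion and a simple way to total a slot directory including subdirectories. It is an illustration only; the function names are hypothetical, and how the BOINC client itself measures the slot is an assumption here, not something shown in this thread.

```python
import os

RSC_DISK_BOUND = 8_000_000_000  # bytes, the Theory value given in this post
MIB = 1024 * 1024


def bytes_to_mib(n_bytes):
    """Convert bytes to the 'MB' unit used in the BOINC event log."""
    return n_bytes / MIB


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path, including subdirs."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # assumption: symlinks not counted
                total += os.path.getsize(full)
    return total


def exceeds_bound(path, bound=RSC_DISK_BOUND):
    """True if the directory tree at path is larger than the bound."""
    return dir_size_bytes(path) > bound


print(f"{bytes_to_mib(RSC_DISK_BOUND):.2f} MB")  # prints "7629.39 MB"
```

Calling `exceeds_bound` on a slot directory path (for example a `slots/N` folder inside the BOINC data directory) gives a rough idea of how close a running task is to the limit.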
ID: 50805
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1473
Credit: 9,932,820
RAC: 813
Message 50839 - Posted: 18 Oct 2024, 20:20:23 UTC

This one ended after 35 hours without a valid result; it had only just started the event processing: https://lhcathome.cern.ch/lhcathome/result.php?resultid=415017176

ID: 50839


©2025 CERN