Herwig7 7.2.1 nlo-dipole tasks run very slowly.
Author | Message |
---|---|
Joined: 18 Dec 15 Posts: 1821 Credit: 118,946,414 RAC: 19,299 |
The slot folder contains a file called "init_data.xml" in which, among lots of data, the rsc_disk_bound limit shows up. Has anyone ever tried to increase this value? The answer is: YES. I did it a few minutes ago by replacing the first digit "8" with "12". So the question is: will this have the desired effect? |
Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,038 |
Quote: "The slot folder contains a file called 'init_data.xml' in which, among lots of data, the rsc_disk_bound limit shows up. Has anyone ever tried to increase this value? The answer is: YES. I did it a few minutes ago by replacing the first digit '8' with '12'. So the question is: will this have the desired effect?" The answer is NO. The init_data.xml is an extract from client_state.xml and contains the data for that specific task. But rsc_disk_bound is indeed the limit for that task, and it is enforced by BOINC, not by the task. So BOINC kills the task when the limit is exceeded, like this: 09-Oct-2024 00:17:08 [LHC@home] Aborting task Theory_2794-3267759-142_1: exceeded disk limit: 9321.23MB > 7629.39MB. The setting can be changed in client_state.xml while the client is not running, and this must be done for every (new) task. I don't want to encourage anyone to fiddle around with client_state.xml, so there is no support; it is trial and error on your own. Two results with excessive use of the slot folder: https://lhcathome.cern.ch/lhcathome/result.php?resultid=415016497 (peak disk usage 17.27 GB) and https://lhcathome.cern.ch/lhcathome/result.php?resultid=415016558 (peak disk usage 17.04 GB). |
Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 28,391 |
The disk limit is used as a watchdog to prevent a task from writing huge amounts of data until the disk is really full. 8 GB usually leaves plenty of headroom, so there's no need to raise that value. Instead, if tasks hit the limit (without writing snapshots for certain reasons), there's something wrong with the task setup, e.g. lots of errors being written to the internal logfiles. In the end it's nothing that should be repaired by the BOINC volunteers. Suggestion: let the tasks run so that the wrong ones get sorted out automatically. Failed tasks are not nice, but once they are in the queue nobody will remove them manually. |
Joined: 18 Dec 15 Posts: 1821 Credit: 118,946,414 RAC: 19,299 |
computezrmle wrote: "...but those tasks could suffer from" Well, to me the low CPU usage (70%) now seems to be a result of too much swapping because of the 630 MB default RAM. After I increased this value to 1536 MB (I have plenty of RAM available on some of my hosts), the CPU usage figure shown for Herwig in console_3 is around 98/99%. |
Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 28,391 |
Quote: "After I increased this value to 1536 MB (I have plenty of RAM available on some of my hosts), the CPU usage figure shown for Herwig in console_3 is around 98/99%." +1 |
Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 374 |
I have one herwig7 7.2.0 nlo 37000 110 task. Is there a difference from herwig7 7.2.1? The file is 2.03 GByte and is NOT growing. |
Joined: 17 Sep 11 Posts: 1 Credit: 2,393,874 RAC: 1,090 |
Every single one of these that I have let run has finished with "Error while computing", so it seems pointless to run any more of them unless there is a way to make them work. I have now started aborting them. |
Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 374 |
OK, but I will keep an eye on it over the next few days. The limit is 10 days ;-) 4 GByte now for this task. In the end: 10 GByte and C R A S H E D. This sort of task also needs to be stopped by CERN IT, like SHERPA in the past! |
Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,038 |
This task https://lhcathome.cern.ch/lhcathome/result.php?resultid=415343514 exceeded the disk limit - [boinc pp z1j 13000 110 - herwig7 7.2.0 nlo 100000 234] LHC@home 30 Oct 07:56:50 Aborting task Theory_2794-3245286-234_1: exceeded disk limit: 17327.41MB > 7629.39MB |
Joined: 24 Oct 04 Posts: 1176 Credit: 54,887,670 RAC: 4,726 |
So far the -dev version 6.01 (vbox64_theory) for Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU, with [boinc pp z1j 13000 35 - herwig7 7.2.1 nlo-pw-dipole 100000 253], has been running Valids in just over 5 days each. |
Joined: 24 Oct 04 Posts: 1176 Credit: 54,887,670 RAC: 4,726 |
Quote: "So far the -dev version 6.01 (vbox64_theory) Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU" I knew it would be a jinx if I said they worked, because 3 of my current batch of 4 just crashed after: Run time 6 days 6 hours 20 min 57 sec, CPU time 6 days 4 hours 52 min 3 sec. The 4th one is still in the running log, but I will be surprised if it actually turns out Valid, since the previous ones finished in 5 days. |
Joined: 13 Jan 24 Posts: 3 Credit: 2,248,813 RAC: 3,027 |
Thanks for the hint about the memory size. I set `<memory_size_mb>2048</memory_size_mb>` in Theory_2024_04_30_prod.xml and finally a couple of herwig tasks completed successfully after 9+ days. Those tasks used up to about 1.3 GB in the VM, so 1536 MB would have been sufficient. 630 MB is clearly inadequate given the number of tasks that had been running out of time or otherwise getting errors. |
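
For anyone who wants to check the disk-limit numbers quoted in the abort messages earlier in the thread, here is a rough Python sketch. It rests on two assumptions that are inferred from the thread, not confirmed by the project: that rsc_disk_bound is stored in bytes with a default of 8e9, and that the "MB" in the log is bytes divided by 2**20 (8e9 bytes then gives exactly the 7629.39MB shown). The function names are made up for illustration.

```python
import re

# Assumption: the log's "MB" is bytes / 2**20. This is inferred from the thread,
# where a limit of 8e9 bytes would print as exactly 7629.39MB.
BYTES_PER_MB = 1024 * 1024

# Matches the abort message text as it is quoted in the thread.
ABORT_RE = re.compile(
    r"Aborting task (?P<task>\S+): exceeded disk limit: "
    r"(?P<used>[\d.]+)MB > (?P<limit>[\d.]+)MB"
)


def limit_in_mb(rsc_disk_bound_bytes: float) -> float:
    """Convert a byte limit into the MB figure shown in the log (2 decimals)."""
    return round(rsc_disk_bound_bytes / BYTES_PER_MB, 2)


def parse_abort(line: str):
    """Return (task, used_mb, limit_mb) for an abort line, or None if no match."""
    m = ABORT_RE.search(line)
    if m is None:
        return None
    return m.group("task"), float(m.group("used")), float(m.group("limit"))


def would_have_fit(line: str, new_bound_bytes: float) -> bool:
    """True if the usage reported in the abort line is below a raised limit."""
    parsed = parse_abort(line)
    if parsed is None:
        raise ValueError("not a disk-limit abort line")
    _, used_mb, _ = parsed
    return used_mb < limit_in_mb(new_bound_bytes)


if __name__ == "__main__":
    # The two abort lines quoted in this thread.
    lines = [
        "09-Oct-2024 00:17:08 [LHC@home] Aborting task Theory_2794-3267759-142_1: "
        "exceeded disk limit: 9321.23MB > 7629.39MB",
        "LHC@home 30 Oct 07:56:50 Aborting task Theory_2794-3245286-234_1: "
        "exceeded disk limit: 17327.41MB > 7629.39MB",
    ]
    print("default limit:", limit_in_mb(8e9), "MB")
    print("raised limit: ", limit_in_mb(12e9), "MB")
    for line in lines:
        task, used_mb, _ = parse_abort(line)
        print(task, used_mb, "MB, fits under 12e9 bytes:", would_have_fit(line, 12e9))
```

Under these assumptions, changing the leading "8" to "12" raises the limit to about 11444 MB. That would have covered the task that stopped at 9321.23MB, but not the one that reached 17327.41MB, which fits the point made above that a higher limit does not fix a task that keeps writing.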
©2024 CERN