196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED - how come?
Author | Message |
---|---|
Joined: 18 Dec 15 Posts: 1827 Credit: 119,535,315 RAC: 42,794 |
This task https://lhcathome.cern.ch/lhcathome/result.php?resultid=208859885 errored out after about 5 hours with 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED. The report shows a peak disk usage of 18,312.37 MB, which is highly unusual; the other tasks I checked used about 1 GB. Even so, the BOINC settings should have allowed an even higher disk usage: 90% of 195 GB is available to BOINC. Does anyone have an explanation for what happened? |
Joined: 27 Sep 08 Posts: 852 Credit: 694,226,374 RAC: 112,542 |
The project team set the limit wrong when they submitted the WUs. You just have to let them die; they will fix themselves over time. |
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"But still the BOINC settings should have allowed an even higher disk usage: 90% of 195GB are for use with BOINC." There are two "disk limits", and you are confusing them. The "90% of 195 GB" is the limit on the total disk usage by all BOINC tasks; it is user-configurable in BOINC Manager. The 196 error refers to the disk limit placed on a single task. That limit is NOT user-configurable and is NOT the limit described above; it is set by the server. |
Joined: 14 Jan 10 Posts: 1427 Credit: 9,492,726 RAC: 820 |
My explanation: the working slot (including subdirectories) of a Theory task may not exceed 8000000000 bytes, which equals 7629.39453125 MB. Somehow the combined size of all files grew abnormally. The task ran 1 job and was not suspended, so no snapshots were created. I suppose VBox.log or VBoxHardening.log is responsible for the enormous size, maybe because an error condition was being written in a loop. |
Joined: 18 Dec 15 Posts: 1827 Credit: 119,535,315 RAC: 42,794 |
This morning, again a task errored out, after 4 1/2 hours, with the exit disk limit failure: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211563104 What's wrong this time? |
Joined: 14 Jan 10 Posts: 1427 Credit: 9,492,726 RAC: 820 |
"This morning, again a task errored out, after 4 1/2 hours, with the exit disk limit failure: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211563104" 2018-12-08 11:51:03 (1464): Guest Log: [INFO] Job finished in slot1 with 1. 2018-12-08 11:51:08 (1464): Guest Log: [INFO] New Job Starting in slot1 2018-12-08 11:51:08 (1464): Guest Log: [INFO] Condor JobID: 482880.13 in slot1 2018-12-08 11:51:08 (1464): Guest Log: [INFO] Job finished in slot1 with 2. It is strange that the job finished within a second without even issuing an MCPlots ID. "Finished with 2" normally means something like "file not found". What does BOINC's event log show around that time? |
Joined: 18 Dec 15 Posts: 1827 Credit: 119,535,315 RAC: 42,794 |
"What does BOINC's event log show around that time?" Nothing in particular. |
Joined: 28 Sep 04 Posts: 735 Credit: 49,844,204 RAC: 35,579 |
I got one of these errors last night: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211545734 It ran for 18 hours and 5 minutes and was probably just about to finish when the error hit. Peak disk usage was 16,501.43 MB. |
Joined: 18 Dec 15 Posts: 1827 Credit: 119,535,315 RAC: 42,794 |
The same problem happened yesterday: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212297479 Does anyone have any idea why? |
Joined: 18 Dec 15 Posts: 1827 Credit: 119,535,315 RAC: 42,794 |
The next 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED error occurred after about 7 1/2 hours of runtime. Does anyone have an explanation? |
Joined: 15 Jun 08 Posts: 2549 Credit: 255,268,827 RAC: 57,132 |
Most likely misbehaving sherpa jobs that create huge logs until the disk limit is reached. They are hard to debug as the logs are lost as soon as the VM is shut down. See the discussion at LHC-dev: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=438 |
Joined: 14 Jan 10 Posts: 1427 Credit: 9,492,726 RAC: 820 |
LHC@home 22 Dec 12:00:39 UTC Aborting task Theory_1543818_1545461975.206982_0: exceeded disk limit: 9596.10MB > 7629.39MB Task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212747277 |
Joined: 18 Dec 15 Posts: 1827 Credit: 119,535,315 RAC: 42,794 |
I had two more yesterday: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212811309 https://lhcathome.cern.ch/lhcathome/result.php?resultid=212826747 and one the day before yesterday: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212743711 They failed after between 13 and 18 hours of processing time, which is a shame :-( |
Joined: 18 Dec 15 Posts: 1827 Credit: 119,535,315 RAC: 42,794 |
Can anyone from the LHC@home team please explain why so many tasks have lately been erroring out with "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED"? The latest case from one of my PCs: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212965047 Again, 6 1/2 hours of CPU time for nothing :-( |
Joined: 2 May 07 Posts: 2244 Credit: 173,988,818 RAC: 7,494 |
Most likely misbehaving sherpa jobs that create huge logs until the disk limit is reached. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0 |
Got one here, 11+ hours in, with running.log at 4.4 GB. In the console window, output is scrolling past too fast to read. I'm going to reset the VM so as not to waste any more time on it, as it is likely to grow too big and fail, but hopefully the identification details captured below will be helpful. From stdout.log: 06:31:00 +0000 2018-12-28 [INFO] Condor JobID: 484579.129 in slot1 06:31:06 +0000 2018-12-28 [INFO] MCPlots JobID: 47878534 in slot1 Top line of running.log: ===> [runRivet] Fri Dec 28 06:31:01 GMT 2018 [boinc pp jets 8000 180,-,3560 - sherpa 2.2.0 default 100000 8] |
Joined: 18 Dec 15 Posts: 1827 Credit: 119,535,315 RAC: 42,794 |
This specific problem has been occurring only recently, as far as I can see. So something must have been changed in these tasks at LHC@home. |
Joined: 14 Jan 10 Posts: 1427 Credit: 9,492,726 RAC: 820 |
Got another one too. I could not watch the running job, but it was most probably a sherpa job. Stderr output: 2018-12-28 19:02:34 UTC (4528): Guest Log: [INFO] New Job Starting in slot1 BOINC event: 2018-12-29 05:02:23 UTC Aborting task Theory_1102016_1545999587.637140_0: exceeded disk limit: 8970.10MB > 7629.39MB https://lhcathome.cern.ch/lhcathome/result.php?resultid=213019449 |
Joined: 18 Dec 15 Posts: 1827 Credit: 119,535,315 RAC: 42,794 |
"Got another one too." Really annoying when this happens after so many hours :-( A total waste of CPU time. |
Joined: 15 Jun 08 Posts: 2549 Credit: 255,268,827 RAC: 57,132 |
This is still an issue on the volunteer's side. I wonder if it is under investigation on the project's side. Could anyone from the project team be so kind as to give a short summary? |
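
The arithmetic behind the per-task limit quoted in this thread, and a way to see which file in a slot directory is growing, can be sketched in Python. This is only a rough sketch under assumptions, not a project tool: the function names are made up for illustration, the 8000000000-byte limit is the figure reported in the posts above rather than a value read from the server, and the slot directory path has to be supplied by the user (BOINC normally keeps task working directories under `slots` in its data directory, but check your own installation).

```python
import os
import sys

# Per-task limit as reported in this thread (assumption: not read from the server).
TASK_DISK_LIMIT_BYTES = 8_000_000_000
# The "MB" in the event-log messages matches 2**20 bytes: 8000000000 / 2**20 = 7629.39453125.
BYTES_PER_MB = 1024 * 1024


def bytes_to_mb(n_bytes):
    """Convert bytes to the MB unit used in the BOINC messages quoted above."""
    return n_bytes / BYTES_PER_MB


def exceeds_limit(total_bytes, limit_bytes=TASK_DISK_LIMIT_BYTES):
    """True if the total size is above the per-task limit."""
    return total_bytes > limit_bytes


def slot_usage(slot_dir):
    """Walk slot_dir (including subdirectories).

    Returns (total_bytes, sizes) where sizes is a list of (size, path)
    tuples sorted with the largest file first.
    """
    sizes = []
    for root, _dirs, files in os.walk(slot_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    sizes.sort(reverse=True)
    total = sum(size for size, _path in sizes)
    return total, sizes


def report(slot_dir, top=5):
    """Print the total usage, the limit, and the largest files."""
    total, sizes = slot_usage(slot_dir)
    print(f"total: {bytes_to_mb(total):.2f}MB, "
          f"limit: {bytes_to_mb(TASK_DISK_LIMIT_BYTES):.2f}MB")
    if exceeds_limit(total):
        print("over the per-task limit")
    for size, path in sizes[:top]:
        print(f"{bytes_to_mb(size):10.2f}MB  {path}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python slot_usage.py /path/to/boinc/slots/1
    report(sys.argv[1])
```

Run against a slot directory while a task is active, this lists the largest files, so a runaway log such as the 4.4 GB running.log mentioned above would show up at the top. Whether files inside the VM image are visible this way depends on how the task stores them, so treat the output as a hint only.
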
©2025 CERN