Thread '196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED'

Author	Message
Erich56 Joined: 18 Dec 15 Posts: 1986 Credit: 162,172,797 RAC: 87,016	Message 37403 - Posted: 22 Nov 2018, 17:55:18 UTC
This task https://lhcathome.cern.ch/lhcathome/result.php?resultid=208859885 errored out after about 5 hours with 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED. The report shows a peak disk usage of 18,312.37 MB, which is highly unusual; the other tasks I checked used about 1 GB. Still, the BOINC settings should have allowed an even higher disk usage: 90% of 195 GB is available to BOINC. Does anyone have an explanation for what happened?

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 946 Credit: 784,373,554 RAC: 158,973	Message 37404 - Posted: 22 Nov 2018, 18:23:21 UTC - in response to Message 37403.
The project team set the limit incorrectly when they submitted the WUs. You just have to let them die; they will fix themselves over time.

bronco Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0	Message 37405 - Posted: 22 Nov 2018, 18:35:35 UTC - in response to Message 37403.
Quoted: "But still the BOINC settings should have allowed an even higher disk usage: 90% of 195GB are for use with BOINC."
There are two "disk limits", and you are confusing them. The "90% of 195 GB" is the limit on the total disk usage by all BOINC tasks; it is user configurable in BOINC Manager. The 196 error refers to the disk limit placed on a single task. That limit is NOT user configurable and is NOT the limit described above; it is set by the server.

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1556 Credit: 10,101,515 RAC: 1,464	Message 37410 - Posted: 23 Nov 2018, 8:42:05 UTC - in response to Message 37403.
Quoted: "This task https://lhcathome.cern.ch/lhcathome/result.php?resultid=208859885 errored out after about 5 hours with 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED. The report shows a Peak disk usage of 18,312.37 MB which is totally unusual; I checked other tasks, they had about 1 GB. But still the BOINC settings should have allowed an even higher disk usage: 90% of 195GB are for use with BOINC. Does anyone have an explanation what happened?"
My explanation: the working slot (incl. subdirs) of a Theory task may not exceed 8000000000 bytes, which equals 7629.39453125 MB (counting 1 MB as 1024 * 1024 bytes). Somehow the combined size of all files grew abnormally. The task ran 1 job and was not suspended, so no snapshots were created. I suppose VBox.log or VBoxHardening.log is responsible for the enormous size, perhaps because an error condition was being written in a loop.
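The arithmetic in the post above can be checked with a short sketch. It assumes, as the figures quoted in this thread suggest, that "MB" means 1024 * 1024 bytes; the constant and function names are illustrative only and are not part of BOINC.

```python
# Per-task disk bound for a Theory task, as quoted in the post above.
TASK_DISK_BOUND_BYTES = 8_000_000_000

# Assumption: "MB" in the task report means 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def bytes_to_mb(n_bytes: int) -> float:
    """Convert a byte count to MB (1 MB = 1024 * 1024 bytes)."""
    return n_bytes / BYTES_PER_MB


def exceeds_task_bound(peak_usage_mb: float) -> bool:
    """Return True if a task's peak disk usage is above the per-task bound."""
    return peak_usage_mb > bytes_to_mb(TASK_DISK_BOUND_BYTES)


limit_mb = bytes_to_mb(TASK_DISK_BOUND_BYTES)
print(f"per-task limit: {limit_mb:.8f} MB")
print("peak of 18,312.37 MB exceeds it:", exceeds_task_bound(18312.37))
print("typical task at about 1 GB exceeds it:", exceeds_task_bound(1024.0))
```

The reported peak of 18,312.37 MB is well above the per-task bound, while a typical task at about 1 GB stays far below it. The user-configurable "90% of 195 GB" setting plays no part in this comparison.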

Erich56 Joined: 18 Dec 15 Posts: 1986 Credit: 162,172,797 RAC: 87,016	Message 37566 - Posted: 8 Dec 2018, 12:18:43 UTC
This morning another task errored out, after 4 1/2 hours, with the same disk limit failure: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211563104
What's wrong this time?

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1556 Credit: 10,101,515 RAC: 1,464	Message 37567 - Posted: 8 Dec 2018, 13:08:00 UTC - in response to Message 37566.
Quoted: "This morning, again a task errored out, after 4 1/2 hours, with the exit disk limit failure: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211563104 What's wrong this time?"
2018-12-08 11:51:03 (1464): Guest Log: [INFO] Job finished in slot1 with 1.
2018-12-08 11:51:08 (1464): Guest Log: [INFO] New Job Starting in slot1
2018-12-08 11:51:08 (1464): Guest Log: [INFO] Condor JobID: 482880.13 in slot1
2018-12-08 11:51:08 (1464): Guest Log: [INFO] Job finished in slot1 with 2.
Strange that the job finished within a second, without even being issued an MCPlots ID. Finishing with 2 normally means something like 'file not found'. What does BOINC's event log show around that time?

Erich56 Joined: 18 Dec 15 Posts: 1986 Credit: 162,172,797 RAC: 87,016	Message 37568 - Posted: 8 Dec 2018, 13:45:27 UTC - in response to Message 37567.
Quoted: "What's about that time in BOINC's event log?"
Nothing in particular.

Harri Liljeroos Joined: 28 Sep 04 Posts: 806 Credit: 66,047,456 RAC: 27,780	Message 37576 - Posted: 9 Dec 2018, 11:55:29 UTC
I got one of these errors last night: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211545734
It ran for 18 hours and 5 minutes and was probably just about to finish when the error hit. Peak disk usage was 16,501.43 MB.

Erich56 Joined: 18 Dec 15 Posts: 1986 Credit: 162,172,797 RAC: 87,016	Message 37616 - Posted: 16 Dec 2018, 8:55:19 UTC
The same problem happened yesterday: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212297479
Does anyone have any idea why?

Erich56 Joined: 18 Dec 15 Posts: 1986 Credit: 162,172,797 RAC: 87,016	Message 37673 - Posted: 21 Dec 2018, 13:34:46 UTC - in response to Message 37616.
The next 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED error, after about 7 1/2 hours of runtime. Does anyone have an explanation for it?

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2755 Credit: 304,271,457 RAC: 116,232	Message 37674 - Posted: 21 Dec 2018, 13:46:25 UTC - in response to Message 37673.
Most likely misbehaving sherpa jobs that create huge logs until the disk limit is reached. They are hard to debug as the logs are lost as soon as the VM is shut down. See the discussion at LHC-dev: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=438

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1556 Credit: 10,101,515 RAC: 1,464	Message 37680 - Posted: 22 Dec 2018, 12:20:09 UTC
LHC@home 22 Dec 12:00:39 UTC Aborting task Theory_1543818_1545461975.206982_0: exceeded disk limit: 9596.10MB > 7629.39MB
Task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212747277
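For anyone collecting these cases, the abort line above can be picked apart with a regular expression. This is only a sketch: the pattern is written against the two event log lines quoted in this thread, and other BOINC client versions may word the message differently. The function name `parse_abort` is hypothetical.

```python
import re

# Pattern matched against the abort message as quoted in this thread.
# Assumption: other BOINC client versions may format the line differently.
ABORT_RE = re.compile(
    r"Aborting task (?P<task>\S+): exceeded disk limit: "
    r"(?P<used>\d+(?:\.\d+)?)MB > (?P<limit>\d+(?:\.\d+)?)MB"
)


def parse_abort(line):
    """Return task name, usage, limit and overshoot in MB, or None if no match."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    used = float(match.group("used"))
    limit = float(match.group("limit"))
    return {
        "task": match.group("task"),
        "used_mb": used,
        "limit_mb": limit,
        "over_mb": round(used - limit, 2),
    }


line = (
    "LHC@home 22 Dec 12:00:39 UTC Aborting task "
    "Theory_1543818_1545461975.206982_0: exceeded disk limit: "
    "9596.10MB > 7629.39MB"
)
print(parse_abort(line))
```

The limit in the message, 7629.39MB, is the same per-task bound discussed earlier in the thread, so the overshoot shows how far the slot had grown by the time the client checked it.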

Erich56 Joined: 18 Dec 15 Posts: 1986 Credit: 162,172,797 RAC: 87,016	Message 37693 - Posted: 25 Dec 2018, 10:54:20 UTC
I had two more yesterday:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=212811309
https://lhcathome.cern.ch/lhcathome/result.php?resultid=212826747
and one the day before yesterday:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=212743711
They failed after between 13 and 18 hours of processing time, which is a shame :-(

Erich56 Joined: 18 Dec 15 Posts: 1986 Credit: 162,172,797 RAC: 87,016	Message 37696 - Posted: 28 Dec 2018, 6:16:17 UTC
Can anyone from the LHC@home team please explain why so many tasks have been erroring out lately with "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED"?
The latest case from one of my PCs: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212965047
Again, 6 1/2 hours of CPU time for nothing :-(

maeax Joined: 2 May 07 Posts: 2304 Credit: 179,727,092 RAC: 17,509	Message 37697 - Posted: 28 Dec 2018, 7:09:29 UTC - in response to Message 37674.
Quoted: "Most likely misbehaving sherpa jobs that create huge logs until the disk limit is reached. They are hard to debug as the logs are lost as soon as the VM is shut down. See the discussion at LHC-dev: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=438"

Ray Murray Volunteer moderator Joined: 29 Sep 04 Posts: 281 Credit: 11,888,115 RAC: 0	Message 37698 - Posted: 28 Dec 2018, 17:56:40 UTC
Got one here, 11+ hours in, with running.log at 4.4 GB. In the Console window, output is whizzing past too fast to read. I'm going to reset the VM so as not to waste any more time on it, as it's likely to grow too big and fail, but hopefully the identification details captured below will be helpful.
From stdout.log:
06:31:00 +0000 2018-12-28 [INFO] Condor JobID: 484579.129 in slot1
06:31:06 +0000 2018-12-28 [INFO] MCPlots JobID: 47878534 in slot1
Top line of running.log:
===> [runRivet] Fri Dec 28 06:31:01 GMT 2018 [boinc pp jets 8000 180,-,3560 - sherpa 2.2.0 default 100000 8]
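To see which files are growing while a task is still running, a directory can be scanned for its largest files. The following is a minimal sketch using only the Python standard library; the function name `largest_files` and the slot path in the usage comment are hypothetical. It only sees files on the host, so a log that lives inside the VM would show up indirectly, through a growing disk image, rather than by name.

```python
import os


def largest_files(root, top_n=5):
    """Return up to top_n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # A file may vanish or be unreadable while the task is running.
                continue
    sizes.sort(reverse=True)
    return sizes[:top_n]


# Usage (the path is a placeholder; adjust it to your own BOINC data directory):
# for size, path in largest_files("/path/to/BOINC/slots/1"):
#     print(f"{size / (1024 * 1024):10.2f} MB  {path}")
```

Running this a few minutes apart and comparing the output should show which file is responsible before the per-task limit is reached.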

Erich56 Joined: 18 Dec 15 Posts: 1986 Credit: 162,172,797 RAC: 87,016	Message 37699 - Posted: 28 Dec 2018, 20:16:56 UTC
As far as I can see, this specific problem has only been occurring recently, so something must have been changed with these tasks at LHC@home.

Crystal Pellet Volunteer moderator Volunteer tester Joined: 14 Jan 10 Posts: 1556 Credit: 10,101,515 RAC: 1,464	Message 37701 - Posted: 29 Dec 2018, 7:42:07 UTC - in response to Message 37698. Last modified: 29 Dec 2018, 7:46:57 UTC
Got another one too. I could not watch the running job, but it was most probably a sherpa job.
Stderr output:
2018-12-28 19:02:34 UTC (4528): Guest Log: [INFO] New Job Starting in slot1
BOINC event:
2018-12-29 05:02:23 UTC Aborting task Theory_1102016_1545999587.637140_0: exceeded disk limit: 8970.10MB > 7629.39MB
https://lhcathome.cern.ch/lhcathome/result.php?resultid=213019449

Erich56 Joined: 18 Dec 15 Posts: 1986 Credit: 162,172,797 RAC: 87,016	Message 37706 - Posted: 29 Dec 2018, 9:44:13 UTC - in response to Message 37701. Last modified: 29 Dec 2018, 9:44:25 UTC
Quoted: "Got another one too. https://lhcathome.cern.ch/lhcathome/result.php?resultid=213019449"
Really annoying when this happens after so many hours :-( A total waste of CPU time.

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2755 Credit: 304,271,457 RAC: 116,232	Message 37925 - Posted: 5 Feb 2019, 5:53:48 UTC
This is still an issue on the volunteer's side. I wonder if it is under investigation on the project's side. Could anyone from the project team be so kind as to give a short summary?