Message boards : Sixtrack Application : EXIT_DISK_LIMIT_EXCEEDED
mmonnin

Joined: 22 Mar 17
Posts: 55
Credit: 10,223,976
RAC: 2,477
Message 35102 - Posted: 28 Apr 2018, 2:03:00 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=188317371

Getting some of these with the tasks released today. Errors at around 20 min for everyone. 15 completed (13 of those very short) and 9 errors, so a poor ratio - and even worse considering the time.
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,393,315
RAC: 102,232
Message 35114 - Posted: 29 Apr 2018, 17:00:25 UTC

The same happened here with 2 tasks that I started this afternoon:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=188577009 - task name: w-c4_job.B1topenergy...
and
https://lhcathome.cern.ch/lhcathome/result.php?resultid=188576839 - task name: w-c6_job.B1topenergy...

Both tasks failed after about 51 minutes. What's going on there?
Ano

Joined: 29 Nov 09
Posts: 42
Credit: 229,229
RAC: 0
Message 35117 - Posted: 30 Apr 2018, 6:18:28 UTC

Got it too:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=91831921
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=91967947

The irony is that I came here to check whether a task I was running (also a B1topenergy something) got errors from other users, because it had been sitting at 100% for a long time and I expected it to fail - but by the time I found that task in my list, it had completed properly.
So I guess it may not be all tasks of a specific series that error out, and the good news for us users is that everybody errors the same way when there is an error, so it's not coming from our side.
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,140,797
RAC: 105,338
Message 35118 - Posted: 30 Apr 2018, 8:13:41 UTC

There is an old thread with the same problem:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=3944
glennpat

Joined: 16 Feb 07
Posts: 4
Credit: 6,937,184
RAC: 0
Message 35143 - Posted: 3 May 2018, 3:57:14 UTC - in response to Message 35118.  

Has anyone been able to stop getting these errors? I read the old thread, and I don't have the vm_image.vdi file. I am running Linux. I checked all the slots and didn't see any really large files. I am getting these errors on several computers. If there is some file I need to delete, I need to know what it is.
Lorenz Millinger

Joined: 15 Jul 09
Posts: 3
Credit: 16,534,541
RAC: 0
Message 35144 - Posted: 3 May 2018, 5:01:55 UTC

I have 345 errored WUs - a failure ratio of 40%! Is anyone working to prevent bad WUs from being sent? We are wasting valuable computing power for nothing.
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,140,797
RAC: 105,338
Message 35146 - Posted: 3 May 2018, 6:03:40 UTC

That is why we need some information from the SixTrack team.
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,764,876
RAC: 231,964
Message 35147 - Posted: 3 May 2018, 6:37:08 UTC

The team is working on it as we speak.
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 35148 - Posted: 3 May 2018, 7:35:34 UTC - in response to Message 35147.  

Dear all,
sorry for the late reply. Apparently the user requested SixTrack to dump detailed information about particle dynamics, with a consequent increase in disk usage beyond the task's requirements.
We are deleting the WUs - most probably they will come back, this time with more appropriate parameters.
Happy crunching to everyone, and sorry for the disturbance.
Cheers,
A.
PDW

Joined: 7 Aug 14
Posts: 14
Credit: 7,479,532
RAC: 50
Message 40045 - Posted: 29 Sep 2019, 12:47:27 UTC - in response to Message 35148.  

Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 40065 - Posted: 2 Oct 2019, 8:31:59 UTC - in response to Message 40045.  

Thanks, PDW, for spotting this problem again.

The failures are due to a (log) file growing beyond the DISK request.
The user was not aware that he should have increased the request when submitting extremely long jobs (1e7 turns, when we typically simulate a factor of 10 fewer).
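
For scale, a rough back-of-the-envelope (the starting figure is assumed, purely for illustration): if the file grows roughly linearly with turn count and reaches, say, 20 MB over a typical 1e6-turn job, then

    20 MB x (1e7 / 1e6) = 200 MB ~ 200,000,000 bytes

which is right at the <rsc_disk_bound> that SixTrack tasks are given (see the posts further down).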

On the code side, the next release won't generate this (log) file unless explicitly requested by the user.
For the affected tasks, I am looking into the possibility of granting some credit for the CPU time anyway, even though the results are not going to be validated...

I apologize for the inconvenience, and thanks again for the support!
A.
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 40086 - Posted: 7 Oct 2019, 16:11:53 UTC - in response to Message 40065.  

An update on this issue - I managed to grant credit to the tasks that failed with the EXIT_DISK_LIMIT_EXCEEDED error in this specific study because of the inconsistent setting value.
To avoid cheating, the credit does not represent the full credit that would have been acknowledged had the task run to the end and been validated - in the end, all the tasks failed before completion, and there was no way to validate the partial results.

Please post here if something odd related to this study happens.

Happy crunching,
A.
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43286 - Posted: 27 Aug 2020, 19:48:24 UTC
Last modified: 27 Aug 2020, 20:43:15 UTC

This issue has come up again with the current batch of SixTrack tasks.

Example https://lhcathome.cern.ch/lhcathome/result.php?resultid=281854540

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
Disk usage limit exceeded</message>
<stderr_txt>

</stderr_txt>
]]>


If I remember correctly, aren't these the lines in cc_config.xml that allow bigger files?

<max_stderr_file_size></max_stderr_file_size>
<max_stdout_file_size></max_stdout_file_size>


Edit: Added them with a high value, re-read the config files, and restarted the client, but the tasks keep erroring out at the same time/size, so I am probably wrong.

Could someone verify the issue, and whether it occurs on hosts other than mine? If this issue remains, could we abort these work units?
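
For reference, those two options belong in the <options> section of cc_config.xml, but as far as I know they only cap the client's own log files (stdoutdae.txt and stderrdae.txt), not a task's slot directory - which would explain why raising them had no effect. A minimal sketch of where they would go:

<cc_config>
    <options>
        <max_stdout_file_size>10000000</max_stdout_file_size>
        <max_stderr_file_size>10000000</max_stderr_file_size>
    </options>
</cc_config>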
Greger

Joined: 9 Jan 15
Posts: 151
Credit: 431,596,822
RAC: 0
Message 43287 - Posted: 27 Aug 2020, 21:09:53 UTC
Last modified: 27 Aug 2020, 21:10:28 UTC

Checked the slot folder: it shows the rsc bound at 200 MB, and the folder in total, at around 1 hour in and still running, is at around 190 MB.

<rsc_disk_bound>200000000.000000</rsc_disk_bound>


I cannot see any .xml file for SixTrack, so I cannot increase that value for new tasks.

A few minutes later...

It died at the 205.2 MB mark.
grumpy

Joined: 1 Sep 04
Posts: 57
Credit: 2,831,592
RAC: 53
Message 43288 - Posted: 28 Aug 2020, 2:50:47 UTC

Too many
196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED
pvh

Joined: 17 Jun 13
Posts: 8
Credit: 6,548,286
RAC: 0
Message 43290 - Posted: 28 Aug 2020, 7:27:03 UTC - in response to Message 43288.  

I am getting those too. Plenty of room on the disk, so that is not the problem... One example is w-c0.004_0.000_job.B2_2012_rerun_c0.004_0.000.4415__48__s__64.282_59.312__4.1_6.1__6__4.94505_1_sixvf_boinc15045_0
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,952,039
RAC: 137,032
Message 43291 - Posted: 28 Aug 2020, 7:29:36 UTC

This value is set by the project server for each SixTrack task:
<rsc_disk_bound>200000000.000000</rsc_disk_bound>

The file "singletrackfile.dat" located in the slots directory continuously grows during calculation and at a certain point the total size of the slots directory hits the limit set via <rsc_disk_bound>.

It needs investigation by the project team as to why the file grows to that size.
Alternatively, <rsc_disk_bound> could be set to a higher value on the project server.
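
For anyone who wants to experiment client-side anyway - a sketch only, with the workunit name shortened; editing client_state.xml is generally discouraged, must be done with the client stopped, and only affects tasks already downloaded - the limit sits in the task's <workunit> block:

<workunit>
    <name>w-c0.004_0.000_job.B2_2012_rerun_..._sixvf_boinc15045</name>
    <app_name>sixtrack</app_name>
    ...
    <rsc_disk_bound>2000000000.000000</rsc_disk_bound>
</workunit>

Here the bound has been raised tenfold from the server's 200000000.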
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 43292 - Posted: 28 Aug 2020, 9:13:56 UTC - in response to Message 43291.  

The file "singletrackfile.dat" located in the slots directory continuously grows during calculation and at a certain point the total size of the slots directory hits the limit set via <rsc_disk_bound>.
The question is whether the task will end successfully if singletrackfile.dat is allowed to grow further.
I'm testing that at the moment by increasing rsc_disk_bound tenfold.
The first task was a shorty: https://lhcathome.cern.ch/lhcathome/result.php?resultid=282028703
The second task has been running for 1 hour with a growing singletrackfile.dat (72 MB).
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,952,039
RAC: 137,032
Message 43293 - Posted: 28 Aug 2020, 9:49:45 UTC

Very strange.

Found an example where the same WU reports different peak disk usage values:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=144204024

computer 1: 347.79 MB (failed)
computer 2: 4.37 MB (succeeded)
computer 3: 171.35 MB (succeeded)
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 43295 - Posted: 28 Aug 2020, 16:00:36 UTC - in response to Message 43292.  

The second task has been running for 1 hour with a growing singletrackfile.dat (72 MB).
The second task finished: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=144292270
The wingman has a peak disk usage of 'only' 5.21 MB, while mine has a peak disk usage of 469.54 MB (exceeding the 200,000,000 bytes that would normally trigger the disk limit error).
The singletrackfile.dat file grew to 462,537 kB.
It looked like the file fort.6 was in use the whole time - 33 files in that slot so far. When new files were created (I saw 35 files in the slot dir at most), singletrackfile.dat seemed to stop growing.