Message boards : Sixtrack Application : EXIT_DISK_LIMIT_EXCEEDED
Joined: 22 Mar 17 Posts: 66 Credit: 20,640,485 RAC: 291,497
https://lhcathome.cern.ch/lhcathome/result.php?resultid=188317371
Getting some of these with the tasks released today. They error out at around 20 minutes for everyone. 15 completed (13 of those very short) and 9 errors, so a poor ratio, and even worse when considering the time spent.
Joined: 18 Dec 15 Posts: 1838 Credit: 121,642,514 RAC: 88,409
The same happened here with 2 tasks that I started this afternoon:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=188577009 - task name: w-c4_job.B1topenergy...
https://lhcathome.cern.ch/lhcathome/result.php?resultid=188576839 - task name: w-c6_job.B1topenergy...
Both tasks failed after about 51 minutes. What's going on there?
Joined: 29 Nov 09 Posts: 42 Credit: 229,229 RAC: 0
Got it too:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=91831921
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=91967947
The irony is that I was coming here to check whether a task I was running (also a B1topenergy something) got errors from other users, because it was staying at 100% for a long time and I was expecting it to fail, but by the time I found that task in my list, it had completed properly. So I guess it may not be all tasks of a specific series that error out, and the good news for us users is that when a task does error, it errors the same way for everybody, so it's not coming from our side.
Joined: 2 May 07 Posts: 2255 Credit: 174,204,943 RAC: 8,340
There is an old thread with the same problem: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=3944
Joined: 16 Feb 07 Posts: 4 Credit: 6,937,184 RAC: 0
Has anyone been able to stop getting these errors? I read the old thread, but I don't have the vm_image.vdi file; I am running Linux. I checked all the slots and didn't see any really large files. I am getting these errors on several computers. If there is some file I need to delete, I need to know what it is.
Joined: 15 Jul 09 Posts: 3 Credit: 16,534,541 RAC: 0
I have 345 errored WUs, a failure ratio of 40%! Is anyone working on preventing bad WUs from being sent? We are wasting valuable computing power for nothing.
Joined: 2 May 07 Posts: 2255 Credit: 174,204,943 RAC: 8,340
Therefore we need some information from the SixTrack team.
Joined: 27 Sep 08 Posts: 854 Credit: 697,836,655 RAC: 139,067
The team is working on it as we speak.
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0
Dear all, sorry for the late reply. Apparently the user requested SixTrack to dump detailed information about particle dynamics, with a consequent increase of disk usage beyond the requested limit. We are deleting the WUs - most probably they will come again, this time with more appropriate parameters. Happy crunching to everyone, and sorry for the disturbance. Cheers, A.
Joined: 7 Aug 14 Posts: 27 Credit: 10,000,233 RAC: 5
This is happening again, for example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=246803675
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0
Thanks, PDW, for spotting this problem again. The failures are due to a (log) file growing beyond the DISK request. The user was not aware that he should have increased the request when submitting extremely long jobs (1e7 turns, when we typically simulate a factor of 10 less). On the code side, the next release won't generate this (log) file unless explicitly requested by the user. For the affected tasks, I am looking into the possibility of granting some credit for the CPU time anyway, even though the results are not going to be validated. I apologize for the inconvenience, and thanks again for the support! A.
Joined: 29 Feb 16 Posts: 157 Credit: 2,659,975 RAC: 0
An update on this issue - I managed to grant credit to the tasks that failed with EXIT_DISK_LIMIT_EXCEEDED in the specific study with the inconsistent setting. To avoid cheating, the credit does not represent the full credit that would have been acknowledged if the task had run to the end and been validated; in the end, all the tasks failed before reaching their conclusion and there was no way to validate the partial results. Please post here if something odd related to this study happens. Happy crunching, A.
Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0
This issue has come up again with the current batch of SixTrack tasks. Example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=281854540
<core_client_version>7.16.6</core_client_version>
If I remember correctly, are these the lines in cc_config.xml that allow bigger files?
<max_stderr_file_size></max_stderr_file_size>
<max_stdout_file_size></max_stdout_file_size>
Edit: Added them with a high value, re-checked the files and restarted the client, but the tasks keep erroring out at the same time/size, so I am probably wrong. Could someone verify the issue and whether it occurs on hosts other than mine? If the issue remains, could we abort these work units?
Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0
Checked the slot folder: it shows a disk bound of 200 MB, and the folder in total, at around 1 hour in and still running, is at around 190 MB.
<rsc_disk_bound>200000000.000000</rsc_disk_bound>
I cannot see any .xml file for SixTrack, so I cannot increase that value for new tasks.
A few minutes later... it reached death at the 205.2 MB mark.
Joined: 1 Sep 04 Posts: 57 Credit: 2,835,005 RAC: 0
Too many of these: 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED
Joined: 17 Jun 13 Posts: 8 Credit: 6,548,286 RAC: 0
I am getting those too. Plenty of room on the disk, so that is not the problem... One example is w-c0.004_0.000_job.B2_2012_rerun_c0.004_0.000.4415__48__s__64.282_59.312__4.1_6.1__6__4.94505_1_sixvf_boinc15045_0
Joined: 15 Jun 08 Posts: 2567 Credit: 258,169,256 RAC: 118,985
This value is set by the project server for each SixTrack task:
<rsc_disk_bound>200000000.000000</rsc_disk_bound>
The file "singletrackfile.dat" located in the slot directory grows continuously during the calculation, and at a certain point the total size of the slot directory hits the limit set via <rsc_disk_bound>. Why the file grows to that size needs investigation by the project team. Alternatively, <rsc_disk_bound> would have to be set to a higher value on the project server.
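To watch this happening on a running task, a small script can compare the slot directory's current size with the limit the client records in the slot's init_data.xml. This is only a monitoring sketch, assuming a standard Linux BOINC installation; the slot path is an example and has to be adjusted:

import os
import xml.etree.ElementTree as ET

# Example path: pick the slot that is running the SixTrack task
slot = "/var/lib/boinc-client/slots/0"

# The client copies rsc_disk_bound into the slot's init_data.xml
bound = float(ET.parse(os.path.join(slot, "init_data.xml")).findtext("rsc_disk_bound"))

# Total size of everything currently in the slot directory
used = sum(os.path.getsize(os.path.join(root, name))
           for root, _, files in os.walk(slot) for name in files)

print(f"disk bound: {bound / 1e6:.1f} MB")
print(f"slot usage: {used / 1e6:.1f} MB")
print(f"headroom:   {(bound - used) / 1e6:.1f} MB")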
Joined: 14 Jan 10 Posts: 1437 Credit: 9,614,158 RAC: 2,399
"The file singletrackfile.dat located in the slot directory grows continuously during the calculation, and at a certain point the total size of the slot directory hits the limit set via <rsc_disk_bound>."
The question is whether the task will end successfully if singletrackfile.dat is allowed to grow further. I'm testing that at the moment by increasing rsc_disk_bound tenfold.
First task was a shorty: https://lhcathome.cern.ch/lhcathome/result.php?resultid=282028703
Second task has been running for 1 hour with a growing singletrackfile.dat (72 MB).
Joined: 15 Jun 08 Posts: 2567 Credit: 258,169,256 RAC: 118,985
Very strange. Found an example where the same WU reports different peak disk usage values: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=144204024
computer 1: 347.79 MB (failed)
computer 2: 4.37 MB (succeeded)
computer 3: 171.35 MB (succeeded)
Joined: 14 Jan 10 Posts: 1437 Credit: 9,614,158 RAC: 2,399
"Second task has been running for 1 hour with a growing singletrackfile.dat (72 MB)."
The second task has finished. Workunit: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=144292270
The wingman has 'only' a peak disk usage of 5.21 MB, while mine has a peak disk usage of 469.54 MB (exceeding 200,000,000 bytes, which would normally give the disk limit error). The singletrackfile.dat file grew to 462,537 kB. It looked like the file fort.6 was being used all the time; 33 files were in that slot so far. When new files were created (I saw 35 files in the slot directory at most), singletrackfile.dat seemed to stop growing.