Message boards : Sixtrack Application : EXIT_DISK_LIMIT_EXCEEDED

mmonnin

Joined: 22 Mar 17
Posts: 44
Credit: 3,801,950
RAC: 0
Message 35102 - Posted: 28 Apr 2018, 2:03:00 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=188317371

Getting some of these with the tasks released today. The errors occur at around 20 minutes for everyone. 15 completed (13 of those very short) and 9 errored, so a poor ratio, and even worse when considering the time spent.
ID: 35102
Erich56

Joined: 18 Dec 15
Posts: 1322
Credit: 24,410,245
RAC: 10,334
Message 35114 - Posted: 29 Apr 2018, 17:00:25 UTC

same happened here with 2 tasks that I started this afternoon:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=188577009 - task name: w-c4_job.B1topenergy...
and
https://lhcathome.cern.ch/lhcathome/result.php?resultid=188576839 - task name: w-c6_job.B1topenergy...

Both tasks failed after about 51 minutes. What's going on there?
ID: 35114
Ano

Joined: 29 Nov 09
Posts: 42
Credit: 229,229
RAC: 0
Message 35117 - Posted: 30 Apr 2018, 6:18:28 UTC

Got it too:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=91831921
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=91967947

The irony is that I came here to check whether a task I was running (also a B1topenergy one) had errored out for other users, because it stayed at 100% for a long time and I expected it to fail; but by the time I found that task in my list, it had completed properly.
So I guess not every task of a given series errors out, and the good news for us users is that everybody errors out the same way when it happens, so the problem is not on our side.
ID: 35117
maeax

Joined: 2 May 07
Posts: 1074
Credit: 36,389,357
RAC: 5,035
Message 35118 - Posted: 30 Apr 2018, 8:13:41 UTC

There is an old thread with the same problem:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=3944
ID: 35118
glennpat

Joined: 16 Feb 07
Posts: 4
Credit: 6,937,184
RAC: 0
Message 35143 - Posted: 3 May 2018, 3:57:14 UTC - in response to Message 35118.  

Has anyone been able to stop getting these errors? I read the old thread, and I don't have the vm_image.vdi file. I am running Linux. I checked all the slots and didn't see any really large files. I am getting these errors on several computers. If there is some file I need to delete, I need to know what it is.
ID: 35143
Lorenz Millinger

Joined: 15 Jul 09
Posts: 3
Credit: 9,247,502
RAC: 12
Message 35144 - Posted: 3 May 2018, 5:01:55 UTC

I have 345 errored WUs! A failure ratio of 40%!!! Is anyone working to prevent bad WUs from being sent? We are wasting valuable computing power for nothing.
ID: 35144
maeax

Send message
Joined: 2 May 07
Posts: 1074
Credit: 36,389,357
RAC: 5,035
Message 35146 - Posted: 3 May 2018, 6:03:40 UTC

For that we need some information from the SixTrack team.
ID: 35146
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 617
Credit: 385,766,560
RAC: 130,658
Message 35147 - Posted: 3 May 2018, 6:37:08 UTC

The team is working on it as we speak
ID: 35147
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,557,581
RAC: 2,972
Message 35148 - Posted: 3 May 2018, 7:35:34 UTC - in response to Message 35147.  

Dear all,
sorry for the late reply. Apparently the user who submitted these jobs requested SixTrack to dump detailed information about particle dynamics, which pushed disk usage beyond what was requested for the tasks.
We are deleting the WUs - most probably they will come again, this time with more appropriate parameters.
Happy crunching to everyone and sorry for the disturbance.
Cheers,
A.
ID: 35148
PDW

Joined: 7 Aug 14
Posts: 14
Credit: 7,369,198
RAC: 32
Message 40045 - Posted: 29 Sep 2019, 12:47:27 UTC - in response to Message 35148.  

ID: 40045
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,557,581
RAC: 2,972
Message 40065 - Posted: 2 Oct 2019, 8:31:59 UTC - in response to Message 40045.  

Thanks, PDW, for spotting this problem again.

The failures are due to a (log) file growing beyond the disk request.
The user was not aware that he should have increased the request when submitting extremely long jobs (1e7 turns, whereas we typically simulate a factor of 10 fewer).

On the code side, the next release won't generate this (log) file unless explicitly requested by the user.
For the affected tasks, I am looking into the possibility of granting some credit for the CPU time anyway, even though the results are not going to be validated...

I apologize for the inconvenience, and thanks again for the support!
A.
ID: 40065
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,557,581
RAC: 2,972
Message 40086 - Posted: 7 Oct 2019, 16:11:53 UTC - in response to Message 40065.  

An update on this issue: I managed to grant credit to the tasks of this specific study that failed with EXIT_DISK_LIMIT_EXCEEDED because of the inconsistent setting value.
The credit does not match the full credit that would have been granted had the task run to the end and been validated (validation is what protects against cheating): in the end, all the tasks failed before completing, and there was no way to validate the partial results.

Please post here if something odd related to this study happens.

Happy crunching,
A.
ID: 40086
Gunde

Joined: 9 Jan 15
Posts: 141
Credit: 413,172,999
RAC: 320,901
Message 43286 - Posted: 27 Aug 2020, 19:48:24 UTC
Last modified: 27 Aug 2020, 20:43:15 UTC

This issue has come up again with the current batch of SixTrack tasks.

Example https://lhcathome.cern.ch/lhcathome/result.php?resultid=281854540

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
Disk usage limit exceeded</message>
<stderr_txt>

</stderr_txt>
]]>


If I remember correctly, are these the lines in cc_config.xml that allow bigger files?

<max_stderr_file_size></max_stderr_file_size>
<max_stdout_file_size></max_stdout_file_size>


Edit: I added them with a high value, re-read the config files and restarted the client, but the tasks keep erroring out at the same time/size, so I am probably wrong.

Could someone verify the issue and whether it occurs on hosts other than mine? If this issue remains, should we abort these work units?
ID: 43286
Gunde

Joined: 9 Jan 15
Posts: 141
Credit: 413,172,999
RAC: 320,901
Message 43287 - Posted: 27 Aug 2020, 21:09:53 UTC
Last modified: 27 Aug 2020, 21:10:28 UTC

Checking the slot folder: it shows an rsc bound of 200 MB, and at around 1 hour into the run the folder in total is at around 190 MB.

<rsc_disk_bound>200000000.000000</rsc_disk_bound>


I cannot see any .xml file for SixTrack, so I cannot increase that value for new tasks.

A few minutes later...

It died at the 205.2 MB mark.
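The bound is given in bytes, while tools differ in whether "MB" means 10^6 or 2^20 bytes. A quick conversion, sketched in Python, helps when comparing a folder size shown by a file manager with the limit: 200000000 bytes is 200.0 MB in decimal units but only about 190.7 MiB in binary units.

```python
RSC_DISK_BOUND = 200_000_000  # bytes, the <rsc_disk_bound> value quoted above

def to_units(n_bytes: int) -> tuple[float, float]:
    """Return (decimal megabytes, binary mebibytes) for a byte count."""
    return n_bytes / 1_000_000, n_bytes / 2**20

mb, mib = to_units(RSC_DISK_BOUND)
print(f"{mb:.1f} MB = {mib:.1f} MiB")  # prints "200.0 MB = 190.7 MiB"
```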
ID: 43287
grumpy

Joined: 1 Sep 04
Posts: 57
Credit: 2,374,353
RAC: 7,018
Message 43288 - Posted: 28 Aug 2020, 2:50:47 UTC

Too many of these:
196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED
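The two numbers in that line are the same exit code, once in decimal and once in hexadecimal (196 = 0xC4). A small Python sketch reproducing that formatting; the helper name `format_exit_code` is hypothetical and only mirrors the zero-padded 8-digit style seen on the result page:

```python
def format_exit_code(code: int) -> str:
    """Return the code in decimal followed by its zero-padded 8-digit hex form."""
    return f"{code} (0x{code:08X})"

# prints "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED"
print(format_exit_code(196), "EXIT_DISK_LIMIT_EXCEEDED")
```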
ID: 43288
pvh

Joined: 17 Jun 13
Posts: 8
Credit: 6,548,286
RAC: 0
Message 43290 - Posted: 28 Aug 2020, 7:27:03 UTC - in response to Message 43288.  

I am getting those too. Plenty of room on the disk, so that is not the problem... One example is w-c0.004_0.000_job.B2_2012_rerun_c0.004_0.000.4415__48__s__64.282_59.312__4.1_6.1__6__4.94505_1_sixvf_boinc15045_0
ID: 43290
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1615
Credit: 94,985,709
RAC: 96,731
Message 43291 - Posted: 28 Aug 2020, 7:29:36 UTC

This value is set by the project server for each SixTrack task:
<rsc_disk_bound>200000000.000000</rsc_disk_bound>

The file "singletrackfile.dat" located in the slots directory continuously grows during calculation and at a certain point the total size of the slots directory hits the limit set via <rsc_disk_bound>.

The project team needs to investigate why the file grows to that size.
Alternatively, <rsc_disk_bound> has to be set to a higher value on the project server.
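This suggests a simple way to watch a running task: sum the sizes of everything in its slot directory and compare the total with <rsc_disk_bound>. The Python sketch below is only an approximation under stated assumptions: it assumes the bound can be read from an init_data.xml file in the slot directory (that file name is an assumption here), it skips symbolic links, and the BOINC client's own accounting may differ in detail. It only reads files and changes nothing.

```python
import os
import re
from pathlib import Path

def read_disk_bound(xml_text: str) -> float:
    """Return the <rsc_disk_bound> value (in bytes) found in an XML string."""
    match = re.search(r"<rsc_disk_bound>\s*([0-9.eE+]+)\s*</rsc_disk_bound>", xml_text)
    if match is None:
        raise ValueError("no <rsc_disk_bound> element found")
    return float(match.group(1))

def dir_size(path: Path) -> int:
    """Sum the sizes of all regular files below path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full) and os.path.isfile(full):
                total += os.path.getsize(full)
    return total

def check_slot(slot: Path) -> tuple[int, float]:
    """Print and return (bytes used, bound) for one slot directory."""
    # Assumption: the slot directory holds init_data.xml containing the bound
    bound = read_disk_bound((slot / "init_data.xml").read_text())
    used = dir_size(slot)
    print(f"{slot}: {used} of {bound:.0f} bytes ({100 * used / bound:.1f}%)")
    return used, bound

# Hypothetical usage on a Linux host:
# check_slot(Path("/var/lib/boinc-client/slots/0"))
```

Run periodically against the slot of an affected SixTrack task, the total would be expected to creep toward 200000000 bytes as singletrackfile.dat grows, shortly before the error appears.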
ID: 43291
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 995
Credit: 6,428,278
RAC: 466
Message 43292 - Posted: 28 Aug 2020, 9:13:56 UTC - in response to Message 43291.  

The file "singletrackfile.dat" located in the slots directory continuously grows during calculation and at a certain point the total size of the slots directory hits the limit set via <rsc_disk_bound>.
The question is whether the task will end successfully if singletrackfile.dat is allowed to grow further.
I'm testing that at the moment by increasing rsc_disk_bound tenfold.
The first task was a short one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=282028703
Second task 1 hour running with a growing singletrackfile.dat (72MB)
ID: 43292
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1615
Credit: 94,985,709
RAC: 96,731
Message 43293 - Posted: 28 Aug 2020, 9:49:45 UTC

Very strange.

Found an example where the same WU reports different peak disk usage values:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=144204024

computer 1: 347.79 MB (failed)
computer 2: 4.37 MB (succeeded)
computer 3: 171.35 MB (succeeded)
ID: 43293
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 995
Credit: 6,428,278
RAC: 466
Message 43295 - Posted: 28 Aug 2020, 16:00:36 UTC - in response to Message 43292.  

Second task 1 hour running with a growing singletrackfile.dat (72MB)
Second task finished: Workunit: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=144292270
The wingman has 'only' a peak disk usage of 5.21 MB, while mine has a peak disk usage of 469.54 MB (exceeding 200000000 bytes, so it would normally have failed with the disk limit error).
The singletrackfile.dat file grew to 462,537 kB.
It looked like the file fort.6 was in use the whole time; there were 33 files in that slot at that point. When new files were created (I saw at most 35 files in the slot directory), singletrackfile.dat seemed to stop growing.
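A quick arithmetic check of this experiment, as a Python sketch: it converts the sizes reported above to bytes and compares them with the default bound and the tenfolded one. It assumes binary units (1 kB = 1024 bytes, 1 MB = 1024 kB), which is an assumption about how the sizes were displayed; read with decimal units instead, the conclusion is the same, since both sizes still lie between 200,000,000 and 2,000,000,000 bytes.

```python
DEFAULT_BOUND = 200_000_000      # bytes, <rsc_disk_bound> set by the project server
TEST_BOUND = 10 * DEFAULT_BOUND  # the tenfolded value used for the test

# Sizes reported in the post, converted assuming binary units
singletrackfile = 462_537 * 1024           # "462,537 kB"
peak_usage = int(469.54 * 1024 * 1024)     # "469.54 MB"

for label, size in [("singletrackfile.dat", singletrackfile), ("peak usage", peak_usage)]:
    print(f"{label}: {size} bytes, over default bound: {size > DEFAULT_BOUND}, "
          f"over test bound: {size > TEST_BOUND}")
```

Both sizes exceed the default bound, which is why such a task would normally be aborted, yet stay well inside the tenfolded one, which let the task run to completion.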
ID: 43295



©2021 CERN