Message boards : Theory Application : 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED - how come?
Joined: 18 Dec 15 · Posts: 1688 · Credit: 103,877,801 · RAC: 121,705
Here too: it happened again, after 9 hours of processing time. This is rather annoying :-( It would be great if someone from the project team could look into it.
Joined: 28 Sep 04 · Posts: 675 · Credit: 43,653,221 · RAC: 15,903
I have had a few fail with this error after 18 hours and 5 minutes, so right at the very end of the calculation. Truly annoying.
Joined: 18 Dec 15 · Posts: 1688 · Credit: 103,877,801 · RAC: 121,705
I had another one fail yesterday. Why are some of these tasks misconfigured? How can this happen?
Joined: 2 May 07 · Posts: 2101 · Credit: 159,817,517 · RAC: 132,770
These tasks produce so much output that they fill the defined 20 GB disk. It is always a particular kind of SHERPA task.
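For what it's worth, exit code 196 (0x000000C4) is BOINC's EXIT_DISK_LIMIT_EXCEEDED: the client aborts a task once its slot directory exceeds the workunit's rsc_disk_bound. If you want to see the bound your own tasks were sent with, here is a minimal Python sketch; the client_state.xml path is the Debian/Ubuntu default and may differ on your system.

    # Sketch: list the per-task disk bound straight from client_state.xml.
    # The path is an assumption for a standard Linux install; adjust it.
    import xml.etree.ElementTree as ET

    STATE_FILE = "/var/lib/boinc-client/client_state.xml"

    for wu in ET.parse(STATE_FILE).getroot().iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_disk_bound")
        if name and bound:
            print(f"{name}: disk bound {float(bound) / 1e9:.1f} GB")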
Joined: 13 Apr 18 · Posts: 443 · Credit: 8,438,885 · RAC: 0
Maybe if the tasks were not restricted by the disk limit they would eventually run to completion. Maybe the project cannot tell in advance whether a given configuration will hit the disk and time limits or run to completion; the only way to know may be to try it and see. BTW, I am not suggesting the disk and time limits be raised.
Joined: 29 Sep 04 · Posts: 281 · Credit: 11,859,285 · RAC: 0
Caught this one before it got too big and errored out. The log was at 14 GB when I reset the VM.

    14:18:43 +0100 2019-02-17 [INFO] New Job Starting in slot1
    14:18:43 +0100 2019-02-17 [INFO] Condor JobID: 488822.173 in slot1
    14:18:48 +0100 2019-02-17 [INFO] MCPlots JobID: 48599734 in slot1
    ===> [runRivet] Sun Feb 17 14:18:43 CET 2019 [boinc pp jets 8000 180,-,3560 - sherpa 1.4.0 default 100000 14]

[Not long later] The task errored out anyway, so I guess the slot didn't get cleaned out when I reset the VM. Need to look closer at that when I catch another one getting too big.
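For next time, I put together a small watcher so a runaway task can be spotted before it hits the limit. It is only a sketch: the data directory is the default Linux path and the 10 GB threshold is my own pick, so adjust both.

    # Sketch: warn when any BOINC slot directory grows past a threshold.
    import os
    import time

    SLOTS_DIR = "/var/lib/boinc-client/slots"  # assumed default Linux path
    THRESHOLD_BYTES = 10 * 1024**3             # warn at 10 GB (arbitrary)

    def dir_size(path):
        total = 0
        for root, _, files in os.walk(path):
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    pass  # a file may vanish while we scan
        return total

    while True:
        for slot in sorted(os.listdir(SLOTS_DIR)):
            size = dir_size(os.path.join(SLOTS_DIR, slot))
            if size > THRESHOLD_BYTES:
                print(f"slot {slot}: {size / 1024**3:.1f} GB and growing")
        time.sleep(300)  # re-check every five minutes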
Joined: 18 Dec 15 · Posts: 1688 · Credit: 103,877,801 · RAC: 121,705
The next one, yesterday:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=218049634
And another one this morning:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=218009472
Why are these misconfigured tasks still around? What a waste :-(
Joined: 15 Jun 08 · Posts: 2413 · Credit: 226,469,691 · RAC: 131,958
Got one just before it crashed. Definitely a Sherpa that wrote a huge log of several GB. Logfile snippet:

    ===> [runRivet] Thu Feb 28 00:21:00 CET 2019 [boinc pp jets 8000 180,-,3560 - sherpa 2.2.4 default 10000 20]
    . . .
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.94621e+07
    Channel_Elements::GenerateYUniform(1.08733,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,-0.0343955,}): Y out of bounds !
      ymin, ymax vs. y : 0.0418604 -0.0418604 vs. 0.0111371
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.95888e+07
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.58349e+07
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 7.01566e+07
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.50979e+07
    Channel_Elements::GenerateYCentral(1.03108,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,nan,}): Y out of bounds !
      ymin, ymax vs. y : 0.0153036 -0.0153036 vs. -0.0102595
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.59892e+07
    Channel_Elements::GenerateYCentral(1.01637,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,-0.0102595,}): Y out of bounds !
      ymin, ymax vs. y : 0.00812073 -0.00812073 vs. 0.00771583
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.50479e+07
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.4e+07
    ISR_Handler::MakeISR(..): s' out of bounds.
      s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 7.03278e+07
    Channel_Elements::GenerateYUniform(1.11375,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,nan,}): Y out of bounds !
      ymin, ymax vs. y : 0.0538648 -0.0538648 vs. 0.048588

<followed by thousands of lines with similar content>
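Nearly all of that volume is the same few messages repeated endlessly. A quick way to confirm it without loading the whole file, just a sketch (the log filename here is an assumption; point it at whatever actually sits in the slot/VM):

    # Sketch: count repeated warning patterns in a huge Sherpa/runRivet log.
    # Streams line by line, so a multi-GB file never has to fit in memory.
    import re
    from collections import Counter

    LOG_FILE = "runRivet.log"  # hypothetical name; use the real log file

    counts = Counter()
    with open(LOG_FILE, errors="replace") as fh:
        for line in fh:
            # Mask the numbers so identical warnings collapse to one pattern.
            pattern = re.sub(r"-?\d+\.?\d*(e[+-]?\d+)?", "#", line.strip())
            counts[pattern] += 1

    for pattern, n in counts.most_common(10):
        print(f"{n:>10}  {pattern[:100]}")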
Joined: 14 Jan 10 · Posts: 1280 · Credit: 8,494,952 · RAC: 2,243
"Got one just before it crashed."
Maybe CERN should use SHERPA-2.2.5 (only). It contains bug fixes for all known bugs of SHERPA-2.2.4.
Joined: 18 Dec 15 · Posts: 1688 · Credit: 103,877,801 · RAC: 121,705
The next one that failed, after almost 18 hours of crunching time:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=218485255
It's very annoying that such faulty tasks are still around. This error has been known for such a long time now, so why has it not been eliminated?
Joined: 13 Apr 18 · Posts: 443 · Credit: 8,438,885 · RAC: 0
"It's very annoying that such faulty tasks are still around."
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4028&postid=35142#35142
In a nutshell: reproducibility.
Joined: 18 Dec 15 · Posts: 1688 · Credit: 103,877,801 · RAC: 121,705
"It's very annoying that such faulty tasks are still around."
I guess what is explained in the posting you cited:

    "... At least a partial consolation is that, as far as I know, you are at least still getting credits for them, even though I totally understand that it is frustrating that your CPU is basically idling during those jobs and not contributing to science. ..."

deals with something else, because in the cases I am complaining about, there is ZERO credit :-(
Joined: 13 Apr 18 · Posts: 443 · Credit: 8,438,885 · RAC: 0
"It's very annoying that such faulty tasks are still around."
I'll wager reproducibility is the reason they tolerate looping Sherpas, as well as the reason they tolerate 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED.
Joined: 28 Sep 04 · Posts: 675 · Credit: 43,653,221 · RAC: 15,903
Once again one of these popped up:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=219028961
Joined: 1 Feb 06 · Posts: 66 · Credit: 9,723 · RAC: 0
Because of all these, last weekend was the last time I tried (with no success) to run any VM LHC task. I will stick to SixTrack, when available. Such a pity.
Joined: 18 Dec 15 · Posts: 1688 · Credit: 103,877,801 · RAC: 121,705
Again a task failed, this time after 9 1/2 hours:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=219309143
Very annoying :-(((
Joined: 18 Dec 15 · Posts: 1688 · Credit: 103,877,801 · RAC: 121,705
By now I am really pissed off by this "disk_limit_exceeded" failure. My CPU was working for more than 15 hours, and then this :-(((
https://lhcathome.cern.ch/lhcathome/result.php?resultid=220073182
Why does this problem still exist? Can someone there finally get it rectified?
Joined: 18 Dec 15 · Posts: 1688 · Credit: 103,877,801 · RAC: 121,705
The next one, after 13:30 hrs of CPU time :-(((
https://lhcathome.cern.ch/lhcathome/result.php?resultid=220284354
Can someone finally stop this nonsense, please!
Joined: 20 Jun 14 · Posts: 374 · Credit: 238,712 · RAC: 0
Looking at the result, it says that 18 GB was used, which is quite high. The task seemed to finish correctly. My guess is that the BOINC client performs some operations after the task has ended and a threshold was reached then. You may wish to check the status of your local slot/project directories.
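Something like this sketch will print those sizes (the data directory is the assumed Linux default; adjust it for your setup):

    # Sketch: one-shot size report for BOINC slot and project directories.
    import os

    BOINC_DIR = "/var/lib/boinc-client"  # assumed default; adjust as needed

    def dir_size_gb(path):
        total = 0
        for root, _, files in os.walk(path):
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    pass
        return total / 1024**3

    for sub in ("slots", "projects"):
        base = os.path.join(BOINC_DIR, sub)
        for entry in sorted(os.listdir(base)):
            path = os.path.join(base, entry)
            if os.path.isdir(path):
                print(f"{path}: {dir_size_gb(path):.2f} GB")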
Joined: 18 Dec 15 · Posts: 1688 · Credit: 103,877,801 · RAC: 121,705
I have now looked up some other tasks that failed, and all showed a peak disk usage of about 18,300 MB, whereas the non-failing tasks used about 1,000 MB (give or take 200 MB).

Also, as suggested, I checked the local directories, but could not spot any space limitation there. The disk is 190 GB in size, of which about 160 GB are free, and BOINC is set to allow usage of up to 90% of the disk. So, unless I am missing something, no threshold within my settings is being reached, even with a task that uses 18 GB.

What I am still wondering is why it is exactly the failing tasks that use so much more disk space (about 18 times more) than the non-failing ones, which stay around 1 GB. Hence, I still think that there is something wrong with these failing tasks.