Message boards : Theory Application : 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED - how come?

Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 37936 - Posted: 5 Feb 2019, 20:37:56 UTC - in response to Message 37925.  

Here too, it happened again, after 9 hours of processing time. This is rather annoying :-(

It would be great if someone from the project team could look into it.
ID: 37936
Harri Liljeroos

Joined: 28 Sep 04
Posts: 675
Credit: 43,653,221
RAC: 15,903
Message 37938 - Posted: 5 Feb 2019, 21:07:31 UTC - in response to Message 37936.  

I have had a few fail with this error after 18 hours and 5 minutes, so right at the very end of the calculation. Truly annoying.
ID: 37938
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 37950 - Posted: 7 Feb 2019, 8:33:54 UTC

I had another one fail yesterday.

Why are some of these tasks misconfigured? How can this happen?
ID: 37950
maeax

Joined: 2 May 07
Posts: 2101
Credit: 159,817,517
RAC: 132,770
Message 37951 - Posted: 7 Feb 2019, 8:44:39 UTC - in response to Message 37950.  

These tasks produce so much output that they hit the defined 20 GByte disk limit.
It is always a particular kind of SHERPA task.
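
For anyone who wants to check the per-task disk bound the project actually sends, BOINC stores it as rsc_disk_bound (in bytes) with each workunit in the client state file. A rough sketch, assuming a default Linux BOINC installation (adjust the path for your own setup):

# Sketch: print the per-task disk bound (rsc_disk_bound, in bytes) that came
# with each workunit. The state-file path is an assumption for a default
# Linux BOINC install; you may also need read permission on the file.
import xml.etree.ElementTree as ET

STATE_FILE = "/var/lib/boinc-client/client_state.xml"  # assumed location

root = ET.parse(STATE_FILE).getroot()
for wu in root.iter("workunit"):
    name = wu.findtext("name")
    bound = wu.findtext("rsc_disk_bound")
    if name and bound:
        print(f"{name}: {float(bound) / 1e9:.2f} GB disk bound")

That bound is what error 196 (EXIT_DISK_LIMIT_EXCEEDED) refers to: the client aborts a task once the disk usage reported for it exceeds the bound.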
ID: 37951
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37964 - Posted: 9 Feb 2019, 12:05:37 UTC - in response to Message 37950.  

Maybe if the tasks were not restricted by the disk limit they would eventually run to completion. Maybe they cannot tell in advance whether a given configuration will hit the disk and time limits or run to completion. Maybe the only way to know if a given config will complete is to try it and see.

BTW, I am not suggesting the disk and time limits be raised.
ID: 37964
Ray Murray
Volunteer moderator

Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 0
Message 38024 - Posted: 17 Feb 2019, 19:44:26 UTC
Last modified: 17 Feb 2019, 21:59:07 UTC

Caught this one before it got too big and errored out. Log was at 14GB when I reset the VM.

14:18:43 +0100 2019-02-17 [INFO] New Job Starting in slot1
14:18:43 +0100 2019-02-17 [INFO] Condor JobID: 488822.173 in slot1
14:18:48 +0100 2019-02-17 [INFO] MCPlots JobID: 48599734 in slot1

===> [runRivet] Sun Feb 17 14:18:43 CET 2019 [boinc pp jets 8000 180,-,3560 - sherpa 1.4.0 default 100000 14]

[Not long after]
The task errored out anyway, so I guess the slot didn't get cleaned out when I reset the VM. I need to look closer at that when I catch another one getting too big.
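
A rough sketch of the kind of monitoring I have in mind, assuming a default Linux BOINC data directory (adjust the path for other setups); the growing output may live inside the VM's disk image rather than as a plain log file, so the slot's total size is what to watch:

# Monitoring sketch (assumptions: default Linux BOINC data directory, one
# check every 5 minutes). Prints the total size of each slot directory so a
# task approaching the disk bound can be spotted and suspended in time.
import os
import time

SLOTS_DIR = "/var/lib/boinc-client/slots"  # assumed location

def dir_size_bytes(path):
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished between listing and stat
    return total

while True:
    for slot in sorted(os.listdir(SLOTS_DIR)):
        slot_path = os.path.join(SLOTS_DIR, slot)
        if os.path.isdir(slot_path):
            print(f"slot {slot}: {dir_size_bytes(slot_path) / 1e9:.1f} GB")
    print("-" * 30)
    time.sleep(300)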
ID: 38024
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38056 - Posted: 23 Feb 2019, 7:40:36 UTC
Last modified: 23 Feb 2019, 7:42:07 UTC

The next one, yesterday:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=218049634

and another one, this morning:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=218009472

Why are these misconfigured tasks still around? What a waste :-(
ID: 38056
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2413
Credit: 226,469,691
RAC: 131,958
Message 38078 - Posted: 28 Feb 2019, 9:48:13 UTC

Got one just before it crashed.
Definitely a Sherpa that wrote a huge log of several GB.

Logfile snippet:
===> [runRivet] Thu Feb 28 00:21:00 CET 2019 [boinc pp jets 8000 180,-,3560 - sherpa 2.2.4 default 10000 20]
.
.
.
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.94621e+07
Channel_Elements::GenerateYUniform(1.08733,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,-0.0343955,}):  Y out of bounds ! 
   ymin, ymax vs. y : 0.0418604 -0.0418604 vs. 0.0111371
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.95888e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.58349e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 7.01566e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.50979e+07
Channel_Elements::GenerateYCentral(1.03108,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,nan,}):  Y out of bounds ! 
   ymin, ymax vs. y : 0.0153036 -0.0153036 vs. -0.0102595
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.59892e+07
Channel_Elements::GenerateYCentral(1.01637,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,-0.0102595,}):  Y out of bounds ! 
   ymin, ymax vs. y : 0.00812073 -0.00812073 vs. 0.00771583
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.50479e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.4e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 7.03278e+07
Channel_Elements::GenerateYUniform(1.11375,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,nan,}):  Y out of bounds ! 
   ymin, ymax vs. y : 0.0538648 -0.0538648 vs. 0.048588

<followed by thousands of lines with similar content>
ID: 38078
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,494,952
RAC: 2,243
Message 38081 - Posted: 28 Feb 2019, 12:38:03 UTC - in response to Message 38078.  

Got one just before it crashed.
Definitely a Sherpa that wrote a huge log of several GB.

Logfile snippet:
===> [runRivet] Thu Feb 28 00:21:00 CET 2019 [boinc pp jets 8000 180,-,3560 - sherpa 2.2.4 default 10000 20]
.
.
.

Maybe CERN should use SHERPA-2.2.5 (only). It contains bug fixes for all known bugs of SHERPA-2.2.4.
ID: 38081
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38088 - Posted: 3 Mar 2019, 6:39:49 UTC

The next one that failed, after almost 18 hours of crunching time:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=218485255

It's very annoying that such faulty tasks are still around.
This error has been known for such a long time now, so why has it not been eliminated?
ID: 38088
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38089 - Posted: 3 Mar 2019, 15:21:00 UTC - in response to Message 38088.  

It's very annoying that such faulty tasks are still around.
This error has been known for such a long time now, so why has it not been eliminated?


https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4028&postid=35142#35142
In a nutshell, reproducibility.
ID: 38089
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38090 - Posted: 3 Mar 2019, 17:48:55 UTC - in response to Message 38089.  

It's very annoying that such faulty tasks are still around.
This error has been known for such a long time now, so why has it not been eliminated?

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4028&postid=35142#35142
In a nutshell, reproducibility.


I guess what is explained in the posting you cited:
... At least a partial consolation is that, as far as I know, you are at least still getting credits for them, even though I totally understand that it is frustrating that your CPU is basically idling during those jobs and not contributing to science. ...
Peter
deals with something else.
Because in the cases I am complaining about, there is ZERO credit :-(
ID: 38090
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38094 - Posted: 3 Mar 2019, 22:27:18 UTC - in response to Message 38090.  

It's very annoying that such faulty tasks are still around.
This error has been known for such a long time now, so why has it not been eliminated?

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4028&postid=35142#35142
In a nutshell, reproducibility.


I guess what is explained in the posting you cited:
... At least a partial consolation is that, as far as I know, you are at least still getting credits for them, even though I totally understand that it is frustrating that your CPU is basically idling during those jobs and not contributing to science. ...
Peter
deals with something else.
Because in the cases I am complaining about, there is ZERO credit :-(

I'll wager reproducibility is the reason they tolerate looping Sherpas, as well as the reason they tolerate 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED.
ID: 38094
Harri Liljeroos

Joined: 28 Sep 04
Posts: 675
Credit: 43,653,221
RAC: 15,903
Message 38234 - Posted: 13 Mar 2019, 7:58:47 UTC

ID: 38234
Guiri-One[Andalucia]

Joined: 1 Feb 06
Posts: 66
Credit: 9,723
RAC: 0
Message 38235 - Posted: 13 Mar 2019, 8:46:28 UTC - in response to Message 38234.  

Because of all this, last weekend was the last time I tried (with no success) to run any VM LHC task.

I will stick to sixtrack, when available.

Such a pity.
ID: 38235
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38258 - Posted: 17 Mar 2019, 13:57:20 UTC - in response to Message 38235.  

Again a task failed after 9 1/2 hours:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=219309143

Very annoying :-(((
ID: 38258
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38480 - Posted: 28 Mar 2019, 20:52:49 UTC

By now I am really pissed off by this "disk_limit_exceeded" failure.

My CPU was working for more than 15 hours, and then this :-(((
https://lhcathome.cern.ch/lhcathome/result.php?resultid=220073182

Why does this problem still exist? Can someone there finally get it rectified?
ID: 38480
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38506 - Posted: 1 Apr 2019, 3:20:39 UTC

The next one:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=220284354
after 13:30 hrs of CPU time :-(((

Can someone finally stop this nonsense, please!!!
ID: 38506
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 374
Credit: 238,712
RAC: 0
Message 38507 - Posted: 1 Apr 2019, 12:32:44 UTC - in response to Message 38506.  

Looking at the result, it says that 18 GB was used. This is quite high. The task seemed to finish correctly. My guess is that the BOINC client is doing some operations after the task has ended and a threshold was reached. You may wish to check the status of your local slot/project directories.
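
For example, a quick one-off check along these lines (a sketch only, assuming a default Linux data directory; adjust the path for your installation) lists what each slot and project directory currently occupies and how much space is left on the disk:

# One-off check (path assumed for a default Linux BOINC install): report the
# size of each slot and project directory plus the free space on that disk.
import os
import shutil

DATA_DIR = "/var/lib/boinc-client"  # assumed location

def dir_size_gb(path):
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass
    return total / 1e9

for sub in ("slots", "projects"):
    base = os.path.join(DATA_DIR, sub)
    for entry in sorted(os.listdir(base)):
        full = os.path.join(base, entry)
        if os.path.isdir(full):
            print(f"{sub}/{entry}: {dir_size_gb(full):.1f} GB")

usage = shutil.disk_usage(DATA_DIR)
print(f"free: {usage.free / 1e9:.0f} GB of {usage.total / 1e9:.0f} GB")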
ID: 38507
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38508 - Posted: 1 Apr 2019, 18:26:29 UTC - in response to Message 38507.  

I have now looked up some other tasks that failed, and all of them showed a peak disk usage of about 18,300 MB, whereas the non-failing tasks used about 1,000 MB (give or take 200 MB).

Also, as suggested, I checked the local directories, but could not spot any space limitations there.
The disk size is 190 GB, of which about 160 GB is free. BOINC is set to allow usage of up to 90% of the disk.

So, unless I am missing something, no threshold within my settings should be reached, even by a task that uses 18 GB.

However, what I am wondering about is why it is exactly the failing tasks that use so much more disk space (roughly 18 times more) than the non-failing ones, which use about 1 GB. Hence, I still think there may be something wrong with these failing tasks.
ID: 38508