Message boards : Theory Application : 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED - how come?

Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 37936 - Posted: 5 Feb 2019, 20:37:56 UTC - in response to Message 37925.  

Here too, it happened again, after 9 hours of processing time. This is rather annoying :-(

It would be great if someone from the project team could look into it.
ID: 37936
Harri Liljeroos

Joined: 28 Sep 04
Posts: 675
Credit: 43,653,221
RAC: 15,903
Message 37938 - Posted: 5 Feb 2019, 21:07:31 UTC - in response to Message 37936.  

I have had a few fail with this error after 18 hours and 5 minutes, so right at the very end of the calculation. Truly annoying.
ID: 37938
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 37950 - Posted: 7 Feb 2019, 8:33:54 UTC

I had another one fail yesterday.

Why are some of these tasks misconfigured? How can this happen?
ID: 37950
maeax

Joined: 2 May 07
Posts: 2101
Credit: 159,817,517
RAC: 132,770
Message 37951 - Posted: 7 Feb 2019, 8:44:39 UTC - in response to Message 37950.  

These tasks produce so much output that they hit the defined 20 GByte disk limit.
It is always a particular kind of SHERPA task.
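
For anyone who wants to check the per-task disk bound the project actually sends, BOINC stores it as rsc_disk_bound (in bytes) with each workunit in the client state file. A rough sketch, assuming a default Linux BOINC installation (adjust the path for your own setup):

# Sketch: print the per-task disk bound (rsc_disk_bound, in bytes) that came
# with each workunit. The state-file path is an assumption for a default
# Linux BOINC install; you may also need read permission on the file.
import xml.etree.ElementTree as ET

STATE_FILE = "/var/lib/boinc-client/client_state.xml"  # assumed location

root = ET.parse(STATE_FILE).getroot()
for wu in root.iter("workunit"):
    name = wu.findtext("name")
    bound = wu.findtext("rsc_disk_bound")
    if name and bound:
        print(f"{name}: {float(bound) / 1e9:.2f} GB disk bound")

That bound is what error 196 (EXIT_DISK_LIMIT_EXCEEDED) refers to: the client aborts a task once the disk usage reported for it exceeds the bound.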
ID: 37951
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37964 - Posted: 9 Feb 2019, 12:05:37 UTC - in response to Message 37950.  

Maybe if the tasks were not restricted by the disk limit they would eventually run to completion. Maybe they cannot tell in advance whether a given configuration will hit the disk and time limits or run to completion. Maybe the only way to know if a given config will complete is to try it and see.

BTW, I am not suggesting the disk and time limits be raised.
ID: 37964
Ray Murray
Volunteer moderator

Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 0
Message 38024 - Posted: 17 Feb 2019, 19:44:26 UTC
Last modified: 17 Feb 2019, 21:59:07 UTC

Caught this one before it got too big and errored out. Log was at 14GB when I reset the VM.

14:18:43 +0100 2019-02-17 [INFO] New Job Starting in slot1
14:18:43 +0100 2019-02-17 [INFO] Condor JobID: 488822.173 in slot1
14:18:48 +0100 2019-02-17 [INFO] MCPlots JobID: 48599734 in slot1

===> [runRivet] Sun Feb 17 14:18:43 CET 2019 [boinc pp jets 8000 180,-,3560 - sherpa 1.4.0 default 100000 14]

[Not long after]
The task errored out anyway, so I guess the slot didn't get cleaned out when I reset the VM. I need to look closer at that when I catch another one getting too big.
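
A rough sketch of the kind of monitoring I have in mind, assuming a default Linux BOINC data directory (adjust the path for other setups); the growing output may live inside the VM's disk image rather than as a plain log file, so the slot's total size is what to watch:

# Monitoring sketch (assumptions: default Linux BOINC data directory, one
# check every 5 minutes). Prints the total size of each slot directory so a
# task approaching the disk bound can be spotted and suspended in time.
import os
import time

SLOTS_DIR = "/var/lib/boinc-client/slots"  # assumed location

def dir_size_bytes(path):
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file vanished between listing and stat
    return total

while True:
    for slot in sorted(os.listdir(SLOTS_DIR)):
        slot_path = os.path.join(SLOTS_DIR, slot)
        if os.path.isdir(slot_path):
            print(f"slot {slot}: {dir_size_bytes(slot_path) / 1e9:.1f} GB")
    print("-" * 30)
    time.sleep(300)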
ID: 38024
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38056 - Posted: 23 Feb 2019, 7:40:36 UTC
Last modified: 23 Feb 2019, 7:42:07 UTC

The next one, yesterday:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=218049634

and another one, this morning:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=218009472

Why are these misconfigured tasks still around? What a waste :-(
ID: 38056
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2413
Credit: 226,469,691
RAC: 131,958
Message 38078 - Posted: 28 Feb 2019, 9:48:13 UTC

Got one just before it crashed.
Definitely a Sherpa that wrote a huge log of several GB.

Logfile snippet:
===> [runRivet] Thu Feb 28 00:21:00 CET 2019 [boinc pp jets 8000 180,-,3560 - sherpa 2.2.4 default 10000 20]
.
.
.
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.94621e+07
Channel_Elements::GenerateYUniform(1.08733,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,-0.0343955,}):  Y out of bounds ! 
   ymin, ymax vs. y : 0.0418604 -0.0418604 vs. 0.0111371
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.95888e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.58349e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 7.01566e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.50979e+07
Channel_Elements::GenerateYCentral(1.03108,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,nan,}):  Y out of bounds ! 
   ymin, ymax vs. y : 0.0153036 -0.0153036 vs. -0.0102595
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.59892e+07
Channel_Elements::GenerateYCentral(1.01637,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,-0.0102595,}):  Y out of bounds ! 
   ymin, ymax vs. y : 0.00812073 -0.00812073 vs. 0.00771583
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.50479e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 6.4e+07
ISR_Handler::MakeISR(..): s' out of bounds.
  s'_{min}, s'_{max 1,2} vs. s': 7.60416e+07, 6.4e+07, 6.4e+07 vs. 7.03278e+07
Channel_Elements::GenerateYUniform(1.11375,{-8.98847e+307,0,-8.98847e+307,0,0,},{-10,10,nan,}):  Y out of bounds ! 
   ymin, ymax vs. y : 0.0538648 -0.0538648 vs. 0.048588

<followed by thousands of lines with similar content>
ID: 38078
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,494,952
RAC: 2,243
Message 38081 - Posted: 28 Feb 2019, 12:38:03 UTC - in response to Message 38078.  

Got one just before it crashed.
Definitely a Sherpa that wrote a huge log of several GB.

Logfile snippet:
===> [runRivet] Thu Feb 28 00:21:00 CET 2019 [boinc pp jets 8000 180,-,3560 - sherpa 2.2.4 default 10000 20]
.
.
.

Maybe CERN should use SHERPA-2.2.5 (only). It contains bug fixes for all known bugs of SHERPA-2.2.4.
ID: 38081
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38088 - Posted: 3 Mar 2019, 6:39:49 UTC

The next one that failed, after almost 18 hours of crunching time:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=218485255

It's very annoying that such faulty tasks are still around.
This error has been known for such a long time now, so why has it not been eliminated?
ID: 38088
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38089 - Posted: 3 Mar 2019, 15:21:00 UTC - in response to Message 38088.  

It's very annoying that such faulty tasks are still around.
This error has been known for such a long time now, so why has it not been eliminated?


https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4028&postid=35142#35142
In a nutshell, reproducibility.
ID: 38089
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38090 - Posted: 3 Mar 2019, 17:48:55 UTC - in response to Message 38089.  

It's very annoying that such faulty tasks are still around.
This error has been known for such a long time now, so why has it not been eliminated?

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4028&postid=35142#35142
In a nutshell, reproducibility.


I guess what is explained in the posting you cited:
... At least a partial consolation is that, as far as I know, you are at least still getting credits for them, even though I totally understand that it is frustrating that your CPU is basically idling during those jobs and not contributing to science. ...
Peter
deals with something else.
Because in the cases I am complaining about, there is ZERO credit :-(
ID: 38090
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 38094 - Posted: 3 Mar 2019, 22:27:18 UTC - in response to Message 38090.  

It's very annoying that such faulty tasks are still around.
This error has been known for such a long time now, so why has it not been eliminated?

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4028&postid=35142#35142
In a nutshell, reproducibility.


I guess what is explained in the posting you cited:
... At least a partial consolation is that, as far as I know, you are at least still getting credits for them, even though I totally understand that it is frustrating that your CPU is basically idling during those jobs and not contributing to science. ...
Peter
deals with something else.
Because in the cases I am complaining about, there is ZERO credit :-(

I'll wager reproducibility is the reason they tolerate looping Sherpas, as well as the reason they tolerate 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED.
ID: 38094
Harri Liljeroos

Joined: 28 Sep 04
Posts: 675
Credit: 43,653,221
RAC: 15,903
Message 38234 - Posted: 13 Mar 2019, 7:58:47 UTC

ID: 38234
Guiri-One[Andalucia]

Joined: 1 Feb 06
Posts: 66
Credit: 9,723
RAC: 0
Message 38235 - Posted: 13 Mar 2019, 8:46:28 UTC - in response to Message 38234.  

Because of all this, last weekend was the last time I tried (with no success) to run any VM LHC task.

I will stick to sixtrack, when available.

Such a pity.
ID: 38235
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38258 - Posted: 17 Mar 2019, 13:57:20 UTC - in response to Message 38235.  

Again a task failed after 9 1/2 hours:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=219309143

Very annoying :-(((
ID: 38258
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38480 - Posted: 28 Mar 2019, 20:52:49 UTC

By now I am really pissed off by this "disk_limit_exceeded" failure.

My CPU was working for more than 15 hours, and then this :-(((
https://lhcathome.cern.ch/lhcathome/result.php?resultid=220073182

Why does this problem still exist? Can someone there finally get it rectified?
ID: 38480
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38506 - Posted: 1 Apr 2019, 3:20:39 UTC

The next one:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=220284354
after 13:30 hrs of CPU time :-(((

Can someone finally stop this nonsense, please!!!
ID: 38506
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 374
Credit: 238,712
RAC: 0
Message 38507 - Posted: 1 Apr 2019, 12:32:44 UTC - in response to Message 38506.  

Looking at the result, it says that 18 GB was used. This is quite high. The task seemed to finish correctly. My guess is that the BOINC client is doing some operations after the task has ended and a threshold was reached. You may wish to check the status of your local slot/project directories.
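
For example, a quick one-off check along these lines (a sketch only, assuming a default Linux data directory; adjust the path for your installation) lists what each slot and project directory currently occupies and how much space is left on the disk:

# One-off check (path assumed for a default Linux BOINC install): report the
# size of each slot and project directory plus the free space on that disk.
import os
import shutil

DATA_DIR = "/var/lib/boinc-client"  # assumed location

def dir_size_gb(path):
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass
    return total / 1e9

for sub in ("slots", "projects"):
    base = os.path.join(DATA_DIR, sub)
    for entry in sorted(os.listdir(base)):
        full = os.path.join(base, entry)
        if os.path.isdir(full):
            print(f"{sub}/{entry}: {dir_size_gb(full):.1f} GB")

usage = shutil.disk_usage(DATA_DIR)
print(f"free: {usage.free / 1e9:.0f} GB of {usage.total / 1e9:.0f} GB")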
ID: 38507
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,877,801
RAC: 121,705
Message 38508 - Posted: 1 Apr 2019, 18:26:29 UTC - in response to Message 38507.  

I have now looked up some other tasks that failed, and all of them showed a peak disk usage of about 18,300 MB, whereas the non-failing tasks used about 1,000 MB (give or take 200 MB).

Also, as suggested, I checked the local directories, but could not spot any space limitations there.
The disk size is 190 GB, of which about 160 GB is free. BOINC is set to allow usage of up to 90% of the disk.

So, unless I am missing something, no threshold within my settings should be reached, even by a task that uses 18 GB.

However, what I am wondering about is why it is exactly the failing tasks that use so much more disk space (roughly 18 times more) than the non-failing ones, which use about 1 GB. Hence, I still think there may be something wrong with these failing tasks.
ID: 38508