Message boards : Theory Application : 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED - how come?

Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 37403 - Posted: 22 Nov 2018, 17:55:18 UTC

This task

https://lhcathome.cern.ch/lhcathome/result.php?resultid=208859885

errored out after about 5 hours with 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED.

The report shows a peak disk usage of 18,312.37 MB, which is highly unusual; other tasks I checked used about 1 GB.
But still, the BOINC settings should have allowed an even higher disk usage: 90% of 195 GB is available to BOINC.

Does anyone have an explanation for what happened?
ID: 37403
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,740,364
RAC: 233,427
Message 37404 - Posted: 22 Nov 2018, 18:23:21 UTC - in response to Message 37403.  

The project team set the limit wrong when they submitted the WUs. You just have to let them die; they will fix themselves over time.
ID: 37404
bronco
Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 37405 - Posted: 22 Nov 2018, 18:35:35 UTC - in response to Message 37403.  

But still, the BOINC settings should have allowed an even higher disk usage: 90% of 195 GB is available to BOINC.
There are 2 "disk limits". You are confusing the two.
That "90% of 195 GB" is the limit on the total disk usage by all BOINC tasks. It is user configurable in BOINC Manager.

The 196 error refers to the disk limit placed on a single task. That limit is NOT user configurable and is NOT the disk limit referred to above; it is set by the server.
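
To make the distinction concrete, here is a minimal Python sketch of two independent checks. It is not BOINC's actual code: the helper name `exceeded_limits`, the 20 GB figure for total BOINC usage, and the 8,000,000,000-byte per-task limit (a value reported elsewhere in this thread) are assumptions for illustration only.

```python
MB = 1024 * 1024
GB = 1024 * MB

def exceeded_limits(task_bytes, per_task_limit, total_boinc_bytes, total_limit):
    """Return which of the two limits are exceeded (hypothetical helper, not BOINC code)."""
    exceeded = []
    if task_bytes > per_task_limit:
        exceeded.append("per_task")   # server-set limit on a single task, error 196
    if total_boinc_bytes > total_limit:
        exceeded.append("total")      # user-set limit on all BOINC data together
    return exceeded

task_bytes = int(18312.37 * MB)        # peak disk usage from the failed task
per_task_limit = 8_000_000_000         # assumed per-task limit in bytes
total_boinc_bytes = 20 * GB            # assumed total BOINC usage, illustration only
total_limit = int(0.90 * 195 * GB)     # 90% of 195 GB from the user's preferences

print(exceeded_limits(task_bytes, per_task_limit, total_boinc_bytes, total_limit))
```

Under these assumptions only the per-task check trips, which matches a 196 error on a host whose overall BOINC allowance is nowhere near full.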
ID: 37405
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 37410 - Posted: 23 Nov 2018, 8:42:05 UTC - in response to Message 37403.  

This task

https://lhcathome.cern.ch/lhcathome/result.php?resultid=208859885

errored out after about 5 hours with 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED.

The report shows a peak disk usage of 18,312.37 MB, which is highly unusual; other tasks I checked used about 1 GB.
But still, the BOINC settings should have allowed an even higher disk usage: 90% of 195 GB is available to BOINC.

Does anyone have an explanation for what happened?

My explanation:

The working slot (incl. subdirs) of a Theory task may not exceed 8000000000 bytes, which equals 7629.39453125 MB (counting 1 MB as 1024 x 1024 bytes).
Somehow the combined size of all files grew abnormally. The task ran 1 job and was never suspended, so no snapshots were created.
I suppose Vbox.log or VBoxHardening.log is responsible for the enormous size, perhaps because an error message was being written to it in a loop.
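
For anyone who wants to check this on their own host, here is a rough Python sketch. It reproduces the byte-to-MB conversion above and lists the largest files under a directory. The function names are hypothetical, and the slot directory path depends on your BOINC installation, so it is left as a parameter.

```python
import os

MIB = 1024 * 1024  # the "MB" used in the limit above is 1024 * 1024 bytes

def bytes_to_mb(n_bytes):
    """Convert a byte count to MB, counting 1 MB as 1024 * 1024 bytes."""
    return n_bytes / MIB

def largest_files(root, top=5):
    """Return up to `top` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # the file may vanish while the task is running
    sizes.sort(reverse=True)
    return sizes[:top]

print(bytes_to_mb(8_000_000_000))  # 7629.39453125
```

Calling `largest_files()` on the task's slot directory while the task is still running should show whether a single log file accounts for most of the usage.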
ID: 37410
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 37566 - Posted: 8 Dec 2018, 12:18:43 UTC

This morning, again a task errored out, after 4 1/2 hours, with the exit disk limit failure: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211563104

What's wrong this time?
ID: 37566
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 37567 - Posted: 8 Dec 2018, 13:08:00 UTC - in response to Message 37566.  

This morning, again a task errored out, after 4 1/2 hours, with the exit disk limit failure: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211563104

What's wrong this time?

2018-12-08 11:51:03 (1464): Guest Log: [INFO] Job finished in slot1 with 1.

2018-12-08 11:51:08 (1464): Guest Log: [INFO] New Job Starting in slot1

2018-12-08 11:51:08 (1464): Guest Log: [INFO] Condor JobID:  482880.13 in slot1

2018-12-08 11:51:08 (1464): Guest Log: [INFO] Job finished in slot1 with 2.

Strange that the job finished within a second without even issuing an MCPlots ID.
Finishing with 2 normally means something like 'file not found'.
What does BOINC's event log show around that time?
ID: 37567
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 37568 - Posted: 8 Dec 2018, 13:45:27 UTC - in response to Message 37567.  

What does BOINC's event log show around that time?
Nothing in particular.
ID: 37568
Harri Liljeroos
Joined: 28 Sep 04
Posts: 674
Credit: 43,151,966
RAC: 15,770
Message 37576 - Posted: 9 Dec 2018, 11:55:29 UTC

I got one of these errors last night, here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211545734
It ran 18 hours and 5 minutes and was probably just about to end when the error hit. Peak disk usage was 16,501.43 MB.
ID: 37576
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 37616 - Posted: 16 Dec 2018, 8:55:19 UTC

The same problem happened yesterday:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=212297479

Does anyone have any idea why?
ID: 37616
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 37673 - Posted: 21 Dec 2018, 13:34:46 UTC - in response to Message 37616.  

The next 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED error, after about 7 1/2 hours of runtime:

Does anyone have an explanation for this?
ID: 37673
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,935,019
RAC: 137,648
Message 37674 - Posted: 21 Dec 2018, 13:46:25 UTC - in response to Message 37673.  

Most likely misbehaving sherpa jobs that create huge logs until the disk limit is reached.
They are hard to debug as the logs are lost as soon as the VM is shut down.

See the discussion at LHC-dev:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=438
ID: 37674
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 37680 - Posted: 22 Dec 2018, 12:20:09 UTC

LHC@home 22 Dec 12:00:39 UTC Aborting task Theory_1543818_1545461975.206982_0: exceeded disk limit: 9596.10MB > 7629.39MB

Task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212747277
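
For collecting several of these cases, event log lines of this form can be parsed with a small Python sketch like the one below. The regular expression is derived only from the line quoted above, so treat the format as an assumption; `parse_abort` is a hypothetical helper, not part of BOINC.

```python
import re

# Pattern based solely on the event log line quoted in this thread.
ABORT_RE = re.compile(
    r"Aborting task (?P<task>\S+): exceeded disk limit: "
    r"(?P<used>[\d.]+)MB > (?P<limit>[\d.]+)MB"
)

def parse_abort(line):
    """Return task name, usage, limit and overshoot in MB, or None if no match."""
    m = ABORT_RE.search(line)
    if m is None:
        return None
    used = float(m.group("used"))
    limit = float(m.group("limit"))
    return {
        "task": m.group("task"),
        "used_mb": used,
        "limit_mb": limit,
        "over_mb": round(used - limit, 2),
    }

line = ("LHC@home 22 Dec 12:00:39 UTC Aborting task "
        "Theory_1543818_1545461975.206982_0: exceeded disk limit: "
        "9596.10MB > 7629.39MB")
print(parse_abort(line))
```

For the line above, the task was about 1967 MB over the 7629.39 MB limit when the client aborted it.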
ID: 37680
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 37693 - Posted: 25 Dec 2018, 10:54:20 UTC

I had two more yesterday:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=212811309
https://lhcathome.cern.ch/lhcathome/result.php?resultid=212826747

and one the day before yesterday:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=212743711

They failed after between 13 and 18 hours of processing time, which is a shame :-(
ID: 37693
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 37696 - Posted: 28 Dec 2018, 6:16:17 UTC

Can anyone from the LHC@home team please explain why so many tasks have been erroring out lately with "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED"?

Latest case from one of my PCs: https://lhcathome.cern.ch/lhcathome/result.php?resultid=212965047
Again, 6 1/2 hours of CPU time for nothing :-(
ID: 37696
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,112,959
RAC: 104,330
Message 37697 - Posted: 28 Dec 2018, 7:09:29 UTC - in response to Message 37674.  

Most likely misbehaving sherpa jobs that create huge logs until the disk limit is reached.
They are hard to debug as the logs are lost as soon as the VM is shut down.

See the discussion at LHC-dev:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=438
ID: 37697
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 1
Message 37698 - Posted: 28 Dec 2018, 17:56:40 UTC

Got one here, 11+ hrs in, with running.log at 4.4 GB.
In the Console window, output is whizzing past too fast to read.
I'm going to reset the VM so as not to waste any more time on it, as it's likely to grow too big and fail, but hopefully the identification details captured below will be helpful.

From stdout.log
06:31:00 +0000 2018-12-28 [INFO] Condor JobID: 484579.129 in slot1
06:31:06 +0000 2018-12-28 [INFO] MCPlots JobID: 47878534 in slot1

Top line of running.log
===> [runRivet] Fri Dec 28 06:31:01 GMT 2018 [boinc pp jets 8000 180,-,3560 - sherpa 2.2.0 default 100000 8]
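
To sort captured headers like this one by generator, a rough Python sketch follows. The meaning of the individual fields inside the brackets is not documented in this thread, so the sketch only splits them into tokens and checks whether "sherpa" is among them; `runrivet_tokens` is a hypothetical helper name.

```python
import re

def runrivet_tokens(line):
    """Return the whitespace-separated tokens from the last [...] group of a
    runRivet header line, or an empty list if the line does not match."""
    m = re.search(r"\[runRivet\].*\[([^\]]*)\]", line)
    if m is None:
        return []
    return m.group(1).split()

header = ("===> [runRivet] Fri Dec 28 06:31:01 GMT 2018 "
          "[boinc pp jets 8000 180,-,3560 - sherpa 2.2.0 default 100000 8]")
tokens = runrivet_tokens(header)
print(tokens)
print("sherpa" in tokens)  # True
```

Running this over the first line of running.log from several affected tasks would show whether they all share the same generator.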
ID: 37698
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 37699 - Posted: 28 Dec 2018, 20:16:56 UTC

As far as I can see, this specific problem has only been occurring recently.
So something must have been changed in these tasks at LHC@home.
ID: 37699
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 37701 - Posted: 29 Dec 2018, 7:42:07 UTC - in response to Message 37698.  
Last modified: 29 Dec 2018, 7:46:57 UTC

Got another one too. I could not watch the running job, but it was most probably a sherpa job.

Stderr output: 2018-12-28 19:02:34 UTC (4528): Guest Log: [INFO] New Job Starting in slot1

BOINC event: 2018-12-29 05:02:23 UTC Aborting task Theory_1102016_1545999587.637140_0: exceeded disk limit: 8970.10MB > 7629.39MB

https://lhcathome.cern.ch/lhcathome/result.php?resultid=213019449
ID: 37701
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,374,896
RAC: 102,120
Message 37706 - Posted: 29 Dec 2018, 9:44:13 UTC - in response to Message 37701.  
Last modified: 29 Dec 2018, 9:44:25 UTC

Got another one too.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=213019449
Really annoying when this happens after so many hours :-(
A total waste of CPU time.
ID: 37706
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,935,019
RAC: 137,648
Message 37925 - Posted: 5 Feb 2019, 5:53:48 UTC

This is still an issue on the volunteer's side.
I wonder if it is under investigation on the project's side.

Could anyone from the project team be so kind as to give a short summary?
ID: 37925


©2024 CERN