Message boards : Number crunching : error -177 resource limit exceeded
grumpy

Send message
Joined: 1 Sep 04
Posts: 57
Credit: 2,831,592
RAC: 53
Message 23125 - Posted: 19 Sep 2011, 0:59:37 UTC

Something is wrong.
I got the same message!

I've got 100 gig of disk allocated to boinc.
18 gigs of memory, win 7 64

Can't be running out!

LHC is taking 53 meg of disk.
ID: 23125 · Report as offensive     Reply Quote
Profile jujube

Send message
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23126 - Posted: 19 Sep 2011, 4:44:00 UTC - in response to Message 23125.  

You aren't running out, marmaduke.

Each task you receive has a limit on how much disk space it is allowed to use. If that limit is exceeded the task will crash. You cannot fix this problem by allocating more disk space to BOINC. Only the project can fix the problem, by increasing the disk space limit they put in their tasks. You might want to set Sixtrack to No New Tasks until the admins increase the disk space limit.
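Roughly, what the client does is this (a sketch of the idea only, not actual BOINC client code; the function names are made up for illustration): it adds up what the task has written in its slot directory and aborts the task with "Maximum disk usage exceeded" (the -177 / resource limit exceeded error) once that total passes the workunit's rsc_disk_bound.

import os

# Sketch only (not BOINC source): the per-task disk check described above.
def slot_disk_usage(slot_dir: str) -> int:
    """Total bytes used by the files in a task's slot directory."""
    total = 0
    for root, _dirs, files in os.walk(slot_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def over_disk_limit(slot_dir: str, rsc_disk_bound: float) -> bool:
    # True means the client would abort the task, no matter how much disk
    # your overall BOINC preferences allow.
    return slot_disk_usage(slot_dir) > rsc_disk_bound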

Has bigmac been alerted?
ID: 23126 · Report as offensive     Reply Quote
Profile Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 23127 - Posted: 19 Sep 2011, 5:15:38 UTC - in response to Message 23124.  

The beam-beam interaction jobs which we run now are new studies.
They simulate the influence of beam particles on each other and need a large number
of simulated turns to reveal the effects. Of course, we tested on our
computers before submitting, but we may not have caught everything.

The problem with resource limits is serious and I will discuss it with Eric McIntosh when he comes to CERN in the morning.

We will review the settings ASAP.
skype id: igor-zacharov
ID: 23127 · Report as offensive     Reply Quote
Speedy

Send message
Joined: 28 Jul 05
Posts: 37
Credit: 451,635
RAC: 20
Message 23128 - Posted: 19 Sep 2011, 5:29:31 UTC - in response to Message 23124.  

I think I found the problem, in the init_data.xml file:

<rsc_disk_bound>30000000.000000</rsc_disk_bound>, which is about 28.6 MB.


Next: at 58% done the task was at 27.7 MB (29,048,832 bytes) after 05:35:30, with 3 hours to go. I bet this would have errored, except I stopped BOINC and manually bumped up the limit, and it got to 05:51:07 at 28.7 MB (30,146,560 bytes) and counting.

So this is the problem - the project needs to up the limit.

That's wonderful to hear you have manually fixed the problem. Can you post instructions on what to do to correct the limit?

Have A Crunching Good day
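For reference, the figures in the quoted post work out like this (a quick check only; the byte counts come from the post above, and 1 MB is counted as 1,048,576 bytes to match the quoted MB values):

MB = 1024 * 1024  # 1 MB taken as 1,048,576 bytes

bound = 30_000_000          # rsc_disk_bound from the workunit
at_58_percent = 29_048_832  # usage reported at 58% done
later_usage = 30_146_560    # usage after the limit was raised by hand

print(f"bound       = {bound / MB:.1f} MB")          # ~28.6 MB
print(f"at 58% done = {at_58_percent / MB:.1f} MB")  # ~27.7 MB
print(f"later usage = {later_usage / MB:.1f} MB")    # ~28.7 MB
print("past the original bound:", later_usage > bound)  # True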
ID: 23128 · Report as offensive     Reply Quote
Profile jujube

Send message
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23130 - Posted: 19 Sep 2011, 6:45:55 UTC - in response to Message 23128.  
Last modified: 19 Sep 2011, 6:49:38 UTC

You can fix the problem manually on your end but you have to do it for every Sixtrack task you receive. The only way to have ALL your tasks fixed is for the admin to fix the problem on the server. That's why I suggested just setting Sixtrack to No New Tasks until they fix it on the server.

If you want to fix the problem for the tasks you have now...

1. Exit the BOINC client.

2. Open client_state.xml in the BOINC data directory in a text editor. Do NOT use Word or WordPad; use Notepad.

3. Search the text for <rsc_disk, which will find <rsc_disk_bound> for EVERY task you have for EVERY project, one after the other.

4. You want to edit ONLY the instances of <rsc_disk_bound> for your Sixtrack tasks; those will be in a block of text like this, with sixtrack between the <app_name> and </app_name> tags...

<workunit>
 <name>w3_weak3_collision_err_bb__16__s__64.31_59.32__4_6__6__55.5_1_sixvf_boinc183751</name>
    <app_name>sixtrack</app_name>
    <version_num>53008</version_num>
    <rsc_fpops_est>30000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>300000000000000.000000</rsc_fpops_bound>
    <rsc_memory_bound>100000000.000000</rsc_memory_bound>
    <rsc_disk_bound>30000000.000000</rsc_disk_bound>
    <file_ref>
        <file_name>w3_weak3_collision_err_bb__16__s__64.31_59.32__4_6__6__55.5_1_sixvf_boinc183751.zip</file_name>
        <open_name>fort.zip</open_name>
    </file_ref>
</workunit>

5. When you find a block like the one above, it will have 30000000.000000 between the <rsc_disk_bound> and </rsc_disk_bound> tags. That's the number you need to change. Multiply that number by 10 by adding another 0 to the left of the decimal point.

6. Now find the next block that has sixtrack for the app_name and 30000000.000000 for the rsc_disk_bound. Add another 0 there; repeat until you have found and edited all such blocks.

7. Save the file, exit Notepad, and restart the BOINC client.

You have to do that for every Sixtrack task you receive or there is a chance they'll crash. I've done that for the tasks I've already started, but I'm sure as heck not gonna keep doing it. I've set Sixtrack to NNT until they fix this server side. (If you'd rather automate the edit, see the sketch below.)
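If you would rather script that edit than do it in Notepad, here is a minimal sketch (not an official BOINC tool; the path below is the default Windows data directory and may need adjusting, and the times-10 factor simply mirrors the "add another 0" step above). Stop the BOINC client first and keep the backup:

import shutil
import xml.etree.ElementTree as ET

# Sketch only: bump <rsc_disk_bound> for every SixTrack workunit in
# client_state.xml, as described in the steps above.
STATE_FILE = r"C:\ProgramData\BOINC\client_state.xml"  # adjust to your BOINC data directory

shutil.copy(STATE_FILE, STATE_FILE + ".bak")  # backup first!

tree = ET.parse(STATE_FILE)
for wu in tree.getroot().iter("workunit"):
    bound = wu.find("rsc_disk_bound")
    if wu.findtext("app_name", "").lower() == "sixtrack" and bound is not None:
        bound.text = "%f" % (float(bound.text) * 10)  # same as adding another 0
        print("patched", wu.findtext("name"))

tree.write(STATE_FILE)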
ID: 23130 · Report as offensive     Reply Quote
Speedy

Send message
Joined: 28 Jul 05
Posts: 37
Credit: 451,635
RAC: 20
Message 23131 - Posted: 19 Sep 2011, 8:06:30 UTC

Thanks for the steps. I had a look & decided not to modify anything. I will let my 98 remaining tasks run (12 at a time) & hope they don't all error out. Thanks again.

Have A Crunching Good day
ID: 23131 · Report as offensive     Reply Quote
Profile Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 23132 - Posted: 19 Sep 2011, 9:00:15 UTC - in response to Message 23131.  

I have put the value of 90 MB into the database for all work units and
restarted the BOINC server. Also, I have changed the distribution rate:
5
3

(it was 10 before), so that not too many tasks are sitting on a single machine.
The daily quota is still at 40, so nobody should be short of work.

I suggest that you abort the jobs which are waiting or have consumed little
time; you will then get new jobs with the corrected disk size.

Please report any other problems you see. Thank you.
skype id: igor-zacharov
ID: 23132 · Report as offensive     Reply Quote
Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23133 - Posted: 19 Sep 2011, 9:14:42 UTC

Thanks Igor.

90 should do it.

I'm up to about 90% on one task, 8:44:28 hours in with 53 minutes to go, and it is only using 42.5 MB (44,625,920 bytes).

Thanks for the fix.

---

Volunteers should not modify client_state.xml unless they are very careful; one mistake and your client could trash all your work and settings. It is not a file you should mess with normally. Also, if you do, that setting is only good for the tasks you modify; every time the client gets new work, it will have the project-supplied settings. Make a backup first!

I'm an alpha tester and do things like this to find bugs, loss of work is not a concern to me.
ID: 23133 · Report as offensive     Reply Quote
superempie

Send message
Joined: 28 Jul 05
Posts: 24
Credit: 6,603,623
RAC: 0
Message 23135 - Posted: 19 Sep 2011, 10:48:50 UTC

After a few WUs errored out with the disk space issue, I decided to halt this project and abort all running WUs until the problem is solved.

Could you leave a message on the front page when it is fixed?
ID: 23135 · Report as offensive     Reply Quote
Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23136 - Posted: 19 Sep 2011, 14:02:41 UTC - in response to Message 23135.  
Last modified: 19 Sep 2011, 14:03:58 UTC

After a few WUs errored out with the disk space issue, I decided to halt this project and abort all running WUs until the problem is solved.

Could you leave a message on the front page when it is fixed?

It was fixed; see two posts before yours.

Unfortunately they cannot fix work you already have.

Re-enable the project and you should get good work that will not produce this error.
ID: 23136 · Report as offensive     Reply Quote
superempie

Send message
Joined: 28 Jul 05
Posts: 24
Credit: 6,603,623
RAC: 0
Message 23138 - Posted: 19 Sep 2011, 14:42:01 UTC - in response to Message 23136.  

Thanks, will do.

ID: 23138 · Report as offensive     Reply Quote
Michael Becker

Send message
Joined: 15 Jul 05
Posts: 8
Credit: 1,036,470
RAC: 0
Message 23139 - Posted: 19 Sep 2011, 15:53:00 UTC
Last modified: 19 Sep 2011, 16:05:09 UTC

Hi there,
I don't know what's up; I get a lot of errors:
    183921 83243 18 Sep 2011 13:57:14 UTC 18 Sep 2011 20:37:33 UTC Error while computing 22,612.19 22,501.96 101.58 --- SixTrack v530.08
    183913 83239 18 Sep 2011 13:54:25 UTC 18 Sep 2011 19:58:50 UTC Error while computing 18,423.98 18,332.83 82.76 --- SixTrack v530.08
    183908 83236 18 Sep 2011 13:57:14 UTC 19 Sep 2011 4:45:09 UTC Error while computing 20,420.03 20,300.75 91.64 --- SixTrack v530.08
    183904 83234 18 Sep 2011 13:57:14 UTC 19 Sep 2011 9:16:26 UTC Error while computing 20,120.55 20,022.49 90.39 --- SixTrack v530.08
    183900 83232 18 Sep 2011 13:54:25 UTC 18 Sep 2011 20:10:05 UTC Error while computing 21,728.14 21,576.06 97.40 --- SixTrack v530.08
    183852 83208 18 Sep 2011 13:57:14 UTC 19 Sep 2011 11:16:51 UTC Error while computing 20,050.06 19,964.62 90.12 --- SixTrack v530.08
    183816 83190 18 Sep 2011 13:54:25 UTC 18 Sep 2011 19:58:50 UTC Error while computing 18,417.01 18,294.77 82.59 --- SixTrack v530.08
    183707 83136 18 Sep 2011 13:57:14 UTC 19 Sep 2011 15:21:06 UTC Error while computing 19,455.11 19,388.66 87.52 --- SixTrack v530.08

http://lhcathomeclassic.cern.ch/sixtrack/results.php?userid=7259&offset=0&show_names=0&state=5
For a long time there was no work, then I detached and reattached and got some work, but a lot of CPU time went for nothing.

EDIT:
Found the information: 'Maximum disk usage exceeded'.
I thought 10 GB was enough; I've now changed it to 25 GB and hope it helps.

ID: 23139 · Report as offensive     Reply Quote
Michael Becker

Send message
Joined: 15 Jul 05
Posts: 8
Credit: 1,036,470
RAC: 0
Message 23140 - Posted: 19 Sep 2011, 17:22:55 UTC - in response to Message 23139.  

Need help, please.

Found the information: 'Maximum disk usage exceeded'.
I thought 10 GB was enough; I've now changed it to 25 GB and hope it helps.


I can't believe that 'Maximum disk usage exceeded' causes the problem.
The next WU finished with 'error while computing'.
Five minutes prior I saw 95 MB of hard disk usage by LHC, and 23 GB free for BOINC applications.
Now only three LHC tasks are running and I have 70 MB of disk usage by LHC.

Any idea?
ID: 23140 · Report as offensive     Reply Quote
Filipe

Send message
Joined: 9 Aug 05
Posts: 36
Credit: 7,693,055
RAC: 146
Message 23141 - Posted: 19 Sep 2011, 17:31:32 UTC
Last modified: 19 Sep 2011, 17:31:45 UTC

Please see the thread below:

error -177 resource limit exceeded

Your question has already been answered.

Filipe.
ID: 23141 · Report as offensive     Reply Quote
Profile jujube

Send message
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23142 - Posted: 19 Sep 2011, 17:33:06 UTC - in response to Message 23140.  
Last modified: 19 Sep 2011, 17:33:56 UTC

This was caused by a problem on the server, not a problem on your end. The problem has been fixed. Now you should detach and reattach to get fresh tasks.
ID: 23142 · Report as offensive     Reply Quote
Profile Tom95134

Send message
Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 23144 - Posted: 19 Sep 2011, 18:05:42 UTC - in response to Message 23141.  

Please see the thread below:

error -177 resource limit exceeded

Your question has already been answered.

Filipe.

Bad link.
ID: 23144 · Report as offensive     Reply Quote
Michael Becker

Send message
Joined: 15 Jul 05
Posts: 8
Credit: 1,036,470
RAC: 0
Message 23145 - Posted: 19 Sep 2011, 18:15:29 UTC - in response to Message 23144.  

Please see the thread below:

error -177 resource limit exceeded

Thank you so much.
I made the changes in the file, so the last tasks can finish normally.
ID: 23145 · Report as offensive     Reply Quote
Profile microchip
Avatar

Send message
Joined: 27 Jun 06
Posts: 6
Credit: 1,577,243
RAC: 4,260
Message 23410 - Posted: 8 Oct 2011, 10:44:20 UTC

I get a similar, albeit not the same, error:

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>

</stderr_txt>
]]>

ID: 23410 · Report as offensive     Reply Quote
Profile Ananas

Send message
Joined: 17 Jul 05
Posts: 102
Credit: 542,016
RAC: 0
Message 23506 - Posted: 14 Oct 2011, 16:33:56 UTC - in response to Message 23410.  
Last modified: 14 Oct 2011, 16:35:26 UTC

...
Maximum elapsed time exceeded
...

Might be a configuration error on the project side. Each result has a value "rsc_fpops_bound" (or similar), with which the project admins can configure when a result gets aborted. This is intended to catch endless loops / iterations that never reach their target value, but it should not knock out a result that is still working properly.

AFAIK the benchmark results influence the value that is compared against this rsc_fpops_bound, but your benchmark values do not look unusually high.

One possible cause on the client side would be a power-saving mode, where the host runs at a reduced clock speed.

Dynamic turbo mode on some CPU cores could have a similar effect, if the benchmark was carried out on a higher-clocked core. I have read somewhere that AMD has something like that on later Bulldozer chips, but I'm not sure what exactly this means for BOINC.
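As a back-of-the-envelope illustration (this is not the exact client logic, and the 3 GFLOPS benchmark figure is just an assumed value): the client effectively turns rsc_fpops_bound into a wall-clock budget by dividing it by the host's measured floating-point speed, so a host that benchmarks fast but then runs downclocked can exceed the budget even though the task is healthy.

# Rough sketch of the elapsed-time limit (not BOINC source code).
def max_elapsed_seconds(rsc_fpops_bound: float, benchmarked_flops: float) -> float:
    """Approximate wall-clock budget before the client aborts the task."""
    return rsc_fpops_bound / benchmarked_flops

# Using the bound from the workunit block earlier in the thread (3e14 fpops)
# and an assumed single-core benchmark of 3 GFLOPS:
limit = max_elapsed_seconds(3e14, 3e9)
print(f"abort after roughly {limit / 3600:.0f} hours")  # ~28 hours
# A core running at half that speed (power saving) needs twice the time and
# can blow through the limit even though the task is still working properly.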
ID: 23506 · Report as offensive     Reply Quote
Fred J. Verster
Avatar

Send message
Joined: 4 Aug 08
Posts: 14
Credit: 278,575
RAC: 0
Message 23746 - Posted: 27 Nov 2011, 14:17:53 UTC - in response to Message 23506.  

...
Maximum elapsed time exceeded
...

Might be a configuration error on the project side. Each result has a value "rsc_fpops_bound" (or similar), with which the project admins can configure when a result gets aborted. This is intended to catch endless loops / iterations that never reach their target value, but it should not knock out a result that is still working properly.

AFAIK the benchmark results influence the value that is compared against this rsc_fpops_bound, but your benchmark values do not look unusually high.

One possible cause on the client side would be a power-saving mode, where the host runs at a reduced clock speed.

Dynamic turbo mode on some CPU cores could have a similar effect, if the benchmark was carried out on a higher-clocked core. I have read somewhere that AMD has something like that on later Bulldozer chips, but I'm not sure what exactly this means for BOINC.



Isn't a -177 error a timing error? I remember this happening at SETI@home when using a GPU.
I also run Rosetta and was amazed by the >700 MByte WU RAM use.
This host now only does CPU jobs a.t.m. (it has 2 EAH5870 GPUs).
Sandy Bridge CPUs use a dynamic turbo too; if CPU load is low, the clock frequency goes down.


Knight Who Says Ni
N!
ID: 23746 · Report as offensive     Reply Quote