1) Message boards : Number crunching : error -177 resource limit exceeded (Message 23120)
Posted 18 Sep 2011 by Profile biancaw
Post:
I bet you guys with 60 seconds spend more time writing checkpoint files than computing. This problem crops up at malariacontrol, which has huge checkpoint files.

I'd rather lose up to 15 minutes of work, as I don't reboot all that often, than waste half the day wearing out my disk drive.

My guess, from my 12MB for 4 hours, is that it is generating 1MB per checkpoint, and if you write 60MB per hour that is 300MB for 5 hours per task. This might be your problem, especially if your client pauses a lot of tasks to start new ones. Also, this multiplies by every project you run.
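
To put rough numbers on that guess, here is a small sketch of the arithmetic. It assumes about 1MB is written per checkpoint (my estimate above, not a measured value), a 5 hour task, and that every checkpoint is written in full; the function name is just made up for illustration.

```python
def checkpoint_write_volume_mb(interval_s, task_hours, mb_per_checkpoint=1.0):
    """Estimate the total MB written by checkpoints over one task.

    Assumes one full checkpoint of mb_per_checkpoint MB every interval_s
    seconds. The 1 MB default is a guess from the post, not a measurement.
    """
    checkpoints = task_hours * 3600 / interval_s
    return checkpoints * mb_per_checkpoint


# Compare the "write to disk" intervals discussed in this thread.
for interval_s in (60, 300, 900):
    total_mb = checkpoint_write_volume_mb(interval_s, task_hours=5)
    print(f"{interval_s:>4} s interval: about {total_mb:.0f} MB per task")
```

Under those assumptions, going from 60 seconds to 900 seconds cuts the amount written per task by a factor of 15.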

Try either 300 for 5 minutes or 900 for 15 minutes. I think in the long run you'll find overall better performance with not that much loss, unless your client regularly crashes or you reboot multiple times a day.

Report back if that helps, then we will know.


I have no problem with 300MB; there is enough space for 12 GB per task.

I have looked at the crashed WUs and at what my wingmen did:
they are in progress, unsent, or also crashed. None of them are valid.
2) Message boards : Number crunching : error -177 resource limit exceeded (Message 23117)
Posted 18 Sep 2011 by Profile biancaw
Post:
Just a thought: do you guys have your "write to disk" interval set to some small number? Maybe it is checkpointing way too often and creating a large file.

I use 900 seconds which is 15 minutes.



BoincTasks says I have 1 checkpoint per minute.

It works for 91 projects; if it's not working for LHC, it's a project problem.
3) Message boards : Number crunching : error -177 resource limit exceeded (Message 23114)
Posted 18 Sep 2011 by Profile biancaw
Post:
Client state Compute error
Exit status -177 (0xffffffffffffff4f)
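
For anyone puzzled by the hex value in parentheses: it appears to be the same -177 written as an unsigned 64-bit two's-complement number. A quick sketch to check the conversion both ways (the helper names are made up for illustration, and the 64-bit width is an assumption based on the 16 hex digits shown):

```python
def to_unsigned_hex(value, bits=64):
    """Return a signed integer as unsigned two's-complement hex."""
    return hex(value & ((1 << bits) - 1))


def from_unsigned(value, bits=64):
    """Interpret an unsigned two's-complement integer as signed."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


hex_form = to_unsigned_hex(-177)
print(hex_form)                          # unsigned 64-bit form of -177
print(from_unsigned(int(hex_form, 16)))  # back to the signed exit status
```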


This is a client error, not project error.

.......

You need to check the BOINC Manager disk tab: how much space is available to BOINC and how much is free? How much is showing as used for LHC? Maybe it is another project taking up all the space?

I'm running 3 LHC 1.0 on one machine, all at 5 hours (51%) so far and the total used by LHC is 65MB. On a second machine it is running 8 at about 1 hour so far and only used 55MB. I've got a lot of projects attached and only used 5GB and still 200GB free for boinc.


I have the same error.
used disc space for Boinc = 932.15 MB
free disc space available for Boinc = 99.06 GB


