Message boards : Number crunching : error -177 resource limit exceeded

Filipe
Joined: 9 Aug 05
Posts: 35
Credit: 6,657,432
RAC: 193
Message 23101 - Posted: 18 Sep 2011, 18:53:20 UTC

Beginning today, with the long WUs (> 1 hour):

After several hours of computation they simply report a computational error, and that on several different computers.

No overclocking here.

Any idea what's happening?
ID: 23101

reklov
Joined: 20 Feb 10
Posts: 2
Credit: 171,070
RAC: 0
Message 23102 - Posted: 18 Sep 2011, 19:12:56 UTC

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7733884E

Engaging BOINC Windows Runtime Debugger...


ID: 23102

Filipe
Joined: 9 Aug 05
Posts: 35
Credit: 6,657,432
RAC: 193
Message 23104 - Posted: 18 Sep 2011, 19:34:09 UTC

So, if I allow BOINC to use more disk space, I'll be fine?
ID: 23104

Byron Leigh Hatch @ team Carl ...
Joined: 1 Sep 04
Posts: 44
Credit: 765,348
RAC: 31
Message 23105 - Posted: 18 Sep 2011, 19:48:46 UTC
Last modified: 18 Sep 2011, 19:59:12 UTC

Hello,


Just reporting the following from my computer:



Name w3_weak3_collision_err_bb__2__s__64.31_59.32__8_10__6__69_1_sixvf_boinc178922_0
Workunit 82562
Created 18 Sep 2011 11:43:00 UTC
Sent 18 Sep 2011 12:20:21 UTC
Received 18 Sep 2011 18:13:48 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -177 (0xffffffffffffff4f)
Computer ID 9929916
Report deadline 25 Sep 2011 3:52:35 UTC
Run time 21078.433136
CPU time 20669.91


<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x75213E2E

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.11.7


Dump Timestamp    : 09/18/11 11:12:11
Install Directory : C:\Program Files\BOINC\
Data Directory    : C:\ProgramData\BOINC
Project Symstore  : 
Loaded Library    : C:\Program Files\BOINC\\dbghelp.dll
Loaded Library    : C:\Program Files\BOINC\\symsrv.dll
Loaded Library    : C:\Program Files\BOINC\\srcsrv.dll
LoadLibraryA( C:\Program Files\BOINC\\version.dll ): GetLastError = 126
Loaded Library    : version.dll
Debugger Engine   : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\9;C:\ProgramData\BOINC\projects\lhcathomeclassic.cern.ch_sixtrack


ModLoad: 00400000 04c64000 C:\ProgramData\BOINC\projects\lhcathomeclassic.cern.ch_sixtrack\sixtrack_530.8_windows_intelx86.exe (-nosymbols- Symbols Loaded)
    Linked PDB Filename   : 

ModLoad: 76f90000 0013c000 C:\Windows\SYSTEM32\ntdll.dll (6.1.7601.17514) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntdll.pdb
    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7600.16385

ModLoad: 75680000 000d4000 C:\Windows\system32\kernel32.dll (6.1.7601.17651) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernel32.pdb
    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7600.16385

ModLoad: 751e0000 0004a000 C:\Windows\system32\KERNELBASE.dll (6.1.7601.17651) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernelbase.pdb
    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7600.16385

ModLoad: 759a0000 000c9000 C:\Windows\system32\USER32.dll (6.1.7601.17514) (-exported- Symbols Loaded)
    Linked PDB Filename   : user32.pdb
    File Version          : 6.1.7601.17514 (win7sp1_rtm.101119-1850)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7601.17514

ModLoad: 75450000 0004e000 C:\Windows\system32\GDI32.dll (6.1.7601.17514) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32.pdb
    File Version          : 6.1.7601.17514 (win7sp1_rtm.101119-1850)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7601.17514

ModLoad: 75440000 0000a000 C:\Windows\system32\LPK.dll (6.1.7600.16385) (-exported- Symbols Loaded)
    Linked PDB Filename   : lpk.pdb
    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7600.16385

ModLoad: 770d0000 0009d000 C:\Windows\system32\USP10.dll (1.626.7601.17514) (-exported- Symbols Loaded)
    Linked PDB Filename   : usp10.pdb
    File Version          : 1.0626.7601.17514 (win7sp1_rtm.101119-1850)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft(R) Uniscribe Unicode script processor
    Product Version       : 1.0626.7601.17514

ModLoad: 75cd0000 000ac000 C:\Windows\system32\msvcrt.dll (7.0.7600.16385) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcrt.pdb
    File Version          : 7.0.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 7.0.7600.16385

ModLoad: 75780000 000a0000 C:\Windows\system32\ADVAPI32.dll (6.1.7601.17514) (-exported- Symbols Loaded)
    Linked PDB Filename   : advapi32.pdb
    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7600.16385

ModLoad: 75760000 00019000 C:\Windows\SYSTEM32\sechost.dll (6.1.7600.16385) (-exported- Symbols Loaded)
    Linked PDB Filename   : sechost.pdb
    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7600.16385

ModLoad: 76d80000 000a1000 C:\Windows\system32\RPCRT4.dll (6.1.7601.17514) (-exported- Symbols Loaded)
    Linked PDB Filename   : rpcrt4.pdb
    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7600.16385

ModLoad: 754a0000 0002a000 C:\Windows\system32\imagehlp.dll (6.1.7601.17514) (-exported- Symbols Loaded)
    Linked PDB Filename   : imagehlp.pdb
    File Version          : 6.1.7601.17514 (win7sp1_rtm.101119-1850)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7601.17514

ModLoad: 75cb0000 0001f000 C:\Windows\system32\IMM32.DLL (6.1.7601.17514) (-exported- Symbols Loaded)
    Linked PDB Filename   : imm32.pdb
    File Version          : 6.1.7601.17514 (win7sp1_rtm.101119-1850)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7601.17514

ModLoad: 75870000 000cc000 C:\Windows\system32\MSCTF.dll (6.1.7600.16385) (-exported- Symbols Loaded)
    Linked PDB Filename   : msctf.pdb
    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7600.16385

ModLoad: 65690000 00115000 C:\Program Files\BOINC\dbghelp.dll (6.8.4.0) (-exported- Symbols Loaded)
    Linked PDB Filename   : dbghelp.pdb
    File Version          : 6.8.0004.0 (debuggers(dbg).070515-1751)
    Company Name          : Microsoft Corporation
    Product Name          : Debugging Tools for Windows(R)
    Product Version       : 6.8.0004.0

ModLoad: 64f40000 00048000 C:\Program Files\BOINC\symsrv.dll (6.8.4.0) (-exported- Symbols Loaded)
    Linked PDB Filename   : symsrv.pdb
    File Version          : 6.8.0004.0 (debuggers(dbg).070515-1751)
    Company Name          : Microsoft Corporation
    Product Name          : Debugging Tools for Windows(R)
    Product Version       : 6.8.0004.0

ModLoad: 675a0000 0003b000 C:\Program Files\BOINC\srcsrv.dll (6.8.4.0) (-exported- Symbols Loaded)
    Linked PDB Filename   : srcsrv.pdb
    File Version          : 6.8.0004.0 (debuggers(dbg).070515-1751)
    Company Name          : Microsoft Corporation
    Product Name          : Debugging Tools for Windows(R)
    Product Version       : 6.8.0004.0

ModLoad: 745c0000 00009000 C:\Windows\system32\version.dll (6.1.7600.16385) (-exported- Symbols Loaded)
    Linked PDB Filename   : version.pdb
    File Version          : 6.1.7600.16385 (win7_rtm.090713-1255)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 6.1.7600.16385



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 0, Write: 0, Other 0

- I/O Transfers Counters -
Read: 0, Write: 0, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 0, QuotaPeakPagedPoolUsage: 0
QuotaNonPagedPoolUsage: 0, QuotaPeakNonPagedPoolUsage: 0

- Virtual Memory Usage -
VirtualSize: 0, PeakVirtualSize: 0

- Pagefile Usage -
PagefileUsage: 0, PeakPagefileUsage: 0

- Working Set Size -
WorkingSetSize: 0, PeakWorkingSetSize: 0, PageFaultCount: 0

*** Dump of thread ID 3980 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x75213E2E

- Registers -
eax=00000000 ebx=00000001 ecx=0060b3c6 edx=0656613c esi=00000001 edi=00000000
eip=75213e2e esp=0656fa68 ebp=0656ff94
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

- Callstack -
ChildEBP RetAddr  Args to Child
0656ff94 76ff37f5 00000000 65ee29fc 00000000 00000000 KERNELBASE!DebugBreak+0x0 
0656ffd4 76ff37c8 0053f660 00000000 00000000 00000000 ntdll!RtlInitializeExceptionChain+0x0 
0656ffec 00000000 0053f660 00000000 00000000 4c9d5ab9 ntdll!RtlInitializeExceptionChain+0x0 

*** Dump of thread ID 4776 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
eax=0000001c ebx=000e1a77 ecx=000001b0 edx=0000001f esi=00003085 edi=000014b7
eip=004cf9c5 esp=0012be40 ebp=0012c4e8
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000203

- Callstack -
ChildEBP RetAddr  Args to Child
0012c4e8 004b47f3 00000050 00000002 00000050 00000008 sixtrack_530.8_windows_intelx86!+0x0 
0012c724 0040a7fb 0012ced4 00000000 00000000 00000000 sixtrack_530.8_windows_intelx86!+0x0 
0012fdf8 0040101f 00000000 0012fe44 0012ff40 00252e58 sixtrack_530.8_windows_intelx86!+0x0 
0012fe78 005ea093 0012ff38 00000000 00001db1 00000002 sixtrack_530.8_windows_intelx86!+0x0 
0012fef8 005c684a 005f42b4 756cfcdd 005f25a0 0012fef8 sixtrack_530.8_windows_intelx86!+0x0 
0012ff88 756ced6c 7ffd3000 0012ffd4 76ff37f5 7ffd3000 sixtrack_530.8_windows_intelx86!+0x0 
0012ff94 76ff37f5 7ffd3000 63aa29fc 00000000 00000000 kernel32!BaseThreadInitThunk+0x0 
0012ffd4 76ff37c8 005c68a1 7ffd3000 00000000 00000000 ntdll!RtlInitializeExceptionChain+0x0 
0012ffec 00000000 005c68a1 7ffd3000 00000000 78746341 ntdll!RtlInitializeExceptionChain+0x0 


*** Debug Message Dump ****


*** Foreground Window Data ***
    Window Name      : 
    Window Class     : 
    Window Process ID: 0
    Window Thread ID : 0

Exiting...

</stderr_txt>
]]>
ID: 23105

[SG-FC] dingdong
Joined: 12 Sep 08
Posts: 1
Credit: 386,528
RAC: 0
Message 23107 - Posted: 18 Sep 2011, 20:20:11 UTC

Got a resource limit failure at
http://lhcathomeclassic.cern.ch/sixtrack/result.php?resultid=183173 after 5h runtime.

The preferences are:

Use at most 500 GB disk space
Leave at least 1 GB disk space free
Use at most 99% of total disk space
Use at most 90% of page file (swap space)
Use at most 95% of memory when computer is in use
Use at most 100% of memory when computer is not in use

What happened?
ID: 23107

Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23108 - Posted: 18 Sep 2011, 20:27:50 UTC - in response to Message 23105.  
Last modified: 18 Sep 2011, 21:00:07 UTC

Client state Compute error
Exit status -177 (0xffffffffffffff4f)


This is a client error, not a project error.

See the BOINC wiki for error codes at http://www.boinc-wiki.info/Error_Code

-177 = ERR_RSC_LIMIT_EXCEEDED = resource limit exceeded

This means you need to allocate more resources, in this case disk space. Either raise your preferences or delete some junk to free up space. Detach old projects that aren't being used but are taking up space.

Check the BOINC Manager disk tab: how much space is available to BOINC, and how much is free? How much does it show in use for LHC? Maybe another project is taking up all the space.

I'm running 3 LHC 1.0 on one machine, all at 5 hours (51%) so far and the total used by LHC is 65MB. On a second machine it is running 8 at about 1 hour so far and only used 55MB. I've got a lot of projects attached and only used 5GB and still 200GB free for boinc.
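
For anyone wondering how the two forms of the exit status quoted above relate, here is a minimal Python sketch. It assumes the hex value is simply -177 displayed as an unsigned 64-bit two's-complement number; the helper names `as_unsigned_64` and `as_signed_64` are made up for illustration and are not part of BOINC.

```python
# Assumption: the hex form in the task report is -177 reinterpreted as an
# unsigned 64-bit integer. The helper names below are hypothetical.
ERR_RSC_LIMIT_EXCEEDED = -177  # name and value as quoted from the wiki


def as_unsigned_64(value: int) -> int:
    """Return the 64-bit two's-complement bit pattern of a signed integer."""
    return value & 0xFFFFFFFFFFFFFFFF


def as_signed_64(bits: int) -> int:
    """Inverse of as_unsigned_64: read a 64-bit pattern as a signed integer."""
    return bits - (1 << 64) if bits & (1 << 63) else bits


exit_status_hex = hex(as_unsigned_64(ERR_RSC_LIMIT_EXCEEDED))
round_trip = as_signed_64(as_unsigned_64(ERR_RSC_LIMIT_EXCEEDED))

print(exit_status_hex)  # hex form of the exit status
print(round_trip)       # back to the signed value
```

So the long hex string is not a second error; it is the same -177 shown in a different notation.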
ID: 23108

jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23109 - Posted: 18 Sep 2011, 20:47:16 UTC - in response to Message 23107.  

Maybe you have less than 1 GB free on your drive?

If you run other BOINC projects they would be crashing for the same reason if you have less than 1 GB free.
ID: 23109

reklov
Joined: 20 Feb 10
Posts: 2
Credit: 171,070
RAC: 0
Message 23110 - Posted: 18 Sep 2011, 20:48:07 UTC - in response to Message 23108.  

I had 2 WUs error out with the disk limit. Without changing my settings or projects, I now have 17GB of disk space available for BOINC. I don't think that 2 LHC WUs are entitled to occupy 17GB.
ID: 23110

jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23111 - Posted: 18 Sep 2011, 21:01:32 UTC - in response to Message 23110.  

I had 2 WUs error out with the disk limit. Without changing my settings or projects, I now have 17GB of disk space available for BOINC. I don't think that 2 LHC WUs are entitled to occupy 17GB.


But you also have the "Leave at least X GB free" rule. Do you have at least X GB free?
ID: 23111

Byron Leigh Hatch @ team Carl ...
Joined: 1 Sep 04
Posts: 44
Credit: 765,348
RAC: 31
Message 23113 - Posted: 18 Sep 2011, 21:07:26 UTC - in response to Message 23108.  

Client state Compute error
Exit status -177 (0xffffffffffffff4f)

This is a client error, not a project error.
See the BOINC wiki for error codes at http://www.boinc-wiki.info/Error_Code

-177 = ERR_RSC_LIMIT_EXCEEDED = resource limit exceeded

This means you need to allocate more resources, in this case disk space. Either raise your preferences or delete some junk to free up space. Detach old projects that aren't being used but are taking up space.

Hmm ... that's strange,

because I have always told BOINC to use all the memory and disk it wants on my computer,

and this is my new DELL workstation ... with tons of memory and disk space.

My computing preferences for SixTrack and Test4Theory:

Disk and memory usage:

Use at most 542 GB disk space
Leave at least 0.001 GB disk space free (values smaller than 0.001 are ignored)
Use at most 100% of total disk space
Tasks checkpoint to disk at most every 60 seconds
Use at most 100% of page file (swap space)
Use at most 100% of memory when computer is in use
Use at most 100% of memory when computer is not in use


My computer:

Number of processors 8
Coprocessors NVIDIA Quadro 2000 (961MB) driver: 27536
Operating System Microsoft Windows 7 Ultimate x86 Edition, Service Pack 1, (06.01.7601.00)
BOINC client version 6.12.34
Memory 3325.58 MB
Cache 256 KB
Swap space 6649.45 MB
Total disk space 148.24 GB
Free Disk Space 113.99 GB


Hmmm ... what else could be wrong?
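
As a rough sanity check on the preferences and disk figures listed in this post, here is a small Python sketch. It assumes the client applies the most restrictive of the three disk rules ("use at most N GB", "use at most N% of total", "leave at least N GB free"); that combination rule, the function name `allowed_disk_usage_gb`, and the figure of 0 GB already used by BOINC are all assumptions made for illustration, not something confirmed in this thread.

```python
def allowed_disk_usage_gb(max_used_gb, min_free_gb, max_used_pct,
                          total_gb, free_gb, boinc_used_gb):
    """Most restrictive of the three disk preferences, in GB (assumed rule)."""
    limit_abs = max_used_gb                      # "Use at most N GB disk space"
    limit_pct = total_gb * max_used_pct / 100.0  # "Use at most N% of total disk space"
    # "Leave at least N GB free": what BOINC already holds plus what is still
    # free, minus the reserve that must stay free.
    limit_min_free = boinc_used_gb + free_gb - min_free_gb
    return min(limit_abs, limit_pct, limit_min_free)


# Figures from the post; boinc_used_gb=0.0 is a hypothetical, pessimistic value.
allowed = allowed_disk_usage_gb(
    max_used_gb=542, min_free_gb=0.001, max_used_pct=100,
    total_gb=148.24, free_gb=113.99, boinc_used_gb=0.0,
)
print(f"BOINC may use roughly {allowed:.3f} GB")
```

Under those assumptions the preferences leave well over 100 GB for BOINC, which suggests the limit being hit is not one of these three settings.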
ID: 23113

biancaw
Joined: 21 Feb 08
Posts: 3
Credit: 292,398
RAC: 0
Message 23114 - Posted: 18 Sep 2011, 21:16:09 UTC - in response to Message 23108.  
Last modified: 18 Sep 2011, 21:23:34 UTC

Client state Compute error
Exit status -177 (0xffffffffffffff4f)


This is a client error, not a project error.

.......

Check the BOINC Manager disk tab: how much space is available to BOINC, and how much is free? How much does it show in use for LHC? Maybe another project is taking up all the space.

I'm running 3 LHC 1.0 on one machine, all at 5 hours (51%) so far and the total used by LHC is 65MB. On a second machine it is running 8 at about 1 hour so far and only used 55MB. I've got a lot of projects attached and only used 5GB and still 200GB free for boinc.


I have the same error.
Used disk space for BOINC = 932.15 MB
Free disk space available for BOINC = 99.06 GB
ID: 23114

Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23115 - Posted: 18 Sep 2011, 21:18:29 UTC

Check the disk tab and report how much space each project shows as used.

I do see T4T is using 2.2GB on my system while running only one task that is only 50% done, and that is more than CPDN, which is running 4 tasks using 1.72GB. Those are the largest two. The other 60 or so projects each use only a small number of MB.

Still, on the other system LHC is only showing up to 72MB. Two tasks are at 60% after 6 hours of runtime, the other at 98% after 4 hours. The 4-hour task finished and usage dropped to 60MB, so it was using about 12MB for that 4-hour task, or roughly 3MB per hour.

I don't see how one gets to 17GB from 12MB.
ID: 23115

Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23116 - Posted: 18 Sep 2011, 21:21:34 UTC
Last modified: 18 Sep 2011, 21:23:35 UTC

Just a thought: do you guys have your "write to disk" interval set to some small number? Maybe it is checkpointing way too often and creating a large file.

I use 900 seconds which is 15 minutes.
ID: 23116

biancaw
Joined: 21 Feb 08
Posts: 3
Credit: 292,398
RAC: 0
Message 23117 - Posted: 18 Sep 2011, 21:26:42 UTC - in response to Message 23116.  
Last modified: 18 Sep 2011, 21:33:07 UTC

Just a thought: do you guys have your "write to disk" interval set to some small number? Maybe it is checkpointing way too often and creating a large file.

I use 900 seconds which is 15 minutes.



BoincTasks says I have 1 checkpoint per minute.

It works for 91 projects; if it's not working for LHC, it's a project problem.
ID: 23117

Byron Leigh Hatch @ team Carl ...
Joined: 1 Sep 04
Posts: 44
Credit: 765,348
RAC: 31
Message 23118 - Posted: 18 Sep 2011, 21:27:25 UTC - in response to Message 23116.  
Last modified: 18 Sep 2011, 21:31:01 UTC

Just a thought: do you guys have your "write to disk" interval set to some small number? Maybe it is checkpointing way too often and creating a large file.

I use 900 seconds which is 15 minutes.

ah ... yes good thinking and catch ... I have:

Tasks checkpoint to disk at most every 60 seconds


I'll change mine to 900 seconds ... and see if that helps :)
ID: 23118

Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23119 - Posted: 18 Sep 2011, 21:34:36 UTC

I bet you guys with 60 seconds spend more time writing checkpoint files than computing. This problem crops up at malariacontrol, which has huge checkpoint files.

I'd rather lose up to 15 minutes of work, as I don't reboot all that often, than waste half the day wearing out my disk drive.

My guess, from my 12MB for 4 hours, is that it is generating 1MB per checkpoint, and if you write 60MB per hour that is 300MB for 5 hours per task. This might be your problem, especially if your client pauses a lot of tasks to start new ones. Also, this multiplies by every project you run.

Try either 300 for 5 minutes or 900 for 15 minutes. I think in the long run you'll find overall better performance with not that much loss, unless your client regularly crashes or you reboot multiple times a day.

Report back if that helps, then we will know.
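
The back-of-the-envelope estimate in this post can be written out as a short Python sketch. It assumes, as the post guesses, that every checkpoint adds to the disk usage instead of overwriting the previous one; that is a hypothesis here, not confirmed behaviour, and the function name `checkpoints` is made up for illustration.

```python
def checkpoints(hours: float, interval_s: int) -> float:
    """Number of checkpoints written in `hours` at one every `interval_s` seconds."""
    return hours * 3600 / interval_s


# Observation from the thread: about 12 MB after 4 hours at a 900 s interval.
per_checkpoint_mb = 12.0 / checkpoints(4, 900)  # 16 checkpoints
print(f"{per_checkpoint_mb:.2f} MB per checkpoint, call it 1 MB")

# Assumed round figure of 1 MB per checkpoint, for a 5-hour task.
ROUNDED_MB = 1.0
growth_5h = {
    interval: checkpoints(5, interval) * ROUNDED_MB
    for interval in (60, 300, 900)
}
for interval, total_mb in growth_5h.items():
    print(f"{interval:>3} s interval: about {total_mb:.0f} MB after 5 hours")
```

With these assumptions a 60-second interval gives the 300MB figure mentioned above, while 300 or 900 seconds would keep the growth to a fraction of that.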
ID: 23119

biancaw
Joined: 21 Feb 08
Posts: 3
Credit: 292,398
RAC: 0
Message 23120 - Posted: 18 Sep 2011, 21:41:20 UTC - in response to Message 23119.  
Last modified: 18 Sep 2011, 21:53:21 UTC

I bet you guys with 60 seconds spend more time writing checkpoint files than computing. This problem crops up at malariacontrol, which has huge checkpoint files.

I'd rather lose up to 15 minutes of work, as I don't reboot all that often, than waste half the day wearing out my disk drive.

My guess, from my 12MB for 4 hours, is that it is generating 1MB per checkpoint, and if you write 60MB per hour that is 300MB for 5 hours per task. This might be your problem, especially if your client pauses a lot of tasks to start new ones. Also, this multiplies by every project you run.

Try either 300 for 5 minutes or 900 for 15 minutes. I think in the long run you'll find overall better performance with not that much loss, unless your client regularly crashes or you reboot multiple times a day.

Report back if that helps, then we will know.


I have no problem with 300MB; there is enough space for 12 GB per task.

I have looked at the crashed WUs and at how my wingmen did:
they are in progress, unsent, or also crashed. None of them are valid.
ID: 23120

Byron Leigh Hatch @ team Carl ...
Joined: 1 Sep 04
Posts: 44
Credit: 765,348
RAC: 31
Message 23121 - Posted: 18 Sep 2011, 21:55:14 UTC - in response to Message 23119.  

Report back if that helps, then we will know.

Yes thank you Keith ... will do
I now have 8 of the SixTrack 530.08 tasks crunching with no problems
on my DELL Workstation ... 8 CPUs with no HT
ID: 23121

zombie67 [MM]
Joined: 24 Nov 06
Posts: 76
Credit: 6,720,840
RAC: 0
Message 23123 - Posted: 19 Sep 2011, 0:34:16 UTC

This is happening to me all over the place. Hundreds of hours of wasted crunching time. This needs to be addressed by the project immediately.
Dublin, California
Team: SETI.USA

ID: 23123

Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23124 - Posted: 19 Sep 2011, 0:58:38 UTC
Last modified: 19 Sep 2011, 1:42:38 UTC

I think I found the problem, in the init_data.xml file:

<rsc_disk_bound>30000000.000000</rsc_disk_bound>

This is about 28.6MB.

Those last two running ended in computational error after 6 hours, and by my last check the LHC folder was up to 60MB - 4MB showing now that it is empty of tasks is 54MB divided by 2 tasks is 26MB each, they ended shortly after I last checked. They had about 4 more hours to go.

These are the first two that errored for me.

This is a limit the project can put on a task. That limit might have been fine for the short runs, but at 1 million turns it is not enough.

I've got one now at 99% and 27.1 MB (28,512,256 bytes). It finished at 5:29:40.

The next, 95% done, is at 25.7 MB (26,996,736 bytes) at 05:34:44, with 10 mins to go.

The next, 58% done, is at 27.7 MB (29,048,832 bytes) at 05:35:30, with 3 hours to go. I bet this would error, except I stopped BOINC and manually bumped up the limit, and it got to 05:51:07 at 28.7 MB (30,146,560 bytes) and counting.

So this is the problem - The project needs to up the limit.
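
To make the comparison concrete, here is a minimal Python sketch that pulls the bound out of the element quoted above and checks the byte counts from this post against it. The regular expression, the function name `parse_disk_bound` and the labels are only illustrative; this is not how the BOINC client itself reads the file, and it assumes the bound is compared directly against the slot's size in bytes.

```python
import re

# The element exactly as quoted in this post.
SNIPPET = "<rsc_disk_bound>30000000.000000</rsc_disk_bound>"

MIB = 1024 * 1024  # the "MB" figures in the post match 1024 * 1024 bytes


def parse_disk_bound(text: str) -> float:
    """Extract the rsc_disk_bound value (in bytes) from a snippet of text."""
    match = re.search(r"<rsc_disk_bound>([^<]+)</rsc_disk_bound>", text)
    if match is None:
        raise ValueError("no rsc_disk_bound element found")
    return float(match.group(1))


bound = parse_disk_bound(SNIPPET)

# Byte counts reported in the post, keyed by task progress.
observed = {
    "99% done": 28_512_256,
    "95% done": 26_996_736,
    "58% done": 29_048_832,
    "58% done, after raising the limit": 30_146_560,
}

over_limit = [label for label, size in observed.items() if size > bound]

for label, size in observed.items():
    status = "OVER" if size > bound else "under"
    print(f"{label}: {size / MIB:.1f} MB, {status} the {bound / MIB:.1f} MB bound")
```

Only the last measurement is over the bound, which fits the picture: tasks that finish early squeak under it, while a task with hours still to go would cross it.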
ID: 23124

©2020 CERN