Message boards : Number crunching : WUs starting from beginning at each start of BOINC

den
Joined: 17 Sep 04
Posts: 5
Credit: 5,242
RAC: 0
Message 6275 - Posted: 2 Mar 2005, 18:35:08 UTC

Hello
For 2 days I have been stuck with 2 WUs that start from the beginning each time BOINC starts (even without rebooting Windows).

Hardware: Pentium IV 3.00GHz w/ HT, XP SP2, BOINC 4.19, sixtrack 4.64

I'm sorry if a similar problem with BOINC or sixtrack is already known, but unfortunately there's no search function on this message board.

The latest stderr is empty; the latest stdout is normal (see below).

Resetting the project doesn't change a thing.

Thanks for your help.

yours,

Denis


2005-03-02 19:24:02 [---] Starting BOINC client version 4.19 for windows_intelx86
2005-03-02 19:24:02 [LHC@home] Project prefs: no separate prefs for home; using your defaults
2005-03-02 19:24:02 [LHC@home] Host ID is 2467
2005-03-02 19:24:02 [---] General prefs: from unknown project http://setiathome.berkeley.edu/ (last modified 2004-09-04 21:12:26)
2005-03-02 19:24:02 [---] General prefs: no separate prefs for home; using your defaults
2005-03-02 19:24:03 [LHC@home] Resuming computation for result v64lhc91-43s8_10530_1_sixvf_5296_0 using sixtrack version 4.64
2005-03-02 19:24:04 [LHC@home] Resuming computation for result v64lhc91-42s8_10545_1_sixvf_5283_1 using sixtrack version 4.64

ID: 6275
Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 6277 - Posted: 2 Mar 2005, 19:11:12 UTC

It's not restarting, it's resuming the unit. Or at least, that's what I read from the log you gave.

Are you running another project besides LHC?
If you are, you'll notice that they switch every 60 minutes (standard time in the general preferences), where one project pauses while the other runs.
When a project unpauses, it resumes where it left off. If you check in the Work tab of the BOINC application, you'll see that the unit is still being crunched past 0%.

If you are only crunching LHC, it's still the same thing. When you exit BOINC, the state of LHC is saved to the hard drive. The next time you start LHC, you continue from about where you left off the last time.
Jord

BOINC FAQ Service
ID: 6277
den
Joined: 17 Sep 04
Posts: 5
Credit: 5,242
RAC: 0
Message 6278 - Posted: 2 Mar 2005, 19:18:23 UTC - in response to Message 6277.  

Call it resuming if you want, but it is resuming from 0%, whereas before I exited BOINC the WUs were at something other than 0%.

I'm only running LHC, and what I was saying is that I don't continue from about where I left off the last time; I come back to 0%.

Well, I thought my message was clear enough, sorry about that.

I hope you see the problem now.
ID: 6278
Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 6279 - Posted: 2 Mar 2005, 19:35:07 UTC

Okay, if you go to your general preferences, what time is set for "Write to disk at most every"?

Have you tried changing it to 30 seconds?
Jord

BOINC FAQ Service
ID: 6279
Thierry Van Driessche
Joined: 1 Sep 04
Posts: 157
Credit: 82,604
RAC: 0
Message 6281 - Posted: 2 Mar 2005, 19:59:38 UTC

@Ageless,

I saw exactly the same thing some hours ago:
I shut down BOINC and started it again.

2 WUs were crunching before I shut BOINC down. When I started BOINC again, the 2 WUs started crunching again at 00:00:00 CPU time, although some 35 to 40 minutes had already been done.

I let BOINC run, then shut it down once again with the WUs having done some 15 to 20 minutes. On starting BOINC again, the 2 WUs once again started at 00:00:00 CPU time !!??

BTW, I didn't take any alcohol, and neither did my PC ;~)

Best greetings from Belgium
Thierry
ID: 6281
den
Joined: 17 Sep 04
Posts: 5
Credit: 5,242
RAC: 0
Message 6282 - Posted: 2 Mar 2005, 20:00:25 UTC - in response to Message 6279.  

It was 60 seconds (default); changing it to 30 seconds doesn't make a difference (nor does 10 seconds).

2 additional facts:
- when I start BOINC, I briefly see the previous percentages (from before exiting) and then it automatically goes back to 0%
- when I run the CPU benchmark, the same thing occurs: computation resumes at 0% for the 2 WUs

I admit I'm really surprised by this unusual and sudden behavior!

thanks Ageless for your help anyway
ID: 6282
Thierry Van Driessche
Joined: 1 Sep 04
Posts: 157
Credit: 82,604
RAC: 0
Message 6284 - Posted: 2 Mar 2005, 20:03:56 UTC - in response to Message 6282.  

> it was 60 secondes (default), changing it to 30 seconds doesn't make a
> difference (10 seconds either)

I tried playing with that setting also and had the same experience.
ID: 6284
Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 6286 - Posted: 2 Mar 2005, 20:47:43 UTC

I just exited BOINC, as my PP@H unit is running. It was at 1.36%, it started over at 1%. I have write to disk every 30 seconds.

It might be that PP@H doesn't write to disk correctly. But it may also be your preferences, since mine didn't restart from 0%. Make sure your disk preferences (Use no more than/Leave at least) aren't any bigger than the actual space you have.

By actual space, I really mean that. If your total hard drive/partition is 40GB but you have only got 4GB left, you can put 40GB as the maximum and 100MB (0.1GB) as the minimum, but you still have to calculate how much space the percentage actually allows at most.

In the above example 50% is too much.
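As an illustration of the arithmetic in that example, here is a minimal sketch. The function names are hypothetical (they are not BOINC settings or APIs), and the calculation ignores any space BOINC already occupies; it only compares what a percentage setting implies with the space actually free.

```python
def percentage_limit_gb(total_gb: float, percent: float) -> float:
    """Disk space implied by a 'use no more than N% of total disk' setting."""
    return total_gb * percent / 100.0


def largest_safe_percent(total_gb: float, free_gb: float) -> float:
    """Largest percentage whose implied space still fits in the free space."""
    return 100.0 * free_gb / total_gb


# Figures from the example above: a 40GB partition with 4GB left.
total_gb, free_gb = 40.0, 4.0

implied = percentage_limit_gb(total_gb, 50)
print(implied, implied > free_gb)                # 50% implies 20GB, more than the 4GB left
print(largest_safe_percent(total_gb, free_gb))   # percentage that matches the 4GB left
```

With these numbers, 50% asks for 20GB where only 4GB exists, which is why it is "too much"; anything up to 10% stays within the free space.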

I put all my (now 6) BOINC projects on their own 5.5GB partition. It works. ;)
Jord

BOINC FAQ Service
ID: 6286
Thierry Van Driessche
Joined: 1 Sep 04
Posts: 157
Credit: 82,604
RAC: 0
Message 6288 - Posted: 2 Mar 2005, 21:00:41 UTC - in response to Message 6286.  

> I just exited BOINC, as my PP@H unit is running. It was at 1.36%, it started
> over at 1%. I have write to disk every 30 seconds.
>
> It might be that PP@H doesn't write to disk correctly. But it may also be your
> preferences, since mine didn't restart from 0%. Make sure your disk
> preferences (Use no more than/Leave at least) aren't any bigger than the
> actual space you have.
>
> By actual space, I really mean that. If your total harddrive/partition is
> 40GB, but you have only got 4GB left, you put as maximum 40GB, minimum 100MB
> (0.1GB), while you calculate how much space can be used at maximum for the
> percentage.
>
> In the above example 50% is too much.
>
> I put all my (now 6) BOINC projects on their own 5.5GB partition. It works. ;)

Looking into the tab "Disk" I have still some 1,12GB free for Boinc, so I don't believe that this could be the cause of the problem as I saw it.
ID: 6288
den
Joined: 17 Sep 04
Posts: 5
Credit: 5,242
RAC: 0
Message 6289 - Posted: 2 Mar 2005, 21:09:26 UTC - in response to Message 6288.  

> Looking into the tab "Disk" I have still some 1,12GB free for Boinc, so I
> don't believe that this could be the cause of the problem as I saw it.

Same conclusion for me: there has been no significant change in free space on my hard drive for the last week/month, and this phenomenon began 2 days ago (or maybe yesterday).

That is weird. I'm wondering if anyone other than Thierry and me is experiencing the same thing.
ID: 6289
Thierry Van Driessche
Joined: 1 Sep 04
Posts: 157
Credit: 82,604
RAC: 0
Message 6290 - Posted: 2 Mar 2005, 21:14:30 UTC - in response to Message 6289.  

> Same conclusion for me, and there has been no significative change on free
> space on my hard drive for the last week/month, and this phenomenon begun 2
> days ago (or maybe yesterday).
>
> That is weird, I'm wondering if anyone else than Thierry and me is
> experiencing the same thing.

Denis,

I believe the only way to avoid the problem is to leave BOINC running 24/7. That is what I normally do. By doing so, I haven't seen any problem concerning this issue. (00:00:00 CPU time being another problem, of course.)
ID: 6290
Contact
Joined: 17 Sep 04
Posts: 54
Credit: 1,957,272
RAC: 3,952
Message 6294 - Posted: 3 Mar 2005, 0:47:45 UTC - in response to Message 6289.  
Last modified: 3 Mar 2005, 0:50:48 UTC

>I'm wondering if anyone else than Thierry and me is experiencing the same thing.
>
Wonder no longer.
Similar problem here. See this thread.


ID: 6294
Razorirr
Joined: 18 Sep 04
Posts: 27
Credit: 2,559
RAC: 0
Message 6296 - Posted: 3 Mar 2005, 3:12:58 UTC

Still logged in??? OK, well...
Ageless, don't blow up; I'm having the same issue too. My HDD is pretty much empty, granted it's only 18 gig. I've done the stuff you mentioned and it didn't work. It has worked on other projects, but not now. The project admins are able to screw up.


ID: 6296
Aurora Borealis
Joined: 18 Sep 04
Posts: 59
Credit: 317,857
RAC: 0
Message 6298 - Posted: 3 Mar 2005, 5:24:46 UTC

I've been having a similar problem with certain WUs. It appears that the problem is with WUs labeled v64lhc. They don't appear to have any saved checkpoint, so they restart at 0% after being paused. They seem to need to be processed in one continuous run to be successful.
WUs labeled v64boince restarted properly after being paused.
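To illustrate why a missing checkpoint sends a task back to 0%, here is a minimal sketch. It is not sixtrack's or BOINC's actual code: the function `run_work_unit`, the JSON checkpoint file, and the step counts are all assumptions made up for the illustration.

```python
import json
import os
import tempfile


def run_work_unit(checkpoint_path, total_steps, steps_this_session, save_checkpoints):
    """Run up to `steps_this_session` steps, resuming from a checkpoint if one exists.

    Returns the step reached when the session ends (simulating BOINC exiting).
    """
    step = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            step = json.load(f)["step"]   # resume where the last session stopped

    end = min(total_steps, step + steps_this_session)
    while step < end:
        step += 1                         # stand-in for one unit of real computation
        if save_checkpoints:
            # Written every step for simplicity; a real client throttles disk writes.
            with open(checkpoint_path, "w") as f:
                json.dump({"step": step}, f)
    return step


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.json")
    bad = os.path.join(tmp, "bad.json")

    # Task that checkpoints: the second session continues from step 40.
    run_work_unit(good, 100, 40, save_checkpoints=True)
    resumed = run_work_unit(good, 100, 40, save_checkpoints=True)

    # Task that never checkpoints: the second session starts from 0 again.
    run_work_unit(bad, 100, 40, save_checkpoints=False)
    restarted = run_work_unit(bad, 100, 40, save_checkpoints=False)

    print(resumed, restarted)   # 80 40
```

The second task never gets past 40 of its 100 steps however many sessions it runs, which matches the observation that such WUs only finish in one continuous run.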

If you look at an earlier thread called 'Odd occurrence' you will see this problem discussed.


Questions? Answers are in the BOINC Wiki.

Boinc V6.10.56 Recommended
WinXP C2D 2.1G 3GB
ID: 6298
den
Joined: 17 Sep 04
Posts: 5
Credit: 5,242
RAC: 0
Message 6300 - Posted: 3 Mar 2005, 6:51:26 UTC - in response to Message 6298.  

> I've been having a similar problem with certain WU. It appears that the
> problem is with WU labeled v64lhc. They don't appear to have any saved
> checkpoint, so they restart at 0% after being paused. They seem to need to be
> processed in one continuous run to be successful.
> WU labeled v64boince restarted properly after being paused.
>
> If you look at an earlier thread called 'Odd occurrence' you will see this
> problem discussed.

That makes sense, I just got a "v64boince"-labeled WU and resuming works well.
Thanks for the info; my Cartesian mind is relieved, then.

ID: 6300
Vax
Joined: 29 Sep 04
Posts: 2
Credit: 833,537
RAC: 0
Message 6316 - Posted: 3 Mar 2005, 16:52:04 UTC

Having the same problem here.

I have 4 projects running in BOINC. LHC was idle while something else was running. I noted LHC was sitting at somewhere above 96% completion when it got pulled from memory to swap to another project. When it kicked back in, it restarted back at 0% completion and is chugging away (finally back to 31% completion).

BOINC is running almost 24/7, so it's not an issue with turning it off or on.

I do not know if this particular workunit has been doing this continually, I only noticed it today. I do suspect that other LHC workunits (on my work and home machine) have done the same thing in the past.

These are the workunit particulars:
LHC@home - 2005-03-03 11:22:45 - Restarting result v64lhc87-24s12_14575_1_sixvf_3481_2 using sixtrack version 4.64

On a side note, I do not know how often I have had workunits swap out at 96-99% completion. Maybe something for BOINC to consider is to look at the estimated remaining time for a workunit and, if it's less than a certain amount of time, finish the darn thing before swapping out to do another project. It seems silly for a workunit to sit for hours, waiting to start again, while other projects swap in and out and it only had 10-15 minutes left to finish. Just a thought (pet peeve hat off).
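The suggestion above could be sketched roughly as follows. This is only an illustration of the idea, not how the BOINC scheduler works; `should_switch_project` and its 15-minute default threshold are hypothetical.

```python
def should_switch_project(remaining_seconds: float,
                          finish_threshold_seconds: float = 15 * 60) -> bool:
    """Return True if the running task may be swapped out for another project now.

    A task whose estimated remaining time is at or below the threshold is
    allowed to finish first instead of being preempted.
    """
    return remaining_seconds > finish_threshold_seconds


print(should_switch_project(10 * 60))    # False: 10 minutes left, let it finish
print(should_switch_project(4 * 3600))   # True: hours left, switching is fine
```

The rule depends entirely on the remaining-time estimate being reasonably accurate, so a real scheduler would need some safeguard against tasks that keep claiming to be almost done.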

Tom
ID: 6316
Vid Vidmar*
Joined: 28 Sep 04
Posts: 27
Credit: 17,091
RAC: 0
Message 6337 - Posted: 4 Mar 2005, 11:53:36 UTC - in response to Message 6289.  

> > Looking into the tab "Disk" I have still some 1,12GB free for Boinc, so I
> > don't believe that this could be the cause of the problem as I saw it.
>
> Same conclusion for me, and there has been no significative change on free
> space on my hard drive for the last week/month, and this phenomenon begun 2
> days ago (or maybe yesterday).
>
> That is weird, I'm wondering if anyone else than Thierry and me is
> experiencing the same thing.

I am. And I pointed it out on another thread to no avail. I believe that this behaviour has something to do with all those 0 time results.

ID: 6337
Cabezon [Canarias]
Joined: 17 Sep 04
Posts: 3
Credit: 8,805
RAC: 0
Message 6353 - Posted: 4 Mar 2005, 16:28:01 UTC

I have a Celeron 500 laptop, and of course I don't want to keep it running 24/7. Several times, the LHC unit has started again from 0%. I have reset the project and refreshed all the files, but the problem goes on.
I have just started that PC again, and the unit is at 8.24% with 2:45:33 done and 30:36:49 to go. The unit is v64boince61b-32s4_6645_1_sixvf_11798_4.
Right now, there doesn't seem to be any problem. I'll turn off my PC and report again.
Thanks
ID: 6353
Vax
Joined: 29 Sep 04
Posts: 2
Credit: 833,537
RAC: 0
Message 6357 - Posted: 4 Mar 2005, 18:18:20 UTC - in response to Message 6316.  

Well, the problem still exists. The same workunit (on the machine at my office) has now restarted back at 0% completion 4 more times (since I last wrote) each time LHC starts back up on BOINC.

I confirmed last night that the same problem is happening on LHC workunits on my home computer.

If it was just happening on my work machine, I'd think it's a workunit from hell, but since it's happening on both machines, it must be something wrong with the sixtrack software, or how it works with BOINC.

Real drag though: in an hour, the workunit gets to about 90-94% completion when LHC swaps out. If the computer (or sixtrack) was just a snick faster, I'd manage to get that sucker done. Instead I find out 4 hours later that it restarted back at 0 again.

Unless someone has a different solution, I'm going to have to detach LHC from BOINC. I hate to spend the effort on something that is just wasting resources that could be used to analyse the other 3 programs I'm running on BOINC.

Tom

> Having the same problem here.
>
> I have 4 projects running in BOINC. LHC was idle while something else was
> running. I noted LHC was sitting at somewhere above 96% completion when it
> got pulled from memory to swap to another project. When it kicked back in, it
> restarted back at 0% completion and is chugging away (finally back to 31%
> completion).
>
> This is the workunit particulars:
> LHC@home - 2005-03-03 11:22:45 - Restarting result
> v64lhc87-24s12_14575_1_sixvf_3481_2 using sixtrack version 4.64
> Tom
>
ID: 6357
Thierry Van Driessche
Joined: 1 Sep 04
Posts: 157
Credit: 82,604
RAC: 0
Message 6359 - Posted: 4 Mar 2005, 18:34:53 UTC - in response to Message 6357.  
Last modified: 4 Mar 2005, 18:35:24 UTC

> Unless someone has a different solution, I'm going to have to detach LHC from
> BOINC. I hate to spend the effort on something that is just wasting resources
> that could be used to analyse the other 3 programs I'm running on BOINC.
>
> Tom

Do you use "Leave applications in memory while preempted"? If not, try it (don't forget to do an "Update").

My assumption is that when using it, the WU will start again at the correct %; the only problem that could remain is the 00:00:00 CPU time. But that's another story ;o)


Best greetings from Belgium
Thierry
ID: 6359


©2024 CERN