Message boards : Number crunching : Long WU's

Dirk Broer

Joined: 20 Sep 05
Posts: 31
Credit: 1,212,091
RAC: 10
Message 24585 - Posted: 13 Aug 2012, 10:45:10 UTC - in response to Message 24577.  
Last modified: 13 Aug 2012, 10:54:42 UTC

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=2504999

Update: 103 hours gone, still 79 hours to go, and way past the expected return date of 2012-08-11 (Prescott P4 @ 3.2 GHz, running at high priority all the time; OS: 32-bit WinXP; 2 GB of DDR2 RAM, since the Intel D102GGC2 mobo won't support more).
ID: 24585
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 24586 - Posted: 13 Aug 2012, 10:49:36 UTC

70+ hours gone on my AMD E-450 APU, 45.157% done, 80 hours to go, and the deadline passed yesterday morning.
Tullio
ID: 24586
Dirk Broer

Joined: 20 Sep 05
Posts: 31
Credit: 1,212,091
RAC: 10
Message 24587 - Posted: 13 Aug 2012, 10:57:35 UTC - in response to Message 24586.  

I win! Mine is supposed to run for a grand total of 180 hours, yours only 150... Anyway, as long as the result gets declared valid, I'll crunch on ;)
ID: 24587
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 24588 - Posted: 13 Aug 2012, 13:53:27 UTC - in response to Message 24565.  

Well I THINK they are synonymous..........
We shall see. Eric.
ID: 24588
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 24590 - Posted: 13 Aug 2012, 15:50:46 UTC - in response to Message 24588.  

Well I THINK they are synonymous..........
We shall see. Eric.

It's SSE3 which should be synonymous with PNI. SSE2 might possibly be slower, but probably not by much.
ID: 24590
Dirk Broer

Joined: 20 Sep 05
Posts: 31
Credit: 1,212,091
RAC: 10
Message 24591 - Posted: 13 Aug 2012, 16:11:59 UTC - in response to Message 24588.  
Last modified: 13 Aug 2012, 16:14:35 UTC

Well I THINK they are synonymous..........We shall see. Eric.


Wikipedia says SSE3 and Prescott New Instructions (PNI) are synonymous.
ID: 24591
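On Linux the same naming overlap shows up in /proc/cpuinfo, where the kernel lists SSE3 under the flag name "pni". As a minimal Python sketch (the build labels "sse3", "sse2" and "generic" are hypothetical, and this is not the project's scheduler logic), a host-side check that picks the fastest build the CPU flags allow might look like this:

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_app_version(flags: set[str]) -> str:
    """Pick the fastest (hypothetical) build label the CPU supports."""
    # Linux reports SSE3 as "pni"; other tools may call it "sse3".
    if "pni" in flags or "sse3" in flags:
        return "sse3"
    if "sse2" in flags:
        return "sse2"
    return "generic"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:  # Linux only
            text = f.read()
    except OSError:
        text = "flags\t\t: fpu sse sse2 pni\n"  # sample line for other systems
    print(pick_app_version(cpu_flags(text)))
```

Treating "pni" and "sse3" as the same flag is the point of the sketch: a check that looked only for the literal string "sse3" would fall back to a slower build on a Linux host that is actually SSE3-capable.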
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,704,455
RAC: 259
Message 24601 - Posted: 15 Aug 2012, 7:18:23 UTC

So what happens when a task does run beyond its deadline? Apologies if this has been asked before.

I have a task that is only 45% completed and close to its deadline. Should I abort it now and stop wasting time or should I let it run?

I don't care about credits very much, although more credits are always better.
ID: 24601
T.J.

Joined: 17 Feb 07
Posts: 86
Credit: 968,855
RAC: 0
Message 24603 - Posted: 15 Aug 2012, 9:14:09 UTC - in response to Message 24601.  

So what happens when a task does run beyond its deadline? Apologies if this has been asked before.

I have a task that is only 45% completed and close to its deadline. Should I abort it now and stop wasting time or should I let it run?

I don't care about credits very much, although more credits are always better.


Well, at other projects a task returned after the deadline is marked as invalid and gets no credit.
I aborted a long WU that would never have met the deadline. I now have a new one that still needs 58 hours with 6 days to go, so that should be no problem.
Greetings from,
TJ
ID: 24603
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 24604 - Posted: 15 Aug 2012, 9:50:39 UTC - in response to Message 24603.  

So what happens when a task does run beyond its deadline? Apologies if this has been asked before.

I have a task that is only 45% completed and close to its deadline. Should I abort it now and stop wasting time or should I let it run?

I don't care about credits very much, although more credits are always better.

Well, at other projects a task returned after the deadline is marked as invalid and gets no credit.
I aborted a long WU that would never have met the deadline. I now have a new one that still needs 58 hours with 6 days to go, so that should be no problem.

Not quite true.

After the deadline has passed, the WU remains 'live' while a replacement task is created, sent to another user, computed, and returned. Provided your task is returned before the replacement comes back, you are eligible for credit.

Look at WU 2505053, especially the middle task.

The deadline for that middle task was 11 Aug 2012 | 0:40:32 UTC, and the user missed it. The third task was sent to me - almost immediately, in that case, because no new work was being generated at the time: usually they hang around for several hours.

But mine took a long time to run, so that the middle user returned his before me, while the WU was still incomplete. Finally, I returned my copy before my assigned deadline of 14 Aug 2012 | 18:40:32 UTC, and all three of us got credit. That's correct, according to the way BOINC is designed to work.
ID: 24604
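The rule Richard describes can be modelled in a few lines of Python: an on-time result is eligible for credit, and a late one stays eligible only if it arrives before the replacement task comes back. This is a simplified sketch distilled from the post, not BOINC's actual server code; the deadline is the one quoted for the middle task of WU 2505053, while both return times are invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

UTC = timezone.utc


def eligible_for_credit(returned_at: datetime, deadline: datetime,
                        replacement_returned_at: Optional[datetime] = None) -> bool:
    """Simplified model of the rule in the post (an assumption, not BOINC code)."""
    if returned_at <= deadline:
        return True  # on time
    if replacement_returned_at is None:
        return True  # late, but the replacement has not come back yet
    return returned_at < replacement_returned_at


# Deadline quoted for the middle task of WU 2505053; return times are invented.
middle_deadline = datetime(2012, 8, 11, 0, 40, 32, tzinfo=UTC)
middle_returned = datetime(2012, 8, 12, 12, 0, 0, tzinfo=UTC)
replacement_returned = datetime(2012, 8, 14, 12, 0, 0, tzinfo=UTC)

# Late, but back before the replacement: eligible.
print(eligible_for_credit(middle_returned, middle_deadline, replacement_returned))
# Late, and the replacement got there first: not eligible.
print(eligible_for_credit(replacement_returned + timedelta(hours=1),
                          middle_deadline, replacement_returned))
```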
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 24605 - Posted: 15 Aug 2012, 18:30:44 UTC - in response to Message 24601.  

Frankly I don't know.......
We shall be looking at all this tomorrow.
If you are close, let it run, and we shall see.

Eric.
(I already have 98% of these very long tasks finished successfully.)
ID: 24605
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 24606 - Posted: 15 Aug 2012, 18:30:50 UTC - in response to Message 24601.  

Frankly I don't know.......
We shall be looking at all this tomorrow.
If you are close, let it run, and we shall see.

Eric.
(I already have 98% of these very long tasks finished successfully.)
ID: 24606
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,704,455
RAC: 259
Message 24613 - Posted: 16 Aug 2012, 8:20:00 UTC - in response to Message 24606.  

Thanks. I will let it run a bit more and see what happens. It's this task:
http://lhcathomeclassic.cern.ch/sixtrack/result.php?resultid=5623859

It has the misfortune of running on my slowest machine, an Atom-powered netbook. I see 2 other people have already completed the WU for that task, so I don't know if I will get any credit at all. Bummer if I don't, but fortunately that is not my motivation.

Looking just now, the task has run for almost 142 hours, reports that it has another 114 hours to go, and has only 18 hours until the deadline.

Previously I aborted several tasks that had extremely long running times, but then I noticed that after a very slow start they tended to accelerate and complete in a decent time anyway. That is why I let this one run on too, in the hope it would follow the same pattern. Unfortunately, it is accelerating, but only very, very slowly.

Let's see what happens. If I'm better off aborting the task and starting on something else let me know. I don't like wasting CPU time, even if it's a slow CPU.
ID: 24613
CoM

Joined: 29 Sep 04
Posts: 42
Credit: 11,505,632
RAC: 0
Message 24619 - Posted: 16 Aug 2012, 17:00:14 UTC

I also had such a monster:
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=2515279

Spending 110+ hours on it and only getting 190.57 credits.
That's not fair ;(
ID: 24619
Gary Roberts

Joined: 22 Jul 05
Posts: 72
Credit: 3,962,626
RAC: 0
Message 24623 - Posted: 17 Aug 2012, 0:21:22 UTC - in response to Message 24613.  

.... If I'm better off aborting the task and starting on something else let me know. I don't like wasting CPU time, even if it's a slow CPU.

Because there are already two validated tasks for that workunit, the quorum is already complete and you should immediately abort your now unnecessary copy.

If it could be completed before the deadline, it could also receive credit. Once the deadline passes, you will not receive credit, so (from what you say) you should abort it immediately and stop wasting time.

Cheers,
Gary.
ID: 24623
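Combining this advice with Richard's earlier point about late results, one way to frame the abort decision is the hypothetical Python helper below (not part of BOINC): abort only when the quorum is already complete and the remaining-time estimate overshoots the deadline; while the quorum is still open, a late result may still be useful. The numbers are the ones quoted in the thread: jelle's 114 hours to go with 18 hours to the deadline, and TJ's 58 hours needed with 6 days left.

```python
def hours_late(hours_remaining: float, hours_to_deadline: float) -> float:
    """Projected lateness in hours; 0.0 if the estimate fits before the deadline."""
    return max(0.0, hours_remaining - hours_to_deadline)


def should_abort(hours_remaining: float, hours_to_deadline: float,
                 quorum_complete: bool) -> bool:
    """Hypothetical rule: abort only if the quorum is already complete AND the
    task cannot finish in time, so it can neither help the WU nor earn credit."""
    return quorum_complete and hours_late(hours_remaining, hours_to_deadline) > 0.0


print(hours_late(114.0, 18.0), should_abort(114.0, 18.0, True))  # 96.0 True
print(hours_late(58.0, 6 * 24.0))                                 # 0.0
```

The remaining-time estimate is only as good as the client's projection; as jelle noticed, these tasks can speed up after a slow start, so the lateness figure is a rough guide rather than a verdict.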
Gary Roberts

Joined: 22 Jul 05
Posts: 72
Credit: 3,962,626
RAC: 0
Message 24626 - Posted: 17 Aug 2012, 1:42:18 UTC - in response to Message 24619.  

I also had such a monster:
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=2515279

Spending 110+ hours on it and only getting 190.57 credits.
That's not fair ;(

Everybody who had one of those 10M turn tasks suffered the same fate and Eric has already accepted blame for the oversight - both for the limit on credit and for the inadequate deadline. Not much use complaining further.

But what about this particular example? It's a completed and validated quorum where one host took 180Ksecs and the other took 303Ksecs and the credit award was 0.00 for both.

It's not an isolated event. Here is a small list of completed and validated quorums where zero credit was given. There are lots more like these.

2566284
2566253
2566221
2566220
2566215
2566214
2561956
2560652
2560651
2560648

The common factor is that one of the hosts participating in all those quorums is running the sse3 version of the application under the anonymous platform mechanism (AP). When the various versions were first released, there were problems with the detection of CPU capabilities, and all my hosts (even though SSE3-capable) were being sent the much slower generic app. I solved that problem by forcing the use of the sse3 version with AP.

At that time I didn't see any problem with credit awards. It's possible I wasn't paying close enough attention but I do believe all validated tasks were receiving normal credit. When the CPU detection was improved, I started removing AP from my hosts as caches drained. I wasn't in any particular hurry - I was making the transition when convenient. I still had quite a few machines to go when I started noticing the zero credit awards. Not every result gets zero credit. At least half or slightly more get normal credit. It seems to be a pretty random thing.

I reported it to Igor and Eric over a week ago but the behaviour continues. The caches for the last couple of AP hosts should drain today so when AP is removed on those hosts, that will be the end of the problem for me at present. The problem should be investigated so that future use of AP is not compromised.

Cheers,
Gary.
ID: 24626
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,704,455
RAC: 259
Message 24627 - Posted: 17 Aug 2012, 1:55:50 UTC - in response to Message 24623.  

.... If I'm better off aborting the task and starting on something else let me know. I don't like wasting CPU time, even if it's a slow CPU.

Because there are already two validated tasks for that workunit, the quorum is already complete and you should immediately abort your now unnecessary copy.

If it could be completed before the deadline, it could also receive credit. Once the deadline passes, you will not receive credit, so (from what you say) you should abort it immediately and stop wasting time.


You're right. I'm just seeing it listed as an Error result because it timed out. I'm not at home now, so it may still be crunching away. I'll abort it when I get home. Sad for the wasted week of computing time. Older, yes; wiser, maybe.
ID: 24627
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 24628 - Posted: 17 Aug 2012, 8:09:36 UTC - in response to Message 24626.  

Everybody who had one of those 10M turn tasks suffered the same fate and Eric has already accepted blame for the oversight - both for the limit on credit and for the inadequate deadline. Not much use complaining further.

But what about this particular example. It's a completed and validated quorum where one host took 180Ksecs and the other took 303Ksecs and the credit award was 0.00 for both.

It's not an isolated event. Here is a small list of completed and validated quorums where zero credit was given. There are lots more like these.

I didn't check the whole list of zero-credit WUs, but every one I checked shared another common factor: the Anonymous Platform host has declared a run time of 0.00 seconds.

I don't think that's related to AP as such: it's likely to be the result of using BOINC version 6.2.15 - I think the separation of runtime and CPU time in client reports came in round about BOINC v6.6.xx, with the introduction of GPU computing (where the two figures differ greatly).

If the client is not making a runtime report, the server is defaulting the missing field to zero, and Igor has implemented 'credit from runtime', we would get the result you report, without invoking Anonymous Platform at all.

This problem rings a vague bell in my mind, and I thought it had been addressed - perhaps by using a copy of the CPU time report as a surrogate runtime report, at the server end - but it will take some lengthy rummaging in the archives of the boinc_alpha mailing list to confirm that.

If it helps, I could fairly easily switch one of my hosts to AP, using a later BOINC, so that we could check whether there is any specific problem with AP alone, separately from the BOINC version issue.
ID: 24628
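To illustrate the mechanism Richard proposes, here is a Python sketch of "credit from runtime" in which a missing runtime report is stored as zero, plus the surrogate fix he half-remembers (copying the CPU time into the empty field). Everything here is hypothetical: CREDIT_PER_HOUR is an invented rate, and granting every host the lowest claim in the quorum is just one assumed policy that would explain why both hosts in Gary's example got 0.00; it is not necessarily what the project's validator does. The 180 Ks and 303 Ks figures are Gary's; which host sent no runtime is assumed.

```python
from typing import Optional

CREDIT_PER_HOUR = 2.0  # hypothetical rate, for illustration only


def claimed_credit(runtime_s: Optional[float], cpu_time_s: float,
                   use_cpu_fallback: bool) -> float:
    """Credit claim computed from the reported runtime (hypothetical model)."""
    # An old client sends no runtime; the server stores the missing field as 0.
    runtime = runtime_s if runtime_s is not None else 0.0
    if runtime == 0.0 and use_cpu_fallback:
        # Surrogate fix: use the CPU time report in place of the missing runtime.
        runtime = cpu_time_s
    return runtime / 3600.0 * CREDIT_PER_HOUR


def granted_credit(claims: list[float]) -> float:
    """Assumed policy: every host in the quorum is granted the lowest claim."""
    return min(claims)


# Host A reports a runtime; host B (assumed to be the 6.2.x client) does not.
for fallback in (False, True):
    claims = [claimed_credit(180_000.0, 180_000.0, fallback),
              claimed_credit(None, 303_000.0, fallback)]
    print(fallback, granted_credit(claims))
# False 0.0
# True 100.0
```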
Uffe F

Joined: 9 Jan 08
Posts: 66
Credit: 727,923
RAC: 0
Message 24629 - Posted: 17 Aug 2012, 10:17:57 UTC - in response to Message 24628.  
Last modified: 17 Aug 2012, 10:27:30 UTC

Everybody who had one of those 10M turn tasks suffered the same fate and Eric has already accepted blame for the oversight - both for the limit on credit and for the inadequate deadline. Not much use complaining further.

But what about this particular example. It's a completed and validated quorum where one host took 180Ksecs and the other took 303Ksecs and the credit award was 0.00 for both.

It's not an isolated event. Here is a small list of completed and validated quorums where zero credit was given. There are lots more like these.

I didn't check the whole list of zero-credit WUs, but every one I checked shared another common factor: the Anonymous Platform host has declared a run time of 0.00 seconds.

I don't think that's related to AP as such: it's likely to be the result of using BOINC version 6.2.15 - I think the separation of runtime and CPU time in client reports came in round about BOINC v6.6.xx, with the introduction of GPU computing (where the two figures differ greatly).

If the client is not making a runtime report, the server is defaulting the missing field to zero, and Igor has implemented 'credit from runtime', we would get the result you report, without invoking Anonymous Platform at all.

This problem rings a vague bell in my mind, and I thought it had been addressed - perhaps by using a copy of the CPU time report as a surrogate runtime report, at the server end - but it will take some lengthy rummaging in the archives of the boinc_alpha mailing list to confirm that.

If it helps, I could fairly easily switch one of my hosts to AP, using a later BOINC, so that we could check whether there is any specific problem with AP alone, separately from the BOINC version issue.


I also just checked and found 2 of these in my results:
WU 2554378
WU 2554391

Both had the same wingman. He was using: Linux 2.6.26.8.tex3, BOINC version 6.2.15, CPU type: GenuineIntel, Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz [Family 6 Model 23 Stepping 10]

So the BOINC version and anonymous platform might be a bad combo for this project.

Maybe AP stops the server from using the CPU time report as a substitute for the missing runtime report.

Edit: I was just looking through his results. He has a lot of WUs that got 0 credit. But I can see now that he switched back to the normal platform, and he has received full credit for all the WUs he has returned since then, even though he didn't update his BOINC version.

So it is definitely something to do with AP, and probably also with v6.2.x, but probably only in combination with AP.
ID: 24629
Zapped Sparky

Joined: 22 Oct 08
Posts: 26
Credit: 75,214
RAC: 0
Message 24635 - Posted: 18 Aug 2012, 2:45:41 UTC

Richard, I've been running AP since the various executables came out (sse, sse2, etc.).

For two reasons:

1: To stop BOINC downloading a different executable each time I run out of tasks, and to prevent it from using something lower.

2: To stop BOINC downloading the same executable again and again once I run out of tasks and get some more after a while.

So far I'm running BOINC 7.0.28 on AP and no tasks have reported zero run time.
ID: 24635
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 24655 - Posted: 19 Aug 2012, 20:11:19 UTC

Yes, I take the blame but this is because I exceeded some
limits in BOINC itself. Igor will explain all this shortly when
he has finished investigating and correcting.

Sometimes it is essential to get a new executable. While I hope
we shall now remain numerically compatible (for ever? :-) there
will be new physics in SixTrack, new elements in the ring.


ID: 24655


©2024 CERN