Message boards : Number crunching : Long delays in jobs

Profile geo...
Joined: 18 Sep 04
Posts: 7
Credit: 2,320,210
RAC: 0
Message 23557 - Posted: 19 Oct 2011, 15:37:42 UTC - in response to Message 23554.  

Gee (In) Valdis,

As if posting less means crunching more!!!

Some folks spend a buncha time helping to improve the project for everybody
and they get badmouthed for using the Forum for its intended purpose by some newbie with Seasonal Affective Disodor--
er, Disorder...
ID: 23557 · Report as offensive     Reply Quote
Zapped Sparky

Joined: 22 Oct 08
Posts: 26
Credit: 75,214
RAC: 0
Message 23558 - Posted: 19 Oct 2011, 18:12:46 UTC - in response to Message 23552.  
Last modified: 19 Oct 2011, 18:29:16 UTC

Well, no, that quorum I linked to has nothing to do with the fast reliable hosts experiment. I highlighted it to draw attention to what appears to be a BOINC server bug. That quorum has been completed but one of its members has been left in an inconsistent state. The task deemed to be invalid has been left as 'pending' and I imagine this might prevent the quorum from being deleted at the appropriate time. In the past there were plenty of examples of pending tasks left cluttering up the database long after the main body of tasks had been deleted. I don't want to see that happening again.


@Gary Roberts, apologies, I missed that task still pending. I just saw the two completed and validated and thought the workunit was done. Thanks for the explanation; before the server move I had a task pending for about a year :) I now know what you're on about re: the server bug and not wanting the pendings sticking around.

The criteria for a fast reliable host might need a bit of tweaking if this and this occur with any frequency. In these two examples, a short deadline resend was issued when a primary task failed. In both those cases, the first resend timed out and that triggered a second short deadline resend. It's quite possible that the first resend might get returned late and so complete the quorum. If that happens, I trust the second resends will still be awarded credit if they get completed before their own shortened deadlines. I was interested to note that the computer used for the first resend in the first of the examples given had a turnaround time of close to 4 days - hardly what you would call 'fast and reliable' :-).


Looks like the re-sends are complete and all who completed got validated and credit except for one, and there's nothing in the stderr to indicate a problem.
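
For anyone following the "fast reliable host" point quoted above, here is a minimal Python sketch of how such a criterion might be expressed. The thresholds and the `Host` fields are hypothetical, picked only for illustration; they are not the project's actual settings. The only figure taken from the thread is the roughly 4-day turnaround of the host that received the first resend.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration (not project settings).
MAX_AVG_TURNAROUND_DAYS = 2.0
MAX_ERROR_RATE = 0.05


@dataclass
class Host:
    name: str
    avg_turnaround_days: float  # average time between receiving and returning a task
    error_rate: float           # fraction of returned tasks that errored


def is_fast_reliable(host: Host) -> bool:
    """Return True if the host passes both illustrative thresholds."""
    return (host.avg_turnaround_days <= MAX_AVG_TURNAROUND_DAYS
            and host.error_rate <= MAX_ERROR_RATE)


quick = Host("quick-host", avg_turnaround_days=0.8, error_rate=0.01)
slow = Host("four-day-host", avg_turnaround_days=4.0, error_rate=0.01)

print(is_fast_reliable(quick))  # True
print(is_fast_reliable(slow))   # False
```

Under these assumed thresholds, a host with a 4-day turnaround would not qualify for short-deadline resends, which is the kind of tweak being suggested.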

[EDIT] @VALDIS, credits for me are just an indicator of my computers' progress. If it dropped I'd know something was up (e.g. errored tasks) and would look further into it.[/EDIT]
ID: 23558 · Report as offensive     Reply Quote
Profile Tom95134

Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 23559 - Posted: 19 Oct 2011, 19:41:10 UTC - in response to Message 23556.  

I don't care about credits. I've been running CERN jobs in BOINC_VM for days without getting a single credit because of a bug in the Test4Theory wrapper in version 6.05. Now it is in version 7.01 and I am slowly going up the credits ladder. I have been running it since November 28 and my user number is 10.
Tullio

Hear! Hear!
ID: 23559 · Report as offensive     Reply Quote
Profile Tom95134

Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 23560 - Posted: 19 Oct 2011, 19:43:34 UTC - in response to Message 23554.  

If you guys crunched as much as you chatted, you all would be up in the hundreds of millions of credits. I myself have only one machine that runs 24/7 at 100% across the board and I still manage to do work on 3D Studio Max and Maya. So why don't you all just crank up your work-load percentage and REALLY contribute to the scientific community. By Dec. 15th I will be finished building my new machine, which should be capable of at least a million credits/day. Come on guys, I know your machines can do better.....JUST LET THEM RIP, yes you can still play your silly games, without missing a beat.

What are 3D Studio Max and Maya? I don't see them listed in the Add Project list.
ID: 23560 · Report as offensive     Reply Quote
VALDIS

Joined: 22 May 11
Posts: 2
Credit: 132,444
RAC: 0
Message 23561 - Posted: 20 Oct 2011, 2:52:32 UTC

Tom, 3D Studio Max is a program that one uses to produce three-dimensional scenes or backgrounds, and Maya is a program that one uses to create three-dimensional people or creatures and then animate them (so they can move). A good example of what these programs produce is your run-of-the-mill video game. If you can master these programs, one day you will have an income of $250,000+ per year. Stay in school and learn all you can.
ID: 23561 · Report as offensive     Reply Quote
Profile jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23562 - Posted: 20 Oct 2011, 3:44:05 UTC - in response to Message 23561.  

>>>>>>>>>>>> DON'T FEED THE TROLL! <<<<<<<<<<<<<<<<

Moderator, please delete these off-topic posts before it gets out of hand.

And could the admins please enable the Red-X mechanism so that when one clicks the little red x below a post the comments are saved.
ID: 23562 · Report as offensive     Reply Quote
Profile Tom95134

Joined: 4 May 07
Posts: 250
Credit: 826,541
RAC: 0
Message 23563 - Posted: 20 Oct 2011, 4:00:55 UTC - in response to Message 23561.  

Tom, 3D Studio Max is a program that one uses to produce three-dimensional scenes or backgrounds, and Maya is a program that one uses to create three-dimensional people or creatures and then animate them (so they can move). A good example of what these programs produce is your run-of-the-mill video game. If you can master these programs, one day you will have an income of $250,000+ per year. Stay in school and learn all you can.

I'm retired (but still working). I assumed they had to do with image motion graphics. I was just wondering. You mentioned them but I couldn't find them in the BOINC Projects. Do they exist as Projects under BOINC? What is their URL if they do?

Thanks.

Tom
ID: 23563 · Report as offensive     Reply Quote
Profile jujube

Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23564 - Posted: 20 Oct 2011, 5:48:31 UTC - in response to Message 23563.  

Tom,

Have you never heard of Google?

Keith, tell these clueless newbies to take their off-topic drivel somewhere else or I will.
ID: 23564 · Report as offensive     Reply Quote
Profile geo...
Joined: 18 Sep 04
Posts: 7
Credit: 2,320,210
RAC: 0
Message 23566 - Posted: 20 Oct 2011, 6:54:41 UTC - in response to Message 23564.  

You just did--
roll-over the topic...
ID: 23566 · Report as offensive     Reply Quote
Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23570 - Posted: 20 Oct 2011, 18:10:44 UTC

This discussion is getting off topic.
-
Reminder to all: stick to the topic of the thread. If you want to discuss another problem or matter, start a new thread.

Sorry to all for being short on this, but I am extremely busy with a personal tragedy and I do not have time to play games or waste time hiding all the off topic posts at this time.
ID: 23570 · Report as offensive     Reply Quote
tullio

Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 23572 - Posted: 21 Oct 2011, 1:45:01 UTC - in response to Message 23570.  

Sorry, Keith. Pardon us.
Tullio
ID: 23572 · Report as offensive     Reply Quote
diederiks

Joined: 25 Jul 05
Posts: 19
Credit: 670,692
RAC: 0
Message 23593 - Posted: 29 Oct 2011, 14:03:10 UTC

I thought that errored or inconclusive WUs would be distributed again with higher priority and shorter deadlines to trusted hosts? I have 2 WUs that are inconclusive, but the resends are still unsent: WU 471878 & 449758.
ID: 23593 · Report as offensive     Reply Quote
Filipe

Joined: 9 Aug 05
Posts: 36
Credit: 7,693,055
RAC: 146
Message 23594 - Posted: 29 Oct 2011, 15:06:54 UTC
Last modified: 29 Oct 2011, 15:07:06 UTC

I have a lot of work units pending a resend. It seems they will only be resent when the "30000 ready to send WU" count comes down to 0.
ID: 23594 · Report as offensive     Reply Quote
Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23595 - Posted: 29 Oct 2011, 16:00:03 UTC - in response to Message 23593.  

I thought that errored or inconclusive WUs would be distributed again with higher priority and shorter deadlines to trusted hosts? I have 2 WUs that are inconclusive, but the resends are still unsent: WU 471878 & 449758.

They are supposed to be. There is something wrong. From the settings it should be working. I too see "unsent" tasks that are now days old, which means they are not getting properly into the queue or are marked somehow. This certainly is not acceleration when the resends just sit there. I've noted this to Igor. At this point my suspicion is that some server component is out of date and the settings noted in the BOINC docs are for a newer version. We've now determined that the server software is over 1 year old. Be patient, this will all get sorted out eventually.
ID: 23595 · Report as offensive     Reply Quote
Profile Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 23596 - Posted: 29 Oct 2011, 22:01:03 UTC - in response to Message 23595.  
Last modified: 29 Oct 2011, 22:01:32 UTC

I've noted this to Igor. At this point my suspicion is that some server component is out of date and the settings noted in the BOINC docs are for a newer version. We've now determined that the server software is over 1 year old. Be patient, this will all get sorted out eventually.

/home/boincadm/update_latest.sh

But that will probably break 10 other things. ;-)
Jord

BOINC FAQ Service
ID: 23596 · Report as offensive     Reply Quote
diederiks

Joined: 25 Jul 05
Posts: 19
Credit: 670,692
RAC: 0
Message 23597 - Posted: 30 Oct 2011, 12:43:03 UTC - in response to Message 23595.  

I just found 2 WUs on my machine that have a shorter deadline. Both are WUs that were not reported back in time; they were supposed to be resent on the 20th and 21st of October, I think: WU 352551 & 363394. The WUs I reported earlier are still not resent; I think they will be at the end of the queue. But how did these WUs (352551 & 363394) get resent if the resend policy is not working correctly?
ID: 23597 · Report as offensive     Reply Quote
diederiks

Joined: 25 Jul 05
Posts: 19
Credit: 670,692
RAC: 0
Message 23598 - Posted: 30 Oct 2011, 12:46:26 UTC - in response to Message 23596.  
Last modified: 30 Oct 2011, 12:46:59 UTC

/home/boincadm/update_latest.sh

But that will probably break 10 other things. ;-)

That's also probably why they did not update. The project just moved to a new place and to new admins. They already posted earlier that they wanted to update the server but had to figure some things out first, most likely so that not 10 but maybe 1 or 0 things will break.
ID: 23598 · Report as offensive     Reply Quote
Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23600 - Posted: 30 Oct 2011, 13:00:33 UTC - in response to Message 23597.  

I just found 2 WUs on my machine that have a shorter deadline. Both are WUs that were not reported back in time; they were supposed to be resent on the 20th and 21st of October, I think: WU 352551 & 363394. The WUs I reported earlier are still not resent; I think they will be at the end of the queue. But how did these WUs (352551 & 363394) get resent if the resend policy is not working correctly?

It would be most helpful if you (or anyone posting) could at least give a link. Sometimes it can take considerable time to click through your account and computers to find the example. I've sometimes given up because users have too many computers to click through looking for a result.
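
As a rough illustration of what a postable link could look like, here is a small Python sketch that builds workunit links from the WU numbers mentioned in this thread. `PROJECT_BASE_URL` is a placeholder, not the project's real address, and the `workunit.php?wuid=` page name is an assumption based on the stock BOINC web pages; check it against the address bar on your own account pages.

```python
from urllib.parse import urlencode, urljoin

# Placeholder only: substitute the real project URL.
PROJECT_BASE_URL = "https://example.org/project/"


def workunit_link(wuid: int, base_url: str = PROJECT_BASE_URL) -> str:
    """Build a link to a workunit page (page name assumed, see note above)."""
    return urljoin(base_url, "workunit.php") + "?" + urlencode({"wuid": wuid})


for wuid in (352551, 363394):
    print(workunit_link(wuid))
# https://example.org/project/workunit.php?wuid=352551
# https://example.org/project/workunit.php?wuid=363394
```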

If you check the WU, yes, it was resent after a timeout, but look at the dates. There was a 10-day delay from the timeout to the resend, meaning it just got stuck in the queue. Resends are supposed to be accelerated and, with a higher priority, they should go out before other work pending at lower priority. It should not take 10 days, only minutes. Adding 10 extra days is not accelerating retries as noted in the BOINC docs, which is what we are trying to accomplish.
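
To make the intended behaviour concrete, here is a minimal Python sketch of a priority-ordered send queue. It is only a toy model of the idea described above, not the BOINC scheduler; the job names and priority values are made up for illustration.

```python
import heapq
from itertools import count

_arrival = count()  # tie-breaker so equal-priority jobs keep arrival order


def enqueue(queue, name, priority):
    """Add a job; heapq is a min-heap, so the priority is negated."""
    heapq.heappush(queue, (-priority, next(_arrival), name))


def dispatch_all(queue):
    """Pop jobs in the order they would be sent out."""
    sent = []
    while queue:
        _, _, name = heapq.heappop(queue)
        sent.append(name)
    return sent


queue = []
enqueue(queue, "new-work-1", priority=0)   # ordinary ready-to-send work
enqueue(queue, "new-work-2", priority=0)
enqueue(queue, "resend-A", priority=10)    # resend, queued last but boosted

order = dispatch_all(queue)
print(order)  # ['resend-A', 'new-work-1', 'new-work-2']
```

In this model the resend goes out first even though it was queued last. A resend that waits behind the whole ready-to-send backlog behaves as if its priority boost was never applied.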

---

As for the update, don't worry about breaking 10 things; 10 things are already broken and not fixed yet.
ID: 23600 · Report as offensive     Reply Quote
diederiks

Joined: 25 Jul 05
Posts: 19
Credit: 670,692
RAC: 0
Message 23601 - Posted: 30 Oct 2011, 13:11:40 UTC - in response to Message 23600.  

Next time I will post the link; it is indeed easier that way. Thanks for your response!
ID: 23601 · Report as offensive     Reply Quote


©2024 CERN