1) Message boards : Number crunching : All my WUs resulted invalid (Message 20721)
Posted 30 Oct 2008 by Profile [AF>Futura Sciences>Linux] Thrr-Gilag
Post:
I've had some wjuly_lhcboind WUs that validated successfully, but others did not.

There is no clear pattern by OS (linux64 or win32), CPU architecture (Core2Duo, P4, Athlon, Opteron), or overclocked versus standard frequencies that would explain these errors.

Does this batch need too much precision in its calculations?
2) Message boards : Number crunching : Initial Replication (Message 18334)
Posted 22 Oct 2007 by Profile [AF>Futura Sciences>Linux] Thrr-Gilag
Post:
http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1726535

Even 5 computers did not produce 3 matching answers...

Would you consider that this project may really need IR = 5?

Another question: why do the scientists have to wait until a WU has been deleted before they can use its information?

If the quorum is met, why do they have to wait for the 2 other answers? The result stays in the database to collect the late results, but in fact the scientists could use the results as soon as the quorum is met...

If I am right, how can IR=5 delay getting the results?
It only takes up space to keep the result in the database until everyone has responded or passed the deadline.
3) Message boards : Number crunching : work units?? (Message 17852)
Posted 13 Sep 2007 by Profile [AF>Futura Sciences>Linux] Thrr-Gilag
Post:
you just have to be lucky i guess
only 1 machine (out of 4) was able to get some wu's
/me happy

Be happy, none of my five computers got any of them :sad:

Leave some for me next time :)
4) Message boards : Number crunching : work units?? (Message 17844)
Posted 13 Sep 2007 by Profile [AF>Futura Sciences>Linux] Thrr-Gilag
Post:
Waiting ^^

Thanks for the clarification.
5) Message boards : Number crunching : Initial Replication (Message 17668)
Posted 1 Aug 2007 by Profile [AF>Futura Sciences>Linux] Thrr-Gilag
Post:
LHC has real work to do, and deadlines to meet. Sure, you can find more efficient ways to do this work. But how much time would they lose developing and testing such a new application?

Right now CPU time is being wasted. But I think it's cheaper to lose CPU time than to spend the thinking time of scientists who have to meet strict deadlines.

Don't you think so?
6) Message boards : Number crunching : Initial Replication (Message 17645)
Posted 31 Jul 2007 by Profile [AF>Futura Sciences>Linux] Thrr-Gilag
Post:
As an example I give you a wu from Einstein

Einstein unit

Basically, the WU was initially issued on 8th June but did not reach quorum until 27th July!! Enough said?

The project admins/scientists are the only people who have the whole picture; we who contribute our resources can only trust that they are choosing the correct parameters. That's not to say we cannot suggest what we think is better. Einstein can afford this kind of thing occasionally, LHC can't.

That's wrong. The quorum was reached on the 16th, when the third host sent its result. So it took 9 days to complete the quorum.

For LHC, IR=5 doesn't only create wasted calculations.

If 20-25% of results are computing errors, there is one question to ask: are those errors spread across all WUs, or concentrated on some "hard to compute" WUs?

If they are only on some hard-to-compute WUs, Alex's calculation is the right point of view, because you can quickly pile up a lot of compute errors on one WU. If IR were 3 and you got 2 errors, you would need to resend 2, so you end up with 5 replications anyway, plus a lot of delay. Can LHC afford that?

If not, it's a question of turnaround speed. As the project administrator said, we get work when the scientists need it. Implicitly, he's saying that WUs have to be returned within a short delay. That's also why they can't / don't want to send work 24/7.

Sure, some computing time is wasted. But it seems that this is the price to pay for this project.
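
To put rough numbers on the IR=3 versus IR=5 argument, here is a minimal sketch. It assumes a quorum of 3, that each result fails independently with the 20-25% error rate quoted above, and that all valid results agree with each other. The function name `quorum_probability` is made up for this illustration, and real errors may well be concentrated on a few hard WUs rather than independent, so take it as an order of magnitude only.

```python
from math import comb


def quorum_probability(replication: int, quorum: int, error_rate: float) -> float:
    """Chance that at least `quorum` of `replication` results are valid.

    Assumption (not from the project): every result fails independently
    with probability `error_rate`, and all valid results match.
    """
    ok = 1.0 - error_rate
    return sum(
        comb(replication, k) * ok**k * error_rate ** (replication - k)
        for k in range(quorum, replication + 1)
    )


if __name__ == "__main__":
    for error_rate in (0.20, 0.25):
        for ir in (3, 5):
            p = quorum_probability(ir, 3, error_rate)
            print(f"error rate {error_rate:.0%}, IR={ir}: "
                  f"quorum without any resend with probability {p:.3f}")
```

Under these assumptions, IR=3 meets the quorum without a resend only about half the time, while IR=5 does so roughly nine times out of ten. Every resend costs another round trip to a volunteer host, which is the delay discussed above.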
7) Message boards : Number crunching : Can't Access Work Units (Message 17518)
Posted 23 Jul 2007 by Profile [AF>Futura Sciences>Linux] Thrr-Gilag
Post:
OK it's fixed now, I was just not working over the weekend.

Thanks :)
8) Message boards : Number crunching : Can't Access Work Units (Message 17510)
Posted 23 Jul 2007 by Profile [AF>Futura Sciences>Linux] Thrr-Gilag
Post:
Same here.


Up, 1861 workunits to crunch
3467 workunits in progress
10 concurrent connections

But


23/07/2007 09:44:12|lhcathome|Fetching scheduler list
23/07/2007 09:44:17|lhcathome|Master file download succeeded
23/07/2007 09:44:22|lhcathome|Sending scheduler request: Requested by user
23/07/2007 09:44:22|lhcathome|Requesting 8640 seconds of new work
23/07/2007 09:44:27|lhcathome|Scheduler RPC succeeded [server version 505]
23/07/2007 09:44:27|lhcathome|Deferring communication for 7 sec
23/07/2007 09:44:27|lhcathome|Reason: requested by project
23/07/2007 09:44:27|lhcathome|Deferring communication for 1 min 0 sec
23/07/2007 09:44:27|lhcathome|Reason: no work from project


So what's the problem?
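
For anyone who wants to pull the deferral reasons out of a pasted client log like the one above, here is a small sketch. It assumes each line has the `date time|project|message` layout seen in this paste, with day-first dates; `parse_line` and `deferral_reasons` are names invented for this example, not part of BOINC.

```python
from datetime import datetime

# Lines copied from the log in this post.
LOG = """\
23/07/2007 09:44:22|lhcathome|Requesting 8640 seconds of new work
23/07/2007 09:44:27|lhcathome|Scheduler RPC succeeded [server version 505]
23/07/2007 09:44:27|lhcathome|Deferring communication for 7 sec
23/07/2007 09:44:27|lhcathome|Reason: requested by project
23/07/2007 09:44:27|lhcathome|Deferring communication for 1 min 0 sec
23/07/2007 09:44:27|lhcathome|Reason: no work from project
"""


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, message


def deferral_reasons(log_text):
    """Return the text of every 'Reason: ...' message, in order."""
    prefix = "Reason: "
    reasons = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in a pasted log
        _, _, message = parse_line(line)
        if message.startswith(prefix):
            reasons.append(message[len(prefix):])
    return reasons


print(deferral_reasons(LOG))
# -> ['requested by project', 'no work from project']
```

The second reason is the interesting one: the scheduler answered the request but reported no work, even though the status page listed workunits to crunch.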


