Message boards : Number crunching : Even if there's no "real work", may I suggest a "real test"?

EclipseHA

Joined: 18 Sep 04
Posts: 47
Credit: 1,886,234
RAC: 0
Message 17366 - Posted: 16 Jul 2007, 2:09:21 UTC

Much has changed since there was real work in the pipeline: the servers have moved, the project directories seem to have changed, etc.

There have been a couple of small bursts of WUs for testing, but because many people have their LHC resource share set really high, whatever work is available goes to only a few machines.

As a result, not many clients have been tested, and the servers probably haven't seen much load. The last time there was "real work", most of the WUs I got were ghosts (about 15 of 20).

How about using old data for a real test run before new data hits the pipeline? I'm thinking on the order of 100K WUs.

Work stuff out now, before real data is compromised; that's all I'm suggesting (on both the client and server end).
ID: 17366
Speedy

Joined: 28 Jul 05
Posts: 37
Credit: 504,361
RAC: 3,746
Message 17368 - Posted: 16 Jul 2007, 6:54:31 UTC - in response to Message 17366.  

I agree 100% with what you are saying


Have A Crunching Good day
ID: 17368
caspr
Joined: 26 Apr 06
Posts: 89
Credit: 309,235
RAC: 0
Message 17371 - Posted: 16 Jul 2007, 11:31:41 UTC

EclipseHA wrote:
How about using old data for a real test run before new data hits the pipeline? I'm thinking on the order of 100K WUs.

Work stuff out now, before real data is compromised; that's all I'm suggesting (on both the client and server end).

Yep, I second that. What did they call it over at APS a couple of weeks ago? A "BOOM" test, to see whether the servers could stand the load.

A clear conscience is usually the sign of a bad memory


ID: 17371
J Langley

Joined: 31 Dec 05
Posts: 68
Credit: 8,691
RAC: 0
Message 17374 - Posted: 17 Jul 2007, 11:58:57 UTC - in response to Message 17366.  

EclipseHA wrote:
How about using old data for a real test run before new data hits the pipeline? I'm thinking on the order of 100K WUs.

To what purpose? Presumably the project is issuing the WUs it feels are necessary for testing, and the client hasn't changed. It has been a long time since I got any LHC WUs (because my machine doesn't run 24x7), but the last thing I want is to run a bunch of pointless WUs when I could be doing real science on another project. (Of course, if Neasan and Alex think a volume/stress test is necessary, and not just "fun", I'm happy to crunch test WUs.)
ID: 17374
EclipseHA

Joined: 18 Sep 04
Posts: 47
Credit: 1,886,234
RAC: 0
Message 17385 - Posted: 19 Jul 2007, 4:28:30 UTC

It seems that "outstanding work" is still at ~35 days after 1,000 test WUs were released. Could it be the "ghost" problem, since the numbers haven't changed much today?

I think a real load test would be good for this project. It's no more "wasted cycles" than SETI, which seems to crunch the same data over and over, and in fact it would benefit the project.

With new servers in a new location on the net, testing with only enough work to last less than 10 minutes, which then takes days to come back, isn't really a valid test of what will happen when a real dump of work becomes available, IMHO.

Now is the time to really test the infrastructure, not when real data might be lost or delayed.
ID: 17385
Neasan
Volunteer moderator
Volunteer tester
Joined: 30 Nov 06
Posts: 234
Credit: 11,078
RAC: 0
Message 17386 - Posted: 19 Jul 2007, 11:05:25 UTC - in response to Message 17385.  

EclipseHA wrote:
Seems that "outstanding work" is still at ~35 days after 1000 test WUs were released. Could it be the "ghost" problem as the numbers haven't changed much today?

Some of them are new WUs sent on the 16th, and some are ones taken by users who queued them but couldn't process them in time.

J Langley wrote:
To what purpose? Presumably the project is issuing the WUs it feels are necessary for testing, and the client hasn't changed. It has been a long time since I got any LHC WUs (because my machine doesn't run 24x7), but the last thing I want is to run a bunch of pointless WUs when I could be doing real science on another project. (Of course, if Neasan and Alex think a volume/stress test is necessary, and not just "fun", I'm happy to crunch test WUs.)

Get out of my head ;-)

Look, we know you want to get crunching, but this is out of our hands. It is up to the scientists to submit work when they need it. They have said they are working on a large batch, but we don't know when that will appear.
ID: 17386
caspr
Joined: 26 Apr 06
Posts: 89
Credit: 309,235
RAC: 0
Message 17462 - Posted: 20 Jul 2007, 20:08:54 UTC

Neasan, sorry for being a pain, BUT "do 1, get 1" instead of "get 100, do 10" seems to be a little better for the project... I may be wrong; please correct me if I am. :0{
A clear conscience is usually the sign of a bad memory


ID: 17462
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1117
Credit: 49,723,831
RAC: 13,891
Message 17465 - Posted: 20 Jul 2007, 22:56:38 UTC
Last modified: 20 Jul 2007, 23:01:00 UTC

I didn't get anything again.

And mine have run 24/7 since the beginning.

This is part of what it says:

7/20/2007 5:56:33 AM|lhcathome|Reason: To fetch work
7/20/2007 5:56:33 AM|lhcathome|Requesting 172800 seconds of new work
7/20/2007 5:56:44 AM|lhcathome|Scheduler request succeeded
7/20/2007 5:56:46 AM|lhcathome|Started download of file sixtrack_4.67_windows_intelx86.exe
7/20/2007 5:56:46 AM|lhcathome|Started download of file bottomOverlay_1.02_.tga
7/20/2007 5:56:50 AM|lhcathome|Finished download of file bottomOverlay_1.02_.tga
7/20/2007 5:56:50 AM|lhcathome|Throughput 16631 bytes/sec
7/20/2007 5:56:50 AM|lhcathome|Started download of file courier_16_bold_1.01_.tga
7/20/2007 5:56:50 AM|lhcathome|Checksum or signature error for bottomOverlay_1.02_.tga
7/20/2007 5:56:51 AM|lhcathome|Unrecoverable error for result

7/20/2007 6:02:12 AM|lhcathome|Reason: To fetch work
7/20/2007 6:02:12 AM|lhcathome|Requesting 172800 seconds of new work, and reporting 5 completed tasks
7/20/2007 6:02:17 AM|lhcathome|Scheduler request succeeded
7/20/2007 6:02:17 AM|lhcathome|Message from server: No work sent
7/20/2007 6:02:17 AM|lhcathome|Message from server: (reached daily quota of 10 results)

Just a bunch of nothing (of course, the full message page is ten times longer than this).

Just "client error"

Volunteer Mad Scientist For Life
ID: 17465
watnou

Joined: 1 Sep 04
Posts: 101
Credit: 1,395,204
RAC: 0
Message 17466 - Posted: 20 Jul 2007, 23:31:00 UTC

Magic,

if you really want an answer, DON'T use that font.

ID: 17466
PovAddict
Joined: 14 Jul 05
Posts: 275
Credit: 49,291
RAC: 0
Message 17468 - Posted: 20 Jul 2007, 23:45:15 UTC

Is the LHC server code still super-outdated, or have you upgraded already? My patch to the server code that limits the number of workunits a host can have at a time has made it into the official code. It would definitely be useful here...
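To make the idea of a per-host limit concrete, here is a minimal sketch of how a scheduler could combine such a cap with the daily quota seen in Magic's log ("reached daily quota of 10 results"). This is not BOINC's actual scheduler code: the names `HostState`, `results_to_send`, `max_in_progress` and `daily_quota` are hypothetical, and the real client asks for seconds of work (e.g. "Requesting 172800 seconds of new work") rather than a result count. The BOINC server documentation describes a project option along these lines (`max_wus_in_progress`, if memory serves; verify the exact name against your server version).

```python
from dataclasses import dataclass


@dataclass
class HostState:
    # Hypothetical per-host bookkeeping kept by the scheduler.
    in_progress: int  # results sent to this host but not yet reported
    sent_today: int   # results sent to this host so far today


def results_to_send(requested: int, host: HostState,
                    max_in_progress: int, daily_quota: int) -> int:
    """Return how many results to send without exceeding either cap.

    Simplification: the request is counted in results, not seconds.
    """
    # Room left under the in-progress cap (never negative).
    room_in_progress = max(0, max_in_progress - host.in_progress)
    # Room left under the daily quota (never negative).
    room_today = max(0, daily_quota - host.sent_today)
    return max(0, min(requested, room_in_progress, room_today))
```

With a cap of one or two in-progress results, a host asking for 100 results gets at most two, so a small test batch gets spread over many more machines instead of sitting in a few large queues. That is roughly what caspr's "do 1, get 1" suggestion amounts to.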
ID: 17468
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1117
Credit: 49,723,831
RAC: 13,891
Message 17481 - Posted: 21 Jul 2007, 6:37:54 UTC


watnou,

Actually, they just appear on their own... I think they are hungry for some LHC WUs.

Volunteer Mad Scientist For Life
ID: 17481
uioped1

Joined: 26 Aug 05
Posts: 18
Credit: 37,965
RAC: 0
Message 17522 - Posted: 23 Jul 2007, 17:45:10 UTC - in response to Message 17465.  

Magic Quantum Mechanic wrote:
I didn't get anything again.

And mine have run 24/7 since the beginning.

This is part of what it says:

7/20/2007 5:56:50 AM|lhcathome|Checksum or signature error for bottomOverlay_1.02_.tga

Magic,
Have you tried the "Skip Image File Verification" setting in the general BOINC preferences?
I know it's hard to test given the shortage of WUs, but it's worth a try.

What other projects are you connected to?
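For context on why that setting might help: the checksum error in Magic's log means the downloaded file's hash didn't match the one the server advertised, and the preference tells the client not to enforce that check for image files. Below is a minimal sketch of the checksum half only, assuming an MD5 comparison; the signature check the client also performs on executables is not modelled. The names `verify_download`, `skip_image_verification` and `IMAGE_EXTENSIONS` are hypothetical, and treating `.tga` as an "image file" is an assumption, not confirmed client behaviour.

```python
import hashlib
from pathlib import Path

# Hypothetical set of extensions treated as "image files";
# the real client's rule may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tga"}


def md5_of(path: Path) -> str:
    """Hex MD5 digest of a file, read in 64 KiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path, expected_md5: str,
                    skip_image_verification: bool = False) -> bool:
    """Return True if the file passes (or is exempt from) the check."""
    if skip_image_verification and path.suffix.lower() in IMAGE_EXTENSIONS:
        # Image files are trusted as-is when the preference is on.
        return True
    return md5_of(path) == expected_md5.lower()
```

If something between the server and the host (an ISP proxy, for example) rewrites image files in transit, the hash no longer matches and the download is rejected; skipping the check for images avoids that, at the cost of not detecting a genuinely corrupted image.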
ID: 17522
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1117
Credit: 49,723,831
RAC: 13,891
Message 17524 - Posted: 24 Jul 2007, 7:09:26 UTC - in response to Message 17522.  
Last modified: 24 Jul 2007, 7:10:41 UTC

uioped1 wrote:
Magic,
Have you tried the "Skip Image File Verification" setting in the general BOINC preferences?
I know it's hard to test given the shortage of WUs, but it's worth a try.

What other projects are you connected to?

I never noticed that before, so I just checked mine, and it says "no".

I also saw several more errors in my "messages" after it tried to download some here.

(I also run Einstein.)

Volunteer Mad Scientist For Life
ID: 17524



©2024 CERN