New server stuff
Author | Message |
---|---|
Joined: 3 Sep 04 Posts: 212 Credit: 4,545 RAC: 0 |
Some changes on the server side: - a more efficient job transfer system from the physicists to the server - download and upload directory hierarchy enabled - each result must match two others (a better validator is still on the to-do list). The changes shouldn't affect users in any way, but since some changes to the server code and configuration were needed, bad server behaviour may occur. Download errors (MD5 checksum failure) and Fortran errors are still too common, and no fix for them exists at the moment. The good news is that about 100,000 work units are getting ready to be crunched. This means that about 400,000 results will be generated, i.e. four results per work unit. The bad news is that the servers will be under high load for some time... Let's hope they can hold up. Markku Degerholm LHC@home Admin |
Joined: 27 Sep 04 Posts: 282 Credit: 1,415,417 RAC: 0 |
My Opteron running Win 2003 Server beta x64 gets this error... I don't know whether it has to do with BOINC 4.13 or the current high server load... Maybe someone can help me? LHC@home - 2004-10-16 02:09:16 - Master file fetch failed |
Joined: 3 Sep 04 Posts: 212 Credit: 4,545 RAC: 0 |
> My Opteron running win 2003 server beta x64 gets this error... i don't know whether it has to do with boinc 4.13 or the current high server load... maybe someone can help me? > LHC@home - 2004-10-16 02:09:16 - Master file fetch failed Does it repeat? I mean, are you able to get work at all? Markku Degerholm LHC@home Admin |
Joined: 1 Sep 04 Posts: 137 Credit: 1,769,043 RAC: 12 |
I'm seeing some download errors but I think they are just a sign of high server load. It took about 10 minutes to download all the work my client had requested and some of them failed. example: LHC@home - 2004-10-15 19:44:41 - Giving up on download of v64lhc1000proten-45s8_1056.48_1_sixvf_72.zip: Downloaded file had wrong size: expected 276996, got 0 LHC@home - 2004-10-15 19:44:41 - MD5 computation error for v64lhc1000proten-45s8_1056.48_1_sixvf_72.zip: -108 LHC@home - 2004-10-15 19:44:41 - Checksum or signature error for v64lhc1000proten-45s8_1056.48_1_sixvf_72.zip LHC@home - 2004-10-15 19:44:41 - Unrecoverable error for result v64lhc1000proten-45s8_1056.48_1_sixvf_72_3 (WU download error: couldn't get input files: v64lhc1000proten-45s8_1056.48_1_sixvf_72.zip: MD5 computation error) I'm not sure why BOINC doesn't time out and try again later. Maybe the server is actually opening a connection but then not sending any data so the client thinks it has received something and tries to handle it like a successful download. I got 42 work units. Maybe 6 or 7 failed. Didn't keep exact count :) -------------------------------------- A member of The Knights Who Say Ni! My BOINC stats site |
Joined: 2 Sep 04 Posts: 9 Credit: 316,683 RAC: 0 |
> -each work unit must match with two others (better validator is still on todo-list) I hope that means the credit system has been changed from "lowest of first two results" to "middle of three" (or maybe something like "middle of first three", since new WUs are sent to four users). |
Joined: 27 Sep 04 Posts: 282 Credit: 1,415,417 RAC: 0 |
> > My Opteron running win 2003 server beta x64 gets this error... i don't know whether it has to do with boinc 4.13 or the current high server load... maybe someone can help me? > > LHC@home - 2004-10-16 02:09:16 - Master file fetch failed > Does it repeat? I mean, are you able to get work at all? > Markku Degerholm > LHC@home Admin I added S@H and CPN to BOINC and they didn't download anything at all... so I re-installed BOINC from scratch, since there were no WUs left to crunch... Now I have a few WUs :-) Thanks |
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0 |
>> Bad news is that servers will be on high load for some time... Let's hope they can hold it. Oops - several database connection limit messages in the last few minutes... 09:32 UTC, @439 Giskard - the first telepathic robot. |
Joined: 2 Sep 04 Posts: 352 Credit: 1,748,908 RAC: 0 |
> It took about 10 minutes to download all the work my client had requested and some of them failed. ========== It took me about an hour to download, I guess, 100 WUs to each of 2 different PCs, with cable no less. 40 of them failed to download though... 40 out of 200, I guess that's an acceptable rate of failure. At least I got 160 of them through the pipeline anyway... :) |
Joined: 30 Sep 04 Posts: 21 Credit: 50,260 RAC: 0 |
Do all of these PHP connections use persistent connections? If not, why not? |
Joined: 27 Sep 04 Posts: 282 Credit: 1,415,417 RAC: 0 |
> Some changes on the server side: > Good news is that there are about 100000 work units getting ready to be crunched. This means that about 400000 results will be generated. Bad news is that servers will be on high load for some time... Let's hope they can hold it. > Markku Degerholm > LHC@home Admin Dear Markku, I wonder whether LHC will ever be able to generate enough WUs to run the project with 50,000 or 500,000 participants (no matter how much CPU power you put on the server side, there will be much more on the client side). We've seen 400,000 results generated, and they were downloaded in less than a day, if I'm not mistaken. I hope you guys have more work (in terms of WUs) for us in the next 2 years than we will be able to process! ;-) Or at least a matching amount! ;-) Greetings, Thorsten |
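
The download failures reported in this thread follow one pattern: the client received 0 bytes where 276996 were expected, then reported an MD5 error and gave up on the result. Below is a minimal Python sketch of how a client could tell a truncated transfer (worth retrying later) apart from a genuine checksum mismatch. This is not BOINC's actual client code; the names `md5_hex` and `check_download` and the three return values are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=65536):
    """Return the MD5 digest of a file as a hex string, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path, expected_size, expected_md5):
    """Classify a downloaded file (hypothetical helper, not a BOINC API).

    Returns:
      "retry"   - file is missing or has the wrong size (e.g. 0 bytes),
                  which suggests an interrupted transfer
      "corrupt" - size is right but the MD5 digest does not match
      "ok"      - size and digest both match
    """
    file_path = Path(path)
    actual_size = file_path.stat().st_size if file_path.is_file() else 0

    # Check the size first: hashing an empty or truncated file tells us
    # nothing beyond what the size mismatch already says.
    if actual_size != expected_size:
        return "retry"

    if md5_hex(file_path) != expected_md5.lower():
        return "corrupt"

    return "ok"
```

For the file in the log quoted earlier, a call such as `check_download(path, 276996, expected_md5)` would return `"retry"` for the 0-byte download, so a client built this way could back off and fetch the file again instead of marking the result as an unrecoverable error.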
©2025 CERN