Message boards : Number crunching : Server problems

Chrulle
Joined: 27 Jul 04
Posts: 182
Credit: 1,880
RAC: 0
Message 13652 - Posted: 17 May 2006, 21:29:32 UTC

Hi everyone,

I think the server has had a power outage and that the DB has crashed. At least that is what it looks like to me. I will contact the CERN people to tell them. I might be able to guide them through a quick fix.
But for the time being you should probably choose "no new work", as it is very doubtful that any work issued in the meantime will be registered in the DB (i.e. it is lost), at least until someone can take the scheduler offline.
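
For anyone who would rather set "no new work" from a script than from the manager, here is a minimal sketch. It assumes a BOINC client that ships the `boinccmd` tool with its `--project URL nomorework` operation (older clients may name the tool differently), and the project URL below is an assumption based on the links in this thread, so check the URL your own client is attached with. The helper name `no_new_work_command` is made up for illustration.

```python
def no_new_work_command(project_url, allow=False):
    """Build the boinccmd argument list that stops (or re-enables)
    work fetch for a single project."""
    operation = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, operation]


# Assumed project URL; verify it against your own client before use.
LHC_URL = "http://lhcathome.cern.ch/"

stop = no_new_work_command(LHC_URL)
resume = no_new_work_command(LHC_URL, allow=True)
print(" ".join(stop))

# To actually apply it on a machine with a running client:
#   import subprocess
#   subprocess.run(stop, check=True)
```

Work already downloaded keeps crunching; this only stops the client from requesting more, and `resume` undoes it once the server is healthy again.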

Cheers,
Chrulle
Chrulle
Research Assistant & Ex-LHC@home developer
Niels Bohr Institute

LucaB76 - BOINC.Italy
Joined: 10 Feb 06
Posts: 3
Credit: 76,833
RAC: 0
Message 13654 - Posted: 17 May 2006, 21:56:31 UTC

Hi Chrulle!

Thanks for this useful post!!

Just one question:
What will happen to our science results and pending credit?
Is everything gone?

The same thing happened before (from the news page):
26.1.2006 10:02 UTC
Parts of the database had crashed due to a power failure in the computer centre. They are now back up and running.

Is there a way to recover all the results (and credit)?
Or will we collect another bunch of "pending results"?

Thanks again!

[B^S] ShanerX
Joined: 14 Jul 05
Posts: 41
Credit: 1,788,341
RAC: 0
Message 13656 - Posted: 18 May 2006, 3:57:39 UTC

Yes, thank you Chrulle ... it's always nice when there's a quick reply for all the happy crunchers!!

I'm not so concerned about the credit myself (ok, kidding), but more so about the sporadic and chaotic state of the project since you left. We seem to lose interest with each incident, which is a shame because this project is actually very cool and productive.

Just my 2.277561 cents ...

The Gas Giant
Joined: 2 Sep 04
Posts: 309
Credit: 715,258
RAC: 0
Message 13657 - Posted: 18 May 2006, 10:59:12 UTC - in response to Message 13656.  

Yes, thank you Chrulle ... it's always nice when there's a quick reply for all the happy crunchers!!

I'm not so concerned about the credit myself (ok, kidding), but more so about the sporadic and chaotic state of the project since you left. We seem to lose interest with each incident, which is a shame because this project is actually very cool and productive.

Just my 2.277561 cents ...

Yeah, I can't agree more. If the project sponsors don't give a crap then neither will we. This is the 2nd time this year this has happened and at the time we were willing to forgive. But once bitten.....

Time for a UPS. Got a spare 2k? It will buy a beauty!

Nuadormrac
Joined: 26 Sep 05
Posts: 85
Credit: 421,130
RAC: 0
Message 13660 - Posted: 18 May 2006, 20:52:38 UTC
Last modified: 18 May 2006, 20:52:56 UTC

Chrulle certainly did put a lot into keeping this project up when he was here, and even a bit after he had left. Now the project people need to look into either getting someone to do this, or doing something to keep it up themselves. And yes, it would be a shame if all the WUs we had crunched from this batch went the way of the bit bucket or something...

Chrulle
Joined: 27 Jul 04
Posts: 182
Credit: 1,880
RAC: 0
Message 13661 - Posted: 18 May 2006, 21:02:43 UTC

And he still is. At least until I get a job. ;-)

I have made a quick fix. Please check whether the system works again. Some credit and jobs have probably been lost, since I do not have the time to trawl through the backups.

Chrulle
Research Assistant & Ex-LHC@home developer
Niels Bohr Institute

[B^S] ShanerX
Joined: 14 Jul 05
Posts: 41
Credit: 1,788,341
RAC: 0
Message 13662 - Posted: 18 May 2006, 21:44:29 UTC

You da man!! Thanks ... my pending credit is/has been the same, but I can now see the individual wu results (even the pending ones, which all had at least 3 turn-ins)!!

Nuadormrac
Joined: 26 Sep 05
Posts: 85
Credit: 421,130
RAC: 0
Message 13663 - Posted: 19 May 2006, 0:22:07 UTC
Last modified: 19 May 2006, 0:35:14 UTC

Thanks for the fix... My granted credit has gone up, though I still have a lot pending. Some WUs had extra instances issued even though a quorum of 3 had already been attained (probably because the WUs weren't showing), and the number of WUs still being crunched seems to have gone up. But it does look like things are back in business and returning to a state of normalcy :)

Edit: Did notice a few glitches however. On these WUs, all other crunchers were granted credit, but mine is still showing "status pending":

http://lhcathome.cern.ch/result.php?resultid=6912464
http://lhcathome.cern.ch/workunit.php?wuid=1340533

http://lhcathome.cern.ch/result.php?resultid=6917176
http://lhcathome.cern.ch/workunit.php?wuid=1341476

http://lhcathome.cern.ch/result.php?resultid=6887056
http://lhcathome.cern.ch/workunit.php?wuid=1335498

http://lhcathome.cern.ch/result.php?resultid=6874966
http://lhcathome.cern.ch/workunit.php?wuid=1333094

http://lhcathome.cern.ch/result.php?resultid=6891698
http://lhcathome.cern.ch/workunit.php?wuid=1336415

http://lhcathome.cern.ch/result.php?resultid=6893817
http://lhcathome.cern.ch/workunit.php?wuid=1336812

http://lhcathome.cern.ch/result.php?resultid=6918033
http://lhcathome.cern.ch/workunit.php?wuid=1341647

http://lhcathome.cern.ch/result.php?resultid=6864012
http://lhcathome.cern.ch/workunit.php?wuid=1330903

http://lhcathome.cern.ch/result.php?resultid=6900101
http://lhcathome.cern.ch/workunit.php?wuid=1338070

http://lhcathome.cern.ch/result.php?resultid=6879542
http://lhcathome.cern.ch/workunit.php?wuid=1333993

http://lhcathome.cern.ch/result.php?resultid=6898881
http://lhcathome.cern.ch/workunit.php?wuid=1337824

The other WUs I have pending are pending for everyone, but these have granted credit to all other users while still showing pending for me. Not sure whether the validator will go back and grant it to me now that the database is repaired. Each pair above is the result link followed by its WU link.
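
The check described above (quorum already reached by others, own result still pending) can be sketched in a few lines. This is only an illustration over a hand-built list, not how the project's validator works: the `Result` record, its field names and the status labels are all hypothetical, and only the first result/WU pair is taken from the links above.

```python
from collections import Counter
from dataclasses import dataclass

QUORUM = 3  # the quorum mentioned in this thread


@dataclass
class Result:
    result_id: int
    wu_id: int
    user: str
    status: str  # "granted" or "pending" (hypothetical labels)


def stuck_results(results, me, quorum=QUORUM):
    """Return my pending results whose work unit already has at least
    `quorum` results granted to other users."""
    granted_to_others = Counter(
        r.wu_id for r in results if r.status == "granted" and r.user != me
    )
    return [
        r for r in results
        if r.user == me
        and r.status == "pending"
        and granted_to_others[r.wu_id] >= quorum
    ]


# Made-up sample data, apart from the first result/WU pair.
sample = [
    Result(6912464, 1340533, "me", "pending"),
    Result(1, 1340533, "a", "granted"),
    Result(2, 1340533, "b", "granted"),
    Result(3, 1340533, "c", "granted"),
    Result(4, 999, "me", "pending"),  # pending for everyone: not stuck
    Result(5, 999, "a", "pending"),
]
stuck = stuck_results(sample, "me")
print([r.result_id for r in stuck])
```

Results that are pending for everyone on the work unit are left out on purpose, since those are simply waiting for validation rather than showing the glitch.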

Ocean Archer
Joined: 13 Jul 05
Posts: 143
Credit: 263,300
RAC: 0
Message 13664 - Posted: 19 May 2006, 0:31:53 UTC

Chrulle --

Thank you for taking the time to watch over the project and keep us informed.


If I've lived this long, I've gotta be that old

Jayargh
Joined: 24 Oct 04
Posts: 79
Credit: 257,762
RAC: 0
Message 13665 - Posted: 19 May 2006, 3:51:22 UTC - in response to Message 13661.  
Last modified: 19 May 2006, 4:31:13 UTC

And he still is. At least until I get a job. ;-)

I have made a quick fix. Please check whether the system works again. Some credit and jobs have probably been lost, since I do not have the time to trawl through the backups.


Love the disclaimer, Chrulle Ex-admin... and TY for all your attention and caring here. But this still raises a red flag for us crunchers that needs to be addressed, i.e. no active admin. I will have more faith, and will continue to crunch as best I can, as I have over the last year and a half, if I (we) get communication, feedback and responses as we have in the past. Is Ben Segal still around? Maybe an e-mail to him about the current unpleasant conditions? Perhaps a short update from a current, not ex (hehe), admin would be appropriate about now on how to keep up any enthusiasm for LHC :)

[EDIT] Oh, and of course there are still massive (relatively) pending credit issues: results with quorum met but not validated are still left. But again, TY Chrulle for un-log-jamming things for us. [EDIT]

Bratwurst
Joined: 14 Jul 05
Posts: 2
Credit: 142,986
RAC: 0
Message 13667 - Posted: 19 May 2006, 11:39:13 UTC

I'm sorry for asking about the problem again ...

but where are the pending credits ... ??? ;-)

[B^S] ShanerX
Joined: 14 Jul 05
Posts: 41
Credit: 1,788,341
RAC: 0
Message 13671 - Posted: 19 May 2006, 14:26:16 UTC
Last modified: 19 May 2006, 14:28:27 UTC

You'd have to look at your Account statistics ... you can get there from the home page, and login with your email/password.

It'll show your enlistment date, total credit, RAC, and then pending credit. If you look at each computer, you can then look at the results turned in and see how many have been returned, whether quorum was met, whether credit was granted, etc.

Once you get to a specific Result, click on the Work Unit ID for all those details.

Jayargh
Joined: 24 Oct 04
Posts: 79
Credit: 257,762
RAC: 0
Message 13673 - Posted: 20 May 2006, 5:07:35 UTC

I assume after 24 hours of no response that there are no answers to my questions... or that the answers are apparent from the lack of response. I assume we are on the back burner because the scientists have already gotten the answers they required for the installation of the magnets. I suspect that as we get closer to the initialization of LHC there will be a flurry of new studies. However, in the meantime I am SADLY suspending LHC on most of my computers and awaiting some response (hopefully favorable) to the concerns raised here.

[B^S] ShanerX
Joined: 14 Jul 05
Posts: 41
Credit: 1,788,341
RAC: 0
Message 13674 - Posted: 20 May 2006, 5:13:14 UTC

As disappointing as it seems, there is still light at the end of the tunnel (no pun intended! :) )

Good news is ... other projects like Stardust@Home have seen first hand what our participation can achieve. LHC@Home better start looking at the vast (and free to them) computing power that is within their reach. For now, I'll just wait 'til next batch of wu's, and crunch for another 3 days.

http://planetary.org/programs/projects/stardustathome/update_051806.html

Hans Sveen
Joined: 2 Sep 04
Posts: 22
Credit: 4,047,548
RAC: 34
Message 13675 - Posted: 20 May 2006, 18:06:29 UTC

Hello!
It seems the database/server problems are solved; at least I have received some WUs in the last hour! I guess these are WUs that were not finished on time or were returned with errors. I hope this will clear out some of the pending credits!

Have a nice crunching weekend!


Hans Sveen
Oslo, Norway


Nuadormrac
Joined: 26 Sep 05
Posts: 85
Credit: 421,130
RAC: 0
Message 13676 - Posted: 21 May 2006, 0:46:26 UTC

Most of my pending had cleared prior to this recent batch of WUs coming out, with only a few exceptions...

OK, the server probably got over-loaded as people were making a run to the server to mass download WUs :D Looking good though...

Steve Cressman
Joined: 28 Sep 04
Posts: 47
Credit: 6,394
RAC: 0
Message 13709 - Posted: 23 May 2006, 14:44:35 UTC

The server has a problem that I don't quite understand, because it does not happen with any of the other projects that I am attached to. Every time my BOINC client asks for more work from LHC, a new host is spawned. Every day I have a long list of hosts that I need to merge. If this is happening to other people, that could explain some of the other problems the server is having.
98SE XP2500+ @ 2.1 GHz Boinc v5.8.8
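
To get a feel for how many duplicate host entries have piled up before merging them by hand, the host list can be grouped by a simple signature. This is a rough sketch only: the dictionary fields and the name/CPU/OS signature are assumptions chosen for illustration, not the rule the project's own merge page applies, and the sample hosts are invented.

```python
from collections import defaultdict


def merge_candidates(hosts):
    """Group host records that look like the same machine and return
    only the groups containing more than one host ID."""
    groups = defaultdict(list)
    for host in hosts:
        # Hypothetical signature: same name, CPU and OS means same machine.
        key = (host["name"], host["cpu"], host["os"])
        groups[key].append(host["id"])
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


# Invented sample data for illustration.
hosts = [
    {"id": 101, "name": "desk", "cpu": "Athlon XP 2500+", "os": "Windows 98SE"},
    {"id": 103, "name": "desk", "cpu": "Athlon XP 2500+", "os": "Windows 98SE"},
    {"id": 102, "name": "desk", "cpu": "Athlon XP 2500+", "os": "Windows 98SE"},
    {"id": 200, "name": "laptop", "cpu": "Pentium M", "os": "Windows XP"},
]
candidates = merge_candidates(hosts)
for signature, ids in candidates.items():
    print(signature, "->", ids)
```

A signature this loose will also lump together two genuinely separate machines that happen to share a name and hardware, so treat the output as a list to review rather than a list to merge blindly.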

Trog Dog
Joined: 25 Nov 05
Posts: 39
Credit: 41,119
RAC: 0
Message 13711 - Posted: 23 May 2006, 15:26:22 UTC - in response to Message 13709.  

The server has a problem that I don't quite understand, because it does not happen with any of the other projects that I am attached to. Every time my BOINC client asks for more work from LHC, a new host is spawned. Every day I have a long list of hosts that I need to merge. If this is happening to other people, that could explain some of the other problems the server is having.


Same here. Are you using BAM at all, and is it only affecting your Linux boxes?

[B^S] ShanerX
Joined: 14 Jul 05
Posts: 41
Credit: 1,788,341
RAC: 0
Message 13712 - Posted: 23 May 2006, 16:19:20 UTC

It happened for me on both WinXP and Win2k, with various different hardware platforms?! I actually merged ALL of them to my laptop and have been fine since.

Steve Cressman
Joined: 28 Sep 04
Posts: 47
Credit: 6,394
RAC: 0
Message 13713 - Posted: 23 May 2006, 17:37:53 UTC
Last modified: 23 May 2006, 17:41:10 UTC

Yes, I'm using BAM, on Windows only. No Linux boxes here, but that will probably change in the not too distant future. I have not liked the direction microshaft has been going for many years!
98SE XP2500+ @ 2.1 GHz Boinc v5.8.8