Message boards : Number crunching : Server Status

ebahapo
Joined: 18 Sep 04
Posts: 30
Credit: 104,162
RAC: 0
Message 14694 - Posted: 20 Sep 2006, 15:42:41 UTC
Last modified: 20 Sep 2006, 15:48:18 UTC

How about adding a link to the server status page, http://lhcathome.cern.ch/server_status.php, on the main page?

ID: 14694 · Report as offensive     Reply Quote
littleBouncer
Joined: 23 Oct 04
Posts: 358
Credit: 1,439,205
RAC: 0
Message 14771 - Posted: 21 Sep 2006, 10:21:49 UTC

Many Thanks augustine for the link.....:-)

greetz littleBouncer
ID: 14771 · Report as offensive     Reply Quote
[B^S] sTrey
Joined: 4 Aug 05
Posts: 11
Credit: 14,485
RAC: 0
Message 14805 - Posted: 21 Sep 2006, 17:39:22 UTC

Cool, thanks Scarecrow!
ID: 14805 · Report as offensive     Reply Quote
Seventh Serenity
Joined: 27 Sep 04
Posts: 51
Credit: 72,804
RAC: 0
Message 14806 - Posted: 21 Sep 2006, 17:41:39 UTC - in response to Message 14803.  

Just for fun, I've tossed together a page displaying graphs and stats for the LHC/Boinc servers and daemons. About the same as my Seti page, just re-tooled for LHC.
As is the case with most all graph watching, it's about as exciting as watching paint dry, so I recommend having pizza and your favorite beverage on hand if you visit the page to avoid slipping into a coma. Also, since I just started collecting the LHC data about 24 hours ago, the graphs for longer time periods (48 hours, 96 hours, 7 days, etc.) won't change until enough time has passed to populate those charts with data.... whoa... talk about something to look forward to!

LHC@Home Server Stats & Graphs

Nicely done!

ID: 14806 · Report as offensive     Reply Quote
Henry Nebrensky
Joined: 13 Jul 05
Posts: 165
Credit: 14,925,288
RAC: 34
Message 14837 - Posted: 22 Sep 2006, 21:12:20 UTC
Last modified: 22 Sep 2006, 21:12:51 UTC

How about adding a link to the server status page, http://lhcathome.cern.ch/server_status.php, on the main page?


Given how much confusion we've already had caused by the bursty nature of the work here,
I think bringing back the old summary of how many WUs are available straight to the main page
would be even more useful!
ID: 14837 · Report as offensive     Reply Quote
River~~
Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 14862 - Posted: 24 Sep 2006, 19:30:00 UTC - in response to Message 14803.  

Just for fun, I've tossed together a page displaying graphs and stats for the LHC/Boinc servers and daemons. ...
LHC@Home Server Stats & Graphs


Hey, thanks Scarecrow, this could be really interesting once a decent backlog of samples is in your database.

One suggestion I have would be to extend the max timescale to 1 year. Reason is that this project often has gaps of over 60 days in it, and therefore there will be times when your graphs do not show the most recent episode of work.

If extending the graphs from 60 to 360 days will cause a data problem I'd suggest dropping the collection rate to every 6 hours, or another idea is that with a bit more coding you could still sample every hour but only keep the max and min from each day.
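
For a rough sense of scale: hourly sampling over 360 days is 24 × 360 = 8640 samples per graph, six-hourly sampling is 1440, and a daily max and min is 720 values. A minimal sketch of the "sample hourly, keep only the daily max and min" idea might look like the following. It assumes the samples are held as (timestamp, value) pairs; the function name and the sample readings are made up for illustration and are not taken from Scarecrow's actual software.

```python
from collections import defaultdict
from datetime import datetime


def daily_min_max(samples):
    """Collapse (timestamp, value) samples into one (min, max) pair per day."""
    by_day = defaultdict(list)
    for timestamp, value in samples:
        by_day[timestamp.date()].append(value)
    # Sort by day so the summary can be plotted left to right.
    return {day: (min(values), max(values)) for day, values in sorted(by_day.items())}


# Hypothetical hourly readings of a "results ready to send" counter.
hourly = [
    (datetime(2006, 9, 23, 22), 8000),
    (datetime(2006, 9, 23, 23), 120),
    (datetime(2006, 9, 24, 0), 43),
    (datetime(2006, 9, 24, 1), 0),
]
summary = daily_min_max(hourly)
```

Keeping both the max and the min means a short burst of work still shows up on the yearly graph, which a single six-hourly sample could easily miss.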

It is cool as it is; these are just ideas for further enhancement if you want to go that way.

River~~
ID: 14862 · Report as offensive     Reply Quote
boinc
Joined: 13 Jul 05
Posts: 2
Credit: 84,413
RAC: 0
Message 14873 - Posted: 25 Sep 2006, 16:59:34 UTC - in response to Message 14837.  
Last modified: 25 Sep 2006, 17:01:07 UTC

Given how much confusion we've already had caused by the bursty nature of the work here, I think bringing back the old summary of how many WUs are available straight to the main page would be even more useful!

Agreed! A whole lot more useful than month-old posts...
ID: 14873 · Report as offensive     Reply Quote
Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 14876 - Posted: 27 Sep 2006, 11:39:27 UTC

I've been monitoring the server status for this last batch of work, as the results in progress have been dropping, but the "Workunits waiting for validation" has always been 9 for the past 6 days, even as all the other numbers have been changing. Either this number is not updating properly or it indicates some other problem, something for the admin to look into.
ID: 14876 · Report as offensive     Reply Quote
River~~
Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 14878 - Posted: 27 Sep 2006, 18:13:17 UTC - in response to Message 14877.  

... the "Workunits waiting for validation" has always been 9 for the past 6 days, even as all the other numbers have been changing. ...


There is a known issue on the LHC database of some results that had their files deleted before they were validated, and this causes the unresolvable "pending credit" that a few unlucky participants have where fewer than a quorum were returned before the file server went loopy. For some reason it is almost impossible to purge these from the database without risking damage to a lot of other stuff -- Chrulle did explain it once but I can't find the posting now. Anyway, although in data terms a good time to resolve this would have been in the recent long gap in work, it is also an action which incoming tech staff are even less likely to want to take on than the highly experienced Chrulle. If it ain't too broke don't fix it.

My guess (and it is only a guess) is that amongst those lost results were 9 WU where a full quorum got returned then the files deleted. If so, then 9 is the new zero. Of course, if it now drops below 9 my guess is exploded.

Glad you liked the suggestions :)

River~~
ID: 14878 · Report as offensive     Reply Quote
River~~
Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 14879 - Posted: 27 Sep 2006, 18:22:01 UTC
Last modified: 27 Sep 2006, 18:25:58 UTC

Another thing that is odd is the way work is still being validated - for example you can see a single WU at around 0800 UTC on 27-Sept. OK, so the last result of the quorum comes in, the validator fires up, validates it, and goes away again.

But the odd thing is the validator has been shown as "Not Running" on the status page all day... and I *really* don't believe that someone comes in to turn on the validator just for one WU and then turns it off again.

EDIT to add: but this could be tracked too - how about graphs showing server status (eg score the three states as 2=Running, 1=NotRunning, 0=Disabled) which might also be interesting to look back on over the past year -- in a year's time of course.
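
As a rough sketch of that scoring idea, assuming the status page text has already been read into plain strings (the exact spellings below are guesses, and the function name is made up for illustration):

```python
# Scores follow the 2/1/0 scheme suggested above. The status strings are
# assumed spellings, not taken from the real status page markup.
STATUS_SCORE = {"Running": 2, "Not Running": 1, "Disabled": 0}


def score_status(status):
    """Map a daemon status string to a plottable score, or None if unrecognised."""
    return STATUS_SCORE.get(status.strip())


# Hypothetical hourly observations of the validator.
observed = ["Running", "Not Running", "Not Running", "Disabled"]
series = [score_status(s) for s in observed]
```

Returning None for an unrecognised string lets the graph show a gap instead of a misleading zero.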
R~~
ID: 14879 · Report as offensive     Reply Quote
Chrulle
Joined: 27 Jul 04
Posts: 182
Credit: 1,880
RAC: 0
Message 14882 - Posted: 28 Sep 2006, 10:14:20 UTC

Hi everyone,

I am in town for a conference and helped out a bit.

I have put up the old server status. The XML stats dump should also be available after db_dump runs tonight. A news item on the status of the project is forthcoming (by tomorrow at the latest).

Cheers,
Chrulle


Chrulle
Research Assistant & Ex-LHC@home developer
Niels Bohr Institute
ID: 14882 · Report as offensive     Reply Quote
River~~
Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 14885 - Posted: 28 Sep 2006, 11:47:15 UTC - in response to Message 14882.  


I am in town for a conference and helped out a bit.



:)
ID: 14885 · Report as offensive     Reply Quote
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 1
Message 14886 - Posted: 28 Sep 2006, 11:59:06 UTC - in response to Message 14885.  

Quick,
Somebody kidnap Chrulle and don't let him go until he's fixed all the problems. :o)
ID: 14886 · Report as offensive     Reply Quote
watnou
Joined: 1 Sep 04
Posts: 101
Credit: 1,395,204
RAC: 0
Message 14888 - Posted: 28 Sep 2006, 12:50:20 UTC - in response to Message 14886.  

lol

but it's good to see the server status again
thanks, Chrulle

Quick,
Somebody kidnap Chrulle and don't let him go until he's fixed all the problems. :o)


ID: 14888 · Report as offensive     Reply Quote
River~~
Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 14889 - Posted: 28 Sep 2006, 18:31:25 UTC - in response to Message 14886.  

Quick,
Somebody kidnap Chrulle and don't let him go until he's fixed all the problems. :o)


and made up a few million WU ;)
ID: 14889 · Report as offensive     Reply Quote
Tobie
Joined: 5 Sep 06
Posts: 4
Credit: 3,124
RAC: 0
Message 14912 - Posted: 1 Oct 2006, 2:48:42 UTC
Last modified: 1 Oct 2006, 2:50:48 UTC

Work units were available and although Boinc Manager has been contacting the server regularly, nothing was downloaded.

Is this a problem on my side? I am attached, but not sure if the URL is correct ... http://lhcathome.cern.ch/?

ID: 14912 · Report as offensive     Reply Quote
River~~
Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 14913 - Posted: 1 Oct 2006, 3:08:35 UTC - in response to Message 14912.  
Last modified: 1 Oct 2006, 3:31:55 UTC

Work units were available and although Boinc Manager has been contacting the server regularly, nothing was downloaded.

Is this a problem on my side? I am attached, but not sure if the URL is correct ... http://lhcathome.cern.ch/?



This is common. When work is low it is a matter of luck who gets it - your client will only ask from time to time and if the work comes in and goes out in between your client's requests then you are unlucky.
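
To make the "matter of luck" point concrete, here is a small sketch with made-up numbers. It assumes a client that asks at a fixed interval, which is a simplification of what the BOINC client really does, and every name and figure in it is hypothetical.

```python
def request_times(first_request, interval, horizon):
    """Yield the times (in minutes) at which a client asks for work."""
    t = first_request
    while t < horizon:
        yield t
        t += interval


def gets_work(first_request, interval, available_from, available_until, horizon):
    """True if at least one request lands while work is still available."""
    return any(
        available_from <= t < available_until
        for t in request_times(first_request, interval, horizon)
    )


# Hypothetical day: work sits on the server for 45 minutes, starting at
# minute 100, and both clients ask every 4 hours (240 minutes).
lucky = gets_work(first_request=120, interval=240,
                  available_from=100, available_until=145, horizon=1440)
unlucky = gets_work(first_request=30, interval=240,
                    available_from=100, available_until=145, horizon=1440)
```

The two clients differ only in when their first request happens to fall, yet one gets work and the other does not.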

Scarecrow's graph shows no data up to 10pm UTC last night during the database problem. Once that was cured, by 11pm when his software next looked all 8000 available results had been issued. I have eleven boxes all asking for LHC work, and got none at all in the recent batch, and so far none again this time :(

btw: thanks to Scarecrow for breaking the line on the graph to show missing data: a nice touch that not all programmers would have thought of.

Don't be misled by the figure of 43 results available on the status page; this figure is also only updated at intervals. In my experience anything less than 1000 here means that you may have missed out.

Yes, your URL is correct. If your client shows lhcathome in the list of projects then you are correctly attached. If you are successfully getting work from other projects then everything is probably OK.

There are two ways you might be stopping yourself getting work, and these can be checked in the list of projects. (Excuse me if I am telling you what you already know, I am not sure how experienced you are with the BOINC manager.)

If you see the message "Won't get new tasks" in the same line as lhcathome then you need to enable work by highlighting that line and clicking "Allow new tasks".

If you see the message "Suspended by user" then you need to click "Resume".

Good Luck!
River~~
ID: 14913 · Report as offensive     Reply Quote
Andreas
Joined: 2 Aug 05
Posts: 33
Credit: 2,328,412
RAC: 16
Message 14914 - Posted: 1 Oct 2006, 7:15:18 UTC

The server status box on the front page tells us the project is "low on work". What does this mean? Does it mean

"Some workunits left, hurry hurry! 10, 9, 8,..."

or maybe:

"Temporarily out of work, but more is coming soon"

or does it mean the usual (my favourite):

"Out of work, but we may have some more in six months, in case we haven't forgotten about the project."
ID: 14914 · Report as offensive     Reply Quote
River~~
Joined: 13 Jul 05
Posts: 456
Credit: 75,142
RAC: 0
Message 14915 - Posted: 1 Oct 2006, 7:44:03 UTC - in response to Message 14914.  

The server status box on the front page tells us the project is "low on work". What does this mean? Does it mean

"Some workunits left, hurry hurry! 10, 9, 8,..."

or maybe:

"Temporarily out of work, but more is coming soon"

or does it mean the usual (my favourite):

"Out of work, but we may have some more in six months, in case we haven't forgotten about the project."


There is a serious point here, underneath the humour.

This message appears instead of the actual number when the number falls below a certain value. There are two reasons why it is unhelpful to show the actual number for small values. Firstly the count is only updated at intervals, and by the time you see the message there may be no usable units left. This reason would be valid even on a perfect system.

Secondly there are a few "stuck" work units in the system, that can never be issued nor deleted. This is a legacy of old bugs, but while the bugs that caused these WU are (hopefully) resolved, actually deleting the ghosts from the database without damaging anything else is more tricky than it sounds. Chrulle would not take it on when he was officially here, so I am not surprised the incoming admins would rather leave it alone. Some ghosts are completed work (I have about a dozen in my results with dates from 2005 to mid 2006), others are part of workunits permanently pending (to the irritation of those users so afflicted) and there have in the past been some ghosts pretending to wait to be sent. So the "low on work" message prevents newcomers from expecting that the number will fall back to zero when there is no work being sent out. This reason is a human-friendly work-around to avoid confusing those who do not want a long explanation like this one... oops, that's spoilt it ;-)
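
In code terms the front-page box presumably does something like the sketch below. This is only a guess at the logic: the cut-off value, the function name and the wording shown for larger counts are all assumptions, since the real threshold is not stated anywhere in this thread.

```python
LOW_WORK_THRESHOLD = 100  # hypothetical cut-off; the real value is not stated in this thread


def work_status(results_ready, threshold=LOW_WORK_THRESHOLD):
    """Return the text to show in the front-page status box."""
    if results_ready < threshold:
        # Small counts are stale by the time anyone reads them and may only
        # reflect stuck work units, so the number itself is hidden.
        return "low on work"
    return f"{results_ready} results ready to send"
```

With these assumed values, work_status(43) gives "low on work", so a leftover count of 43 ghosts would never be shown as if it were real work.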

R~~


ID: 14915 · Report as offensive     Reply Quote
Perle
Joined: 25 Oct 04
Posts: 83
Credit: 77,825,100
RAC: 39,335
Message 14916 - Posted: 1 Oct 2006, 7:46:17 UTC

I call 'dibs' on those 43.

Wouldn't it have been more appropriate for it to have been *42*?


ID: 14916 · Report as offensive     Reply Quote


©2024 CERN