Server Status
Author | Message |
---|---|
Joined: 18 Sep 04 Posts: 30 Credit: 104,162 RAC: 0 |
How about adding a link to the server status page, http://lhcathome.cern.ch/server_status.php in the main page? |
Joined: 23 Oct 04 Posts: 358 Credit: 1,439,205 RAC: 0 |
|
Joined: 4 Aug 05 Posts: 11 Credit: 14,485 RAC: 0 |
Cool, thanks Scarecrow! |
Joined: 27 Sep 04 Posts: 51 Credit: 72,804 RAC: 0 |
"Just for fun, I've tossed together a page displaying graphs and stats for the LHC/Boinc servers and daemons. About the same as my Seti page, just re-tooled for LHC." Nicely done! |
Joined: 13 Jul 05 Posts: 169 Credit: 15,000,737 RAC: 0 |
"How about adding a link to the server status page, http://lhcathome.cern.ch/server_status.php in the main page?" Given how much confusion we've already had caused by the bursty nature of the work here, I think bringing back the old summary of how many WUs are available straight to the main page would be even more useful! |
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
"Just for fun, I've tossed together a page displaying graphs and stats for the LHC/Boinc servers and daemons. ..." Hey, thanks Scarecrow, this could be really interesting once a decent backlog of samples is in your database. One suggestion I have would be to extend the maximum timescale to 1 year. The reason is that this project often has gaps of over 60 days, so there will be times when your graphs do not show the most recent episode of work. If extending the graphs from 60 to 360 days would cause a data problem, I'd suggest dropping the collection rate to every 6 hours; another idea is that, with a bit more coding, you could still sample every hour but keep only the max and min from each day. It is cool as it is; these are just ideas for further enhancement if you want to go that way. River~~ |
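River's idea of still sampling hourly but keeping only each day's maximum and minimum can be sketched as below. This is a minimal illustration, assuming the samples arrive as timezone-aware (timestamp, value) pairs; the function names and data format are hypothetical and are not taken from Scarecrow's page. The last few lines show the storage arithmetic behind the suggestion.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample format: (timezone-aware timestamp, value) taken once an
# hour, e.g. the count of results ready to send. Not Scarecrow's real schema.
Sample = tuple[datetime, int]


def daily_min_max(samples: list[Sample]) -> dict[str, tuple[int, int]]:
    """Collapse hourly samples into one (min, max) pair per UTC day."""
    per_day: dict[str, list[int]] = defaultdict(list)
    for ts, value in samples:
        day = ts.astimezone(timezone.utc).date().isoformat()
        per_day[day].append(value)
    # ISO date strings sort chronologically, so the result is in date order.
    return {day: (min(vals), max(vals)) for day, vals in sorted(per_day.items())}


def stored_points(days: int, samples_per_day: int) -> int:
    """Number of values kept for a given retention window."""
    return days * samples_per_day


if __name__ == "__main__":
    print(stored_points(60, 24))   # hourly for 60 days: 1440 values
    print(stored_points(360, 24))  # hourly for 360 days: 8640 values
    print(stored_points(360, 4))   # every 6 hours for 360 days: 1440 values
    print(stored_points(360, 2))   # daily min + max for 360 days: 720 values
```

Keeping the min and max rather than an average preserves the peaks and troughs that matter for a bursty project, at the cost of losing the shape of each individual day.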
Joined: 13 Jul 05 Posts: 2 Credit: 84,413 RAC: 0 |
"Given how much confusion we've already had caused by the bursty nature of the work here, I think bringing back the old summary of how many WUs are available straight to the main page would be even more useful!" Agreed! A whole lot more useful than month-old posts... |
Joined: 2 Sep 04 Posts: 209 Credit: 1,482,496 RAC: 0 |
I've been monitoring the server status for this last batch of work. While the results in progress have been dropping, "Workunits waiting for validation" has stayed at 9 for the past 6 days, even as all the other numbers have been changing. Either this number is not updating properly or it indicates some other problem; something for the admin to look into. |
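The pattern described above (one counter frozen while the others keep moving) could be spotted automatically with a check along these lines. It is only a sketch: the snapshot format and the counter names are hypothetical, not the actual fields of server_status.php.

```python
# Hypothetical snapshots of status-page counters, one dict per poll.
# The key names are illustrative only.


def flat_counters(snapshots: list[dict[str, int]], min_polls: int = 2) -> list[str]:
    """Return counters whose value never changed across all snapshots,
    provided at least one other counter did change (so the page itself
    is clearly being updated)."""
    if len(snapshots) < min_polls:
        return []
    keys = list(snapshots[0].keys())
    unchanged = [k for k in keys if len({s[k] for s in snapshots}) == 1]
    if len(unchanged) == len(keys):
        # Nothing moved at all: more likely a stale page than one stuck counter.
        return []
    return unchanged


if __name__ == "__main__":
    polls = [
        {"results_in_progress": 5000, "wu_waiting_validation": 9},
        {"results_in_progress": 3000, "wu_waiting_validation": 9},
        {"results_in_progress": 800, "wu_waiting_validation": 9},
    ]
    print(flat_counters(polls))
```

A flat counter is only a hint, not proof of a fault: as the reply below suggests, a constant value can also be a fixed baseline of stuck entries.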
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
"... the 'Workunits waiting for validation' has always been 9 for the past 6 days, even as all the other numbers have been changing. ..." There is a known issue on the LHC database: some results had their files deleted before they were validated, and this causes the unresolvable "pending credit" that a few unlucky participants have where fewer than a quorum were returned before the file server went loopy. For some reason it is almost impossible to purge these from the database without risking damage to a lot of other stuff. Chrulle did explain it once but I can't find the posting now. Anyway, although in data terms a good time to resolve this would have been in the recent long gap in work, it is also an action which incoming tech staff are even less likely to want to take on than the highly experienced Chrulle. If it ain't too broke, don't fix it. My guess (and it is only a guess) is that amongst those lost results were 9 WUs where a full quorum got returned and then the files were deleted. If so, then 9 is the new zero. Of course, if it now drops below 9 my guess is exploded. Glad you liked the suggestions :) River~~ |
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
Another odd thing is the way work is still being validated: for example, you can see a single WU at around 0800 UTC on 27-Sept. OK, so the last result of the quorum comes in, the validator fires up, validates it, and goes away again. But the odd thing is that the validator has been shown as "Not Running" on the status page all day... and I *really* don't believe that someone comes in to turn on the validator just for one WU and then turns it off again. EDIT to add: this could be tracked too. How about graphs showing server status (e.g. score the three states as 2=Running, 1=NotRunning, 0=Disabled), which might also be interesting to look back on over the past year, in a year's time of course. R~~ |
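The scoring proposed in the EDIT could be applied to each polled daemon state roughly as follows. This is a hypothetical sketch: the exact state labels on the status page (assumed here to be "Running", "Not Running" and "Disabled") and the polling format are assumptions.

```python
from typing import Optional

# Scores proposed in the post: 2 = Running, 1 = Not Running, 0 = Disabled.
# The label strings are assumed, not copied from the real status page.
STATE_SCORE = {"Running": 2, "Not Running": 1, "Disabled": 0}


def score_states(polls: list[Optional[str]]) -> list[Optional[int]]:
    """Map each polled daemon state to a number for graphing.

    A failed poll (None) or an unrecognised label maps to None, so the
    plotted line can be broken there instead of drawn through a guess.
    """
    return [STATE_SCORE.get(state) if state is not None else None for state in polls]


if __name__ == "__main__":
    print(score_states(["Running", "Not Running", None, "Disabled"]))
```

Mapping unknown values to None rather than 0 keeps "no data" distinct from "Disabled" on the graph.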
Joined: 27 Jul 04 Posts: 182 Credit: 1,880 RAC: 0 |
Hi everyone, I am in town for a conference and helped out a bit. I have put up the old server status. The XML stats dump should also be available after db_dump runs tonight. A news item on status of the project is forthcoming (by tomorrow latest). Cheers, Chrulle Chrulle Research Assistant & Ex-LHC@home developer Niels Bohr Institute |
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
:) |
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0 |
Quick, Somebody kidnap Chrulle and don't let him go until he's fixed all the problems. :o) |
Joined: 1 Sep 04 Posts: 101 Credit: 1,395,204 RAC: 0 |
lol but it's good to see the server status again, thnx Chrulle Quick, |
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
Quick, and made up a few million WU ;) |
Joined: 5 Sep 06 Posts: 4 Credit: 3,124 RAC: 0 |
Work units were available and although Boinc Manager has been contacting the server regularly, nothing was downloaded. Is this a problem on my side? I am attached, but not sure if the URL is correct ... http://lhcathome.cern.ch/? |
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
"Work units were available and although Boinc Manager has been contacting the server regularly, nothing was downloaded." This is common. When work is low it is a matter of luck who gets it: your client will only ask from time to time, and if the work comes in and goes out between your client's requests then you are unlucky. Scarecrow's graph shows no data up to 10pm UTC last night, during the database problem. Once that was cured, all 8000 available results had been issued by 11pm, when his software next looked. I have eleven boxes all asking for LHC work, and got none at all in the recent batch, and so far none again this time :( btw: thanks to Scarecrow for breaking the line on the graph to show missing data, a nice touch that not all programmers would have thought of. Don't be misled by the figure of 43 results available on the status page; this figure is also only updated at intervals. In my experience anything less than 1000 here means that you may have missed out. Yes, your URL is correct. If your client shows lhcathome in the list of projects then you are correctly attached. If you are successfully getting work from other projects then everything is probably OK. There are two ways you might be stopping yourself from getting work, and both can be checked in the list of projects. (Excuse me if I am telling you what you already know; I am not sure how experienced you are with the BOINC Manager.) If you see the message "Won't get new tasks" on the same line as lhcathome, then you need to enable work by highlighting that line and clicking "Allow new tasks". If you see the message "Suspended by user", then you need to click "Resume". Good Luck! River~~ |
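The "matter of luck" above comes down to whether any of a client's scheduler requests happens to land inside the window while results are ready to send. A minimal sketch, assuming that window is known; the request times and the date are made up for illustration, and 23:00 is only an upper bound for when the work ran out, since that is simply when Scarecrow's software next looked.

```python
from datetime import datetime, timedelta


def caught_work(request_times: list[datetime],
                available_from: datetime,
                exhausted_at: datetime) -> bool:
    """True if any scheduler request fell while results were still ready to send."""
    return any(available_from <= t < exhausted_at for t in request_times)


if __name__ == "__main__":
    # Arbitrary date; only the 22:00 and 23:00 times come from the post.
    ready = datetime(2006, 1, 1, 22, 0)  # database problem cured, work available
    gone = datetime(2006, 1, 1, 23, 0)   # all 8000 results issued by this time
    # Hypothetical client: asks half an hour before, then ten minutes after.
    polls = [ready - timedelta(minutes=30), gone + timedelta(minutes=10)]
    print(caught_work(polls, ready, gone))  # False: both requests missed the window
```

With an hour-long window or less, a host that only asks every few hours can easily miss a whole batch, which matches eleven boxes getting nothing.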
Joined: 2 Aug 05 Posts: 33 Credit: 2,329,729 RAC: 0 |
The server status box on the front page tells us the project is "low on work". What does this mean? Does it mean "Some workunits left, hurry hurry! 10, 9, 8,...", or maybe "Temporarily out of work, but more is coming soon", or does it mean the usual (my favourite): "Out of work, but we may have some more in six months, in case we haven't forgotten about the project." |
Joined: 13 Jul 05 Posts: 456 Credit: 75,142 RAC: 0 |
"The server status box on the front page tells us the project is 'low on work'. What does this mean? Does it mean..." There is a serious point here, underneath the humour. This message appears instead of the actual number when the number falls below a certain value. There are two reasons why it is unhelpful to show the actual number for small values. Firstly, the count is only updated at intervals, and by the time you see the message there may be no usable units left. This reason would be valid even on a perfect system. Secondly, there are a few "stuck" work units in the system that can never be issued or deleted. This is a legacy of old bugs: while the bugs that caused these WUs are (hopefully) resolved, actually deleting the ghosts from the database without damaging anything else is trickier than it sounds. Chrulle would not take it on when he was officially here, so I am not surprised the incoming admins would rather leave it alone. Some ghosts are completed work (I have about a dozen in my results with dates from 2005 to mid-2006), others are part of workunits permanently pending (to the irritation of those users so afflicted), and there have in the past been some ghosts pretending to wait to be sent. So the "low on work" message prevents newcomers from expecting that the number will fall back to zero when there is no work being sent out. This reason is a human-friendly work-around to avoid confusing those who do not want a long explanation like this one... oops, that's spoilt it ;-) R~~ |
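The display rule described above might be expressed like this. The real threshold used by the front page is not stated in the thread, so `threshold` is a hypothetical parameter; the value 1000 in the example is River's rule of thumb from his earlier reply, not a known server setting.

```python
def work_label(results_ready: int, threshold: int) -> str:
    """Show the count only when it is large enough to mean something.

    Below the (hypothetical) threshold the count may be stale or made up
    largely of stuck 'ghost' results, so a fixed message is shown instead.
    """
    if results_ready < threshold:
        return "low on work"
    return f"{results_ready} results ready to send"


if __name__ == "__main__":
    print(work_label(43, threshold=1000))
    print(work_label(8000, threshold=1000))
```

Hiding small numbers means the ghosts never show up as a mysterious count that refuses to reach zero.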
Joined: 25 Oct 04 Posts: 83 Credit: 89,890,529 RAC: 22,703 |
I call 'dibs' on those 43. Wouldn't it have been more appropriate for it to be *42*? |
©2025 CERN