Message boards : Number crunching : ADMIN! - Project Down Errors & Message Board Errors

Ian Thompson
Joined: 18 Sep 04
Posts: 35
Credit: 60,866
RAC: 0
Message 7487 - Posted: 7 May 2005, 9:35:30 UTC

Hi

Recently the project has been running well.

But today I have noticed a large number of errors in the message logs relating to this project,

mainly
Project Down or
No schedulers available


Also, I frequently get the following error when trying to access and post this message:

Warning: mysql_pconnect(): Too many connections in /shift/lxfsrk4101/data01/projects/lhcathome/html/inc/db.inc on line 16
Unable to connect to database - please try again later Error: 1040Too many connections



ID: 7487
Ulrich Metzner
Joined: 27 Sep 04
Posts: 36
Credit: 29,315
RAC: 0
Message 7488 - Posted: 7 May 2005, 9:51:52 UTC

Same here... :/
greetz, Uli

ID: 7488
ksba
Joined: 27 Sep 04
Posts: 40
Credit: 1,742,415
RAC: 0
Message 7489 - Posted: 7 May 2005, 11:55:09 UTC - in response to Message 7488.  

My twenty fast PCs have tonnes of results to report, but can't ... they want new work, no chance ...

> Same here... :/

Ulrich, great Avatar Picture :-) !
ID: 7489
Markku Degerholm
Joined: 3 Sep 04
Posts: 212
Credit: 4,545
RAC: 0
Message 7490 - Posted: 7 May 2005, 11:56:05 UTC

Yes, we noticed that too. We are trying to improve the situation.

Markku Degerholm
LHC@home admin
ID: 7490
ksba
Joined: 27 Sep 04
Posts: 40
Credit: 1,742,415
RAC: 0
Message 7491 - Posted: 7 May 2005, 12:08:46 UTC - in response to Message 7490.  

> Yes, we noticed that too. We are trying to improve the situation.
>

OK, if you're not sleeping, I'm happy to know that some of you are working on the problem. I'll try to report "later" and let you work. In the meantime I can save some energy.
ID: 7491
Ulrich Metzner
Joined: 27 Sep 04
Posts: 36
Credit: 29,315
RAC: 0
Message 7493 - Posted: 7 May 2005, 12:33:45 UTC - in response to Message 7490.  

> Yes, we noticed that too. We are trying to improve the situation.
>

It partially works now :)
I was just able to report my results but the server claims there is no work available. I presume you stopped the server giving out work to clear the backlog.
Good job!
greetz, Uli

ID: 7493
Ulrich Metzner
Joined: 27 Sep 04
Posts: 36
Credit: 29,315
RAC: 0
Message 7494 - Posted: 7 May 2005, 12:36:19 UTC - in response to Message 7489.  
Last modified: 7 May 2005, 12:36:41 UTC

> Ulrich, great Avatar Picture :-) !
>

Thanks, it's mainly from the German Firefox board (Gecko engine), but I like it so much that I now use it for all my accounts ;)
greetz, Uli

ID: 7494
Logan5@SETI.USA
Joined: 30 Sep 04
Posts: 112
Credit: 104,059
RAC: 0
Message 7500 - Posted: 7 May 2005, 19:01:29 UTC
Last modified: 7 May 2005, 21:21:03 UTC

People:

There's really no need to bring *every* small project related outage or error to the Admins' attention each time it happens.....

Specific personal client errors & issues or specific credit problems/weirdness should ALWAYS be reported as they happen regardless of the state of the overall project.

Why?

Well, "The Powers that Be" might already be fully aware of a project-wide situation by the time a person posts a thread about it. With over 5,000 of us participating in the beta and only 3 or 4 Admins, plus some Devs working on the back end, we outnumber them greatly, so there's a very good chance someone else (closer to the admins or devs than you and me) has already beaten you to it, or they've seen it themselves and are working on a fix as quickly as they can.

OR

The project-wide errors you're experiencing just *might* be the result of them trying to fix something that someone else has complained about being broken, and will resolve themselves in due course.


After all I don't think I need to remind anyone that this project is still in BETA, and as such has zero guarantee of stability while problems are being addressed.

Patience, grasshoppers.......patience.
:)

ID: 7500
Gaspode the UnDressed
Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 7513 - Posted: 8 May 2005, 4:59:39 UTC - in response to Message 7500.  

> People:
>
> There's really no need to bring *every* small project related
> outage or error to the Admins' attention each time it happens.....
>
> Specific personal client errors & issues or specific credit
> problems/weirdness should ALWAYS be reported as they happen regardless of the
> state of the overall project.
>

If everyone took this view then nobody would report anything. Not very useful for a project still in beta.

Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 7513
Schatzalp Davos Switzerland
Joined: 18 Sep 04
Posts: 5
Credit: 1,030,795
RAC: 0
Message 7516 - Posted: 8 May 2005, 7:08:01 UTC

> People:
>
> There's really no need to bring every small project related
> outage or error to the Admins' attention each time it happens.....

So let's report only 'big' outages...

ID: 7516
Alex
Joined: 2 Sep 04
Posts: 378
Credit: 10,765
RAC: 0
Message 7517 - Posted: 8 May 2005, 7:15:24 UTC

There's always the help desk.

http://lhcathome.cern.ch/forum_help_desk.php

The Predictor project created a separate 'help desk' folder which is visible on their main message board page.

I'm not the LHC Alex. Just a number cruncher like everyone else here.
ID: 7517
Logan5@SETI.USA
Joined: 30 Sep 04
Posts: 112
Credit: 104,059
RAC: 0
Message 7533 - Posted: 8 May 2005, 17:30:50 UTC
Last modified: 8 May 2005, 17:40:27 UTC

ok....let's try this again for the people who don't seem to understand what it is I am trying to say.

If 1000 people report the exact same problem that obviously affects *the entire project* then that is:

a) Duplicating and wasting effort by the 6th poster to the 1000th poster, as the problem has already been reported/posted by the first 5 people to do so.

b) Taking an admin's time away from resolving the problem (or other more serious problems) when they have to try and reply to each duplicate post on the same subject.

Take a look through these forums, and you'll find many examples of multiple people panicking & reporting at different times that the database is down or the scheduler is down or the message boards are not working...etc...etc....etc.....
Well, obviously, if the database/scheduler/whatever is down for you, then there's a very good chance it's down for everyone....that's called common sense.

A lot of this has to do with the fact that many of the over 5000 active beta participants do not have any formal training or experience in beta testing. They are just "average users" who have been given an opportunity to test, and that can be both a benefit and sometimes can lead to 'complications' for the admins & devs.

Like I said in my original post:

"The Powers that Be" might already be fully aware of a project-wide situation by the time a person posts a thread about it. With over 5,000 of us participating in the beta and only 3 or 4 Admins, plus some Devs working on the back end, we outnumber them greatly, so there's a very good chance someone else (closer to the admins or devs than you and me) has already beaten you to it, or they've seen it themselves and are working on a fix as quickly as they can.

OR

The project-wide errors you're experiencing just *might* be the result of them trying to fix something that someone else has complained about being broken, and will resolve themselves in due course.

If some people get offended by my speaking the truth then so be it....but no one can argue that this project has a small staff, and a smaller budget and resources than many other Distributed Computing projects, and I personally would like to see the Admins and Devs focused on getting this project out of beta and live as quickly as possible instead of having to say "Yes, we noticed that too." to every person who happens to start a new thread about a project-wide problem, WITHOUT first checking to see if someone else has ALREADY posted about the same problem before them.

I really like the idea of a centralized Help Desk where people could report project-wide problems instead of mixing them all up under the "Number Crunching" forum....


ID: 7533
Gaspode the UnDressed
Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 7534 - Posted: 8 May 2005, 19:19:45 UTC

>>ok....let's try this again

Why?
Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 7534
Logan5@SETI.USA
Joined: 30 Sep 04
Posts: 112
Credit: 104,059
RAC: 0
Message 7535 - Posted: 8 May 2005, 21:03:51 UTC - in response to Message 7534.  

> >>ok....let's try this again
>
> Why?
>

Yes, let's try this again for the 3rd time... :/

Let's say for example, Mike W is one of 4 or 5 admins for a project similar to this one and the project has let's say....about 5000 current active testers.

Let's say that it's Mike W's job to answer all website forum postings and inquiries in a timely and polite manner in addition to his regular duties which include website database maintenance and other website 'back end' duties as assigned by the Devs.

Let's say that one day your website results database gets badly corrupted, crashes, or goes down for whatever reason, and the project forums and the connection to the project server are lost.

As you are quickly working to restore the database & fix the problem, you also have to answer multiple panicked posts about the crash & resulting downtime/non-normal operation and explain why it happened when you might not even know the answer yet.

All the time you are spending answering and explaining things over and over and over again takes away from the time you need to quickly get the database up and running, which in turn makes the outage longer and increases the numbers of repetitive "why?" postings which you have to address 1-1 lest you appear to be rude and not communicating.....

What to do??? Do you either:
1) Personally answer everyone no matter if your answer to their problem has already been posted multiple times and fix the server-side problem when you have time?
2) Answer only a few and then ignore everyone else so you can get back to fixing the problem, hoping that people will at least look to see if what they will be posting already has been?
3) Ignore everyone, fix the problem as quickly as possible and then have to deal with the endless threads about "lack of consistent communication"?

Do you see the problem yet? I hope so and that you are not being purposefully difficult about this only because you disagree with me.

THE SOLUTION: Like Alex has mentioned already in this thread, the LHC Help Desk would be an *excellent place* to corral all those multiple posts while keeping the forums "free" of all that clutter so they can be utilized for their original purposes. This forum, according to its description, is for "Credit, leaderboards, CPU performance" and doesn't say anything about user reporting of system-wide problems, or getting technical assistance for any of the topics that are expressed here on a regular basis.

Perhaps the fault is that the Help Desk is buried and not really accessible from here: http://lhcathome.cern.ch/forum_index.php other than a small link up at the top of the page that people, when concerned or in a hurry to report something, may miss entirely.

Another factor is that (for right or wrong) the admins here have encouraged the "post anything anywhere" environment by not really encouraging/enforcing the use of the Help Desk as the appropriate place to make certain types of posts. Maybe this will change once the project goes live, but as an example, you don't see the same level of posting "freedom" in other DC projects....(cough..einstein...cough..seti) as you do here.

I also have been a beta tester of Software (besides BOINC) for many years so I do feel that I am qualified to say these things based on my experience.

While I appreciate your sentiments, I really don't think that you are seeing the "bigger picture" with respect to lax forum rules....Just wait until this project goes live and gets tens or hundreds of thousands of new recruits who are all wide-eyed and eager to report the same things all the time....the headaches for the admins will increase at least 10x what they are now with only a little over 5k Beta Testers...and unless CERN hires many more forum admins, the ones here WILL be hopelessly overwhelmed by the ratio of users to admins.

I apologize if what I am saying has "ruffled the feathers" of people who are used to the status quo, but unless changes are implemented now, during the beta period to address issues like this, this place will be chaos incarnate when the eventual tsunami of new project members appears.


ID: 7535
sysfried
Joined: 27 Sep 04
Posts: 282
Credit: 1,415,417
RAC: 0
Message 7538 - Posted: 8 May 2005, 22:18:54 UTC - in response to Message 7535.  

> and unless CERN hires many more forum
> admins, the ones here WILL be hopelessly overwhelmed by the ratio of users to
> admins.
>

I like your post, Logan 5, but you spent too much time on that reply....

Now my question:
Where can I apply for that job? Will I be paid in LHC turns? WU's? Credits? or plain boring money? ;-)

Cheers,

Sysfried
ID: 7538
Logan5@SETI.USA
Joined: 30 Sep 04
Posts: 112
Credit: 104,059
RAC: 0
Message 7539 - Posted: 8 May 2005, 22:31:06 UTC - in response to Message 7538.  

> I like your post, Logan 5, but you spent too much time on that reply....
>
>
> Cheers,
>
> Sysfried
>

I guess it takes some longer than others to understand things sometimes.... :/
Thank you for the support.
:)
ID: 7539
Markku Degerholm
Joined: 3 Sep 04
Posts: 212
Credit: 4,545
RAC: 0
Message 7540 - Posted: 8 May 2005, 22:52:59 UTC - in response to Message 7538.  

>
> Now my question:
> Where can I apply for that job? Will I be paid in LHC turns? WU's? Credits? or
> plain boring money? ;-)

Everybody answering other users' questions is already doing the job :) The pay comes in the form of goodwill and the knowledge of sharing knowledge. :)

And I'm happy as long as people don't fight/flame others or discuss totally unrelated matters here. That is, as long as I don't need to delete any posts, all goes well. :)

Markku Degerholm
LHC@home admin
ID: 7540
Ian Thompson
Joined: 18 Sep 04
Posts: 35
Credit: 60,866
RAC: 0
Message 7544 - Posted: 9 May 2005, 8:18:55 UTC

Hi

First of all, I apologise if creating this thread was wrong.

But I would like to say:

I did not know there was a help desk forum.

As commented in this thread reporting issues can cause administration problems.

I would suggest 2 new forums:

A help desk - Area for client issues and support relating to installing and running the client software

Service Report - Where issues of connectivity and availability can be posted.

If service problems are known, they should be placed in a news-type system separate from the main news. If this were readily accessible, people might not report the fault twice.

Secondly

I have investigated my workunits and downloading.

Since 2 May I seem to have lost about 20 WUs, which is not good. At most I might have six on my system; this includes the 'ready to run' and the 'ready to report' units.

They will no doubt time out and be deleted.

As I run about 5 projects, this is not normal: I have 3 units for SETI since last July and 1 unit for CPDN since October last year.

This might prove useful.
I am no Linux guru by any stretch of the imagination, but I did notice that the problems started when the concurrent user count approached 100.
Linux distros by default leave the max connections at 100.
I would expect the same sort of setting for MySQL.
I have made several assumptions here.
So I am sorry if this seems a little basic.
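
To make the guess above concrete, here is a minimal sketch in plain Python. It assumes the database simply refuses any new connection once a fixed cap is reached; the cap of 100 comes from the assumption above, not from the real server configuration. `ConnectionLimiter`, `TooManyConnections`, and `count_rejected` are made-up names for illustration only; nothing here is MySQL or BOINC code.

```python
import threading


class TooManyConnections(Exception):
    """Hypothetical stand-in for the 'Error: 1040' seen on the board."""


class ConnectionLimiter:
    """Toy model of a server that allows at most max_connections at once."""

    def __init__(self, max_connections=100):  # 100 is an assumed cap
        self.max_connections = max_connections
        self._open = 0
        self._lock = threading.Lock()

    def connect(self):
        # Refuse the connection if every slot is already in use.
        with self._lock:
            if self._open >= self.max_connections:
                raise TooManyConnections("Too many connections")
            self._open += 1

    def close(self):
        # Free one slot so a later client can connect.
        with self._lock:
            if self._open > 0:
                self._open -= 1


def count_rejected(clients, max_connections=100):
    """Open `clients` connections without closing any; count the refusals."""
    limiter = ConnectionLimiter(max_connections)
    rejected = 0
    for _ in range(clients):
        try:
            limiter.connect()
        except TooManyConnections:
            rejected += 1
    return rejected


if __name__ == "__main__":
    for clients in (90, 100, 120):
        print(clients, "clients ->", count_rejected(clients), "rejected")
```

In this toy model nobody is turned away until the cap is hit, and every client after that gets the error, which would match problems appearing only once the user count gets near the limit. If the real server behaves this way, either raising the cap or holding connections for a shorter time would be the things to look at, but that is for the admins to judge.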

ID: 7544
STE\/E
Joined: 2 Sep 04
Posts: 352
Credit: 1,393,150
RAC: 0
Message 7545 - Posted: 9 May 2005, 9:57:24 UTC
Last modified: 9 May 2005, 10:06:24 UTC

I wouldn't worry about it Ian, if you want to Post something then go ahead & post it. Nobody's expecting the Admins to reply to each and every post so I don't see where it creates a problem at all...

That's what the other Forum Members are here for, to try & help you out or reply if they feel like it. The LHC Forum is kinda boring anyway in as much as not much posting goes on, sometimes 2 or 3 days go by without any posts in the Numbers Forum so it's always good to see something new being posted ...

If you go by the advice some are suggesting here then I take it we are all just supposed to Crunch the WU's and keep our yaps shut ... ppppffffttttt ... All work and no play in the Forum would make the Project quite boringly repetitive IMO ... ;)

ID: 7545
Schatzalp Davos Switzerland
Joined: 18 Sep 04
Posts: 5
Credit: 1,030,795
RAC: 0
Message 7546 - Posted: 9 May 2005, 15:33:54 UTC - in response to Message 7545.  
Last modified: 9 May 2005, 15:35:31 UTC

> If you go by the advice some are suggesting here then I take it we are all
> just supposed to Crunch the WU's and keep our yaps shut ...

To PoorBoy: Couldn't agree more. I guess it takes 'some' longer than others to understand things sometimes.

ID: 7546


©2024 CERN