Message boards : Number crunching : Can't parse scheduler reply

joe
Joined: 2 Sep 04
Posts: 24
Credit: 12,288
RAC: 0
Message 3037 - Posted: 1 Oct 2004, 22:43:41 UTC

Not sure if this is a BOINC problem or LHC related :


From stderr.txt :
2004-10-02 00:30:10 [LHC@home] SCHEDULER_REPLY::parse(): bad first tag Content-type: text/plain
2004-10-02 00:30:10 [LHC@home] Can't parse scheduler reply


And from sched_reply.xml :
Content-type: text/plain


    Server can't open database
    3600
    


LHC@home



Problem seems to be this Content-type: text/plain
ID: 3037

FalconFly
Joined: 2 Sep 04
Posts: 121
Credit: 592,214
RAC: 0
Message 3038 - Posted: 1 Oct 2004, 23:00:22 UTC - in response to Message 3037.  

Same here, I also get multiple errors from the Website (too many connections, unable to connect to Database).

Seems they've hit their first real bottleneck on the Servers ;)
___________________________________________
Scientific Network : 36200 MHz · 8204 MB · 815.0 GB
ID: 3038

STE\/E
Joined: 2 Sep 04
Posts: 352
Credit: 1,393,150
RAC: 0
Message 3043 - Posted: 2 Oct 2004, 0:14:34 UTC

2004-10-02 00:30:10 [LHC@home] SCHEDULER_REPLY::parse(): bad first tag Content-type: text/plain
2004-10-02 00:30:10 [LHC@home] Can't parse scheduler reply
And from sched_reply.xml :
=========

Well at least now I know I'm not the only one getting that Message ...

I've also been getting a lot of 2- and 3-minute tunescan WUs, but only on one computer. It could be the computer, I don't know. It was running OK until a few hours ago, and then I had to do a reset and finally uninstall and reinstall BOINC to get it to run.

Naturally I got a fresh load of WU's so maybe it's the WU's also...???
ID: 3043

KingPin
Joined: 2 Sep 04
Posts: 16
Credit: 26,713
RAC: 0
Message 3045 - Posted: 2 Oct 2004, 1:46:36 UTC
Last modified: 2 Oct 2004, 1:47:17 UTC

I believe LHC had better go back to beta. It seems it can't handle the bandwidth?

ID: 3045

LP
Joined: 1 Sep 04
Posts: 39
Credit: 54,460
RAC: 0
Message 3048 - Posted: 2 Oct 2004, 1:56:29 UTC

Seems they underestimated how many wu's we can run and how much bandwidth we can gobble up. ;)
ID: 3048

Guido Alexander Waldenmeier
Joined: 2 Sep 04
Posts: 321
Credit: 10,607
RAC: 0
Message 3059 - Posted: 2 Oct 2004, 6:16:09 UTC

Look at this, it may be helpful:
http://www.google.de/search?hl=de&ie=UTF-8&q=Can%27t+parse+scheduler+reply&btnG=Google-Suche&meta=
feel free to visit www.guidowaldenmeier.de


ID: 3059

joe
Joined: 2 Sep 04
Posts: 24
Credit: 12,288
RAC: 0
Message 3060 - Posted: 2 Oct 2004, 6:42:12 UTC - in response to Message 3048.  
Last modified: 2 Oct 2004, 7:16:00 UTC

> Seems they underestimated how many wu's we can run and how much bandwidth we
> can gobble up. ;)


Well, many (or all?) of the 8-hour WUs (sixtrack 4.46) run for only 3 minutes. I doubt this is normal behaviour.

edit:
Here is the complete output, by the way; the forum ate the XML tags
in my first post. So the scheduler reply itself was correct and
contained a valid error message. It just shouldn't have had the
MIME type header in front of it.


From stderr.txt :
2004-10-02 00:30:10 [LHC@home] SCHEDULER_REPLY::parse(): bad first tag Content-type: text/plain
2004-10-02 00:30:10 [LHC@home] Can't parse scheduler reply
2004-10-02 00:30:10 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds


And from sched_reply.xml :
Content-type: text/plain

[scheduler_reply]
    [message priority="low"]Server can't open database[/message]
    [request_delay]3600[/request_delay]
    [project_is_down/]
[/scheduler_reply]
[scheduler_reply]
[project_name]LHC@home[/project_name]
[/scheduler_reply]
ID: 3060

Bermon.net
Joined: 28 Sep 04
Posts: 3
Credit: 8,995
RAC: 0
Message 3907 - Posted: 16 Oct 2004, 17:23:42 UTC - in response to Message 3037.  

got the same problem:

> 2004-10-02 00:30:10 [LHC@home] SCHEDULER_REPLY::parse(): bad first tag
> Content-type: text/plain
> 2004-10-02 00:30:10 [LHC@home] Can't parse scheduler reply

I just reinstalled BOINC by re-running the BOINC 4.13 update. Now BOINC says:

> No work from project
> Deferring communication with project for...

But it did upload my finished results now, which it didn't do before I reinstalled. Maybe the reinstall did help.
ID: 3907

Gaspode the UnDressed
Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 3911 - Posted: 16 Oct 2004, 18:55:09 UTC
Last modified: 16 Oct 2004, 18:56:58 UTC

A definitive answer (at least, it's as definitive as anything is with BOINC)

When the server is under heavy load it can exceed the number of connections allowed to the database, resulting in the message 'Server can't open database'. In this circumstance a problem in the server software causes it to return a malformed XML message that the client can't understand. The result is a 'Can't parse scheduler reply' message displayed by the client. What it means is 'Server is a bit busy right now'. Reinstalling the BOINC software won't help directly, since this is a server problem. Waiting a few minutes (while you reinstall BOINC, perhaps) will allow some connections to clear, and things might then work.
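To illustrate what "malformed" means here, the following is a minimal sketch of a tolerant reader for the reply joe posted. It is not how the BOINC client parses replies; the function name `parse_scheduler_reply` is hypothetical, and it assumes the square brackets in joe's post stand in for ordinary angle-bracket XML tags. Because the reply contains a stray header line and two root elements, the sketch uses simple pattern matching instead of a strict XML parser.

```python
import re


def parse_scheduler_reply(raw: str) -> dict:
    """Extract the fields seen in this thread from a possibly malformed reply."""
    # Skip anything before the first tag, e.g. a stray "Content-type: text/plain" line.
    start = raw.find("<scheduler_reply>")
    if start == -1:
        raise ValueError("no <scheduler_reply> tag found")
    body = raw[start:]

    message = re.search(r"<message[^>]*>(.*?)</message>", body, re.S)
    delay = re.search(r"<request_delay>\s*(\d+)\s*</request_delay>", body)
    return {
        "message": message.group(1).strip() if message else None,
        "request_delay": int(delay.group(1)) if delay else None,
        "project_is_down": "<project_is_down/>" in body,
    }


# The reply from this thread, with angle brackets restored (an assumption).
reply = """Content-type: text/plain

<scheduler_reply>
    <message priority="low">Server can't open database</message>
    <request_delay>3600</request_delay>
    <project_is_down/>
</scheduler_reply>
<scheduler_reply>
<project_name>LHC@home</project_name>
</scheduler_reply>
"""

info = parse_scheduler_reply(reply)
print(info)
```

A reader like this would recover the server's actual message and the requested delay in seconds, instead of failing on the first line.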

The connection limit is configurable by the admins. Raising it can improve connection performance, but raising it too far can result in much reduced throughput. Currently it's set at 100 connections, unless Markku has changed it recently. It had previously been set at 400 connections, but this resulted in only 25% of the throughput!

IMO the whole BOINC database set up needs a good overhaul since these performance problems pop up at what seems to be quite low levels. As always, I suspect that the database performance may only be improved by the expenditure of money - something which most BOINC projects are short of.

I think that's it. Anyone got anything to add?


Giskard - the first telepathic robot.


ID: 3911

Ulrich Metzner
Joined: 27 Sep 04
Posts: 36
Credit: 29,315
RAC: 0
Message 3930 - Posted: 17 Oct 2004, 1:46:10 UTC

About the connections to the database:

@Developers:
Have you ever heard of connection pooling? That's the solution to problems like the one you are getting here. No offense intended, but that's what I've (as a web developer) always seen when a bottleneck like this occurs...

Just a thought.
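
For readers unfamiliar with the term, here is a minimal sketch of connection pooling, under stated assumptions: the `ConnectionPool` class is hypothetical, and `sqlite3` is only a stand-in so the example runs on its own (the thread does not say which database the project uses). The idea is to open a fixed number of connections once and hand them out to requests, so the database never sees more than `size` connections no matter how many requests arrive.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and reused."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        # Wait for a free connection; raises queue.Empty if none frees up in time.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)


# Stand-in database: an in-memory SQLite connection per pool slot.
pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
)

with pool.connection() as conn:
    value = conn.execute("SELECT 1").fetchone()[0]
print(value)
```

Requests that arrive while all connections are busy wait in line (or time out) instead of opening a new connection, which is what keeps the database below its connection limit.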



greetz, Uli
ID: 3930

Gaspode the UnDressed
Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 4857 - Posted: 4 Nov 2004, 11:36:45 UTC

All you need to know about this error is in this thread. Just wait and your work will be reported.


Giskard - the first telepathic robot.


ID: 4857

Markku Degerholm
Joined: 3 Sep 04
Posts: 212
Credit: 4,545
RAC: 0
Message 4870 - Posted: 4 Nov 2004, 13:25:38 UTC - in response to Message 3930.  

> About the connections to the database:
>
> @Developers:
> Have you ever heard of connection pooling? That's the solution to problems
> like you are getting here. No offense intended, but that's what i've (as a web
> developer) seen always if a bottleneck like this occurs...

Maybe, maybe not. What do you actually mean by connection pooling?

Markku Degerholm
LHC@home Admin
ID: 4870

Markku Degerholm
Joined: 3 Sep 04
Posts: 212
Credit: 4,545
RAC: 0
Message 4872 - Posted: 4 Nov 2004, 13:33:03 UTC

MikeW got it right.

System overload is actually pretty natural after longer periods of no work / no service, because most of the 7500 active hosts try to connect within one hour and download/upload dozens of workunits at once. And if connections start to fail, they will try to reconnect soon... And so there are yet more connection attempts. And then there are the forums and other web things that also generate more load on the database.
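
The reconnect pile-up described above is commonly softened with randomized exponential backoff. The sketch below is only an illustration of that general technique, not a description of what the BOINC client does; `backoff_delay` is a hypothetical name, and the 60-second base and 3600-second cap are illustrative values borrowed from the one-minute deferral and the request delay of 3600 shown earlier in this thread.

```python
import random


def backoff_delay(failures, base=60.0, cap=3600.0, rng=random):
    """Seconds to wait after `failures` consecutive failed scheduler contacts.

    The ceiling doubles with each failure up to `cap`; the actual delay is
    drawn at random from the upper half of that range so that hosts which
    failed at the same moment do not all retry at the same moment.
    """
    ceiling = min(cap, base * (2 ** max(failures - 1, 0)))
    return rng.uniform(ceiling / 2, ceiling)


for failures in range(1, 8):
    print(failures, round(backoff_delay(failures)))
```

With thousands of hosts, spreading retries out like this keeps the number of simultaneous connection attempts from growing each time the server refuses one.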




Markku Degerholm
LHC@home Admin
ID: 4872


©2024 CERN