Can't parse scheduler reply
Author | Message |
---|---|
Joined: 2 Sep 04 Posts: 24 Credit: 12,288 RAC: 0 |
Not sure if this is a BOINC problem or LHC-related. From stderr.txt: 2004-10-02 00:30:10 [LHC@home] SCHEDULER_REPLY::parse(): bad first tag Content-type: text/plain 2004-10-02 00:30:10 [LHC@home] Can't parse scheduler reply And from sched_reply.xml: Content-type: text/plain Server can't open database 3600 LHC@home The problem seems to be this "Content-type: text/plain" line. |
Joined: 2 Sep 04 Posts: 121 Credit: 592,214 RAC: 0 |
Same here. I also get multiple errors from the website (too many connections, unable to connect to database). Seems they've hit their first real bottleneck on the servers ;) Scientific Network: 36200 MHz «» 8204 MB «» 815.0 GB |
Joined: 2 Sep 04 Posts: 352 Credit: 1,748,908 RAC: 1,957 |
> 2004-10-02 00:30:10 [LHC@home] SCHEDULER_REPLY::parse(): bad first tag Content-type: text/plain 2004-10-02 00:30:10 [LHC@home] Can't parse scheduler reply And from sched_reply.xml : ========= Well, at least now I know I'm not the only one getting that message... I've been getting a lot of 2- and 3-minute tunescan WUs too, but only on one computer. It could be the computer, I don't know, but it was running OK up until a few hours ago, and then I had to do a reset and finally uninstall BOINC and reinstall it to get it to run. Naturally I got a fresh load of WUs, so maybe it's the WUs as well...? |
Joined: 2 Sep 04 Posts: 16 Credit: 26,713 RAC: 0 |
I believe LHC had better go back to beta. Seems it can't handle the bandwidth? |
Joined: 1 Sep 04 Posts: 39 Credit: 54,460 RAC: 0 |
Seems they underestimated how many wu's we can run and how much bandwidth we can gobble up. ;) |
Joined: 2 Sep 04 Posts: 321 Credit: 10,607 RAC: 0 |
Look, this may be helpful: http://www.google.de/search?hl=de&ie=UTF-8&q=Can%27t+parse+scheduler+reply&btnG=Google-Suche&meta= Feel free to visit www.guidowaldenmeier.de |
Joined: 2 Sep 04 Posts: 24 Credit: 12,288 RAC: 0 |
> Seems they underestimated how many wu's we can run and how much bandwidth we can gobble up. ;) Well, many (or all?) of the 8-hour WUs (sixtrack 4.46) run for only 3 minutes. I doubt this is normal behaviour. Edit: this is the complete output, by the way; the forum ate the XML tags. So the scheduler reply was absolutely correct and contained a valid error message; it just shouldn't have had the MIME type in the header. From stderr.txt: 2004-10-02 00:30:10 [LHC@home] SCHEDULER_REPLY::parse(): bad first tag Content-type: text/plain 2004-10-02 00:30:10 [LHC@home] Can't parse scheduler reply 2004-10-02 00:30:10 [LHC@home] Deferring communication with project for 1 minutes and 0 seconds And from sched_reply.xml: Content-type: text/plain [scheduler_reply] [message priority="low"]Server can't open database[/message] [request_delay]3600[/request_delay] [project_is_down/] [/scheduler_reply] [scheduler_reply] [project_name]LHC@home[/project_name] [/scheduler_reply] |
Joined: 28 Sep 04 Posts: 3 Credit: 8,995 RAC: 0 |
Got the same problem: > 2004-10-02 00:30:10 [LHC@home] SCHEDULER_REPLY::parse(): bad first tag > Content-type: text/plain > 2004-10-02 00:30:10 [LHC@home] Can't parse scheduler reply I just reinstalled BOINC by re-running BOINC update 4.13. Now BOINC says: > No work from project > Deferring communication with project for... But it did upload my finished results now, which it didn't do before I reinstalled. Maybe the reinstall did help. |
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0 |
A definitive answer (at least, as definitive as anything is with BOINC): When the server is under heavy load, it can exceed the number of connections allowed to the database, resulting in the message 'Server can't open database'. There is a problem with the server software: in this circumstance it returns a malformed XML message that the client can't understand. The result is a 'Can't parse scheduler reply' message displayed by the client. What it means is 'Server is a bit busy right now'. Reinstalling the BOINC software won't help directly, since this is a server problem. Waiting a few minutes (while you reinstall BOINC, perhaps) will allow some connections to clear, and things might then work. The connection limit is configurable by the admins. Raising it can improve connection performance, but raising it too far can result in much reduced throughput. Currently it's set at 100 connections, unless Markku has changed it recently. It had previously been set at 400 connections, but this resulted in only 25% of the throughput! IMO the whole BOINC database setup needs a good overhaul, since these performance problems pop up at what seem to be quite low levels. As always, I suspect that the database performance can only be improved by spending money, something most BOINC projects are short of. I think that's it. Anyone got anything to add? Giskard - the first telepathic robot. |
Joined: 27 Sep 04 Posts: 36 Credit: 29,315 RAC: 0 |
About the connections to the database: @Developers: Have you ever heard of connection pooling? That's the solution to problems like the one you are getting here. No offense intended, but that's what I (as a web developer) have always seen when a bottleneck like this occurs... Just a thought. greetz, Uli |
Joined: 1 Sep 04 Posts: 506 Credit: 118,619 RAC: 0 |
All you need to know about this error is in this thread. Just wait and your work will be reported. Giskard - the first telepathic robot. |
Joined: 3 Sep 04 Posts: 212 Credit: 4,545 RAC: 0 |
> About the connections to the database: > @Developers: > Have you ever heard of connection pooling? That's the solution to problems like you are getting here. No offense intended, but that's what i've (as a web developer) seen always if a bottleneck like this occurs... Maybe, maybe not. What do you actually mean by connection pooling? Markku Degerholm LHC@home Admin |
Joined: 3 Sep 04 Posts: 212 Credit: 4,545 RAC: 0 |
MikeW got it right. System overload is actually pretty natural after longer periods of no work / no service, because most of the 7500 active hosts try to connect within one hour and download/upload dozens of workunits at once. And if connections start to fail, they will try to reconnect soon, so there are even more connection attempts. And then there are the forums and other web things that also put more load on the database. Markku Degerholm LHC@home Admin |
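
A minimal Python sketch of the two ideas in this thread: reading the reply quoted above despite the stray Content-type line, and spacing out retries so that thousands of hosts do not all reconnect at the same moment. This is an illustration only, not the BOINC client's actual code. The function names (strip_stray_header, extract_reply_fields, next_retry_delay), the 60-second base and the 4-hour cap are assumptions made for the example, and the sample reply is the one quoted earlier with its angle brackets restored.

```python
import random
import re

# Sample reply modelled on the one quoted in this thread, with the angle
# brackets restored (the forum displayed them as square brackets).
SAMPLE_REPLY = """Content-type: text/plain

<scheduler_reply>
<message priority="low">Server can't open database</message>
<request_delay>3600</request_delay>
<project_is_down/>
</scheduler_reply>
<scheduler_reply>
<project_name>LHC@home</project_name>
</scheduler_reply>
"""


def strip_stray_header(body):
    """Drop everything before the first '<', e.g. a leaked Content-type line."""
    start = body.find("<")
    return body[start:] if start != -1 else ""


def extract_reply_fields(body):
    """Pull out the fields of interest with regexes.

    A strict XML parser is avoided on purpose: the quoted reply contains
    two top-level scheduler_reply elements, which is not well-formed XML.
    """
    xml = strip_stray_header(body)
    message = re.search(r"<message[^>]*>(.*?)</message>", xml, re.S)
    delay = re.search(r"<request_delay>\s*(\d+)\s*</request_delay>", xml)
    return {
        "message": message.group(1).strip() if message else None,
        "request_delay": int(delay.group(1)) if delay else None,
        "project_is_down": "<project_is_down/>" in xml,
    }


def next_retry_delay(attempt, server_delay=None, base=60.0, cap=4 * 3600.0,
                     rng=random):
    """Seconds to wait before the next contact (hypothetical policy).

    The server's requested delay is treated as a floor, and a random
    share of an exponentially growing window is added on top, so hosts
    that failed together do not retry together.
    """
    window = min(cap, base * 2 ** attempt)
    return float(server_delay or 0) + rng.uniform(0, window)


if __name__ == "__main__":
    fields = extract_reply_fields(SAMPLE_REPLY)
    print(fields)
    print(next_retry_delay(attempt=0, server_delay=fields["request_delay"]))
```

The random component is the part that matters for the overload described above: if every host waited exactly the same 3600 seconds, the load spike would simply reappear an hour later.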
©2024 CERN