Message boards : Number crunching : Host corruption

Gaspode the UnDressed
Joined: 1 Sep 04
Posts: 506
Credit: 118,619
RAC: 0
Message 10923 - Posted: 25 Oct 2005, 16:02:54 UTC
Last modified: 25 Oct 2005, 16:05:24 UTC

Another corrupted host:
LV-01, host 30627 now thinks it is at venue 40015757312!
Last connected at 15:23 UTC. Running BOINC 5.2.2. It also thinks it has exceeded its quota of 100 results per day, although it has only had about 2.


Gaspode the UnDressed
http://www.littlevale.co.uk
ID: 10923

TPR_Mojo
Joined: 1 Sep 04
Posts: 8
Credit: 349,947
RAC: 0
Message 10924 - Posted: 25 Oct 2005, 16:13:24 UTC
Last modified: 25 Oct 2005, 16:20:06 UTC

Corrupted hosts on my account:

37059
57498
34303
34351
57463
65799
57431
36703
24898
57462
37063
36752

All Windows machines were running BOINC 4.45; the Linux machines were running 4.45 or 4.72.

ALL machines were upgraded to 5.2.2 (Windows) or 5.2.4 (Linux) yesterday.

I do have some hosts which are not showing corruption. I will see if they reset over the next 24 hours.

*EDIT* Detach/reattach worked for some hosts; they now have nice new host records which are correct. I can't merge them with the "old" host records, though, because the corruption affects the CPU and Operating System fields.
ID: 10924

Mr.Pernod
Joined: 16 Jul 05
Posts: 65
Credit: 369,728
RAC: 0
Message 10925 - Posted: 25 Oct 2005, 16:17:27 UTC - in response to Message 10915.  
Last modified: 25 Oct 2005, 16:34:59 UTC

Somehow host 57189 is corrupted again.
Between the time of my previous post and this one, the following happened:
- the host connected and downloaded new results.
- credit was granted for a previous WU ( http://lhcathome.cern.ch/workunit.php?wuid=733550 ).
I hope this narrows down the search.

Host 57189 contacted the server again to upload a finished result and download a new one; the host information is back to normal again except for:
Average download rate: Unknown
Average turnaround time: 0 days

I am now waiting for another result to get credit.

host 47322
what happened:
- initial state: corrupt info
- report results: correct info
- credit granted: corrupt info
- manual update: correct info*

*except for the following fields:
Average download rate Unknown
Average turnaround time 0 days
Maximum daily WU quota per CPU 1/day


This leads me to believe that the communication between the server and the clients is not where the problem lies in this case.

[edit]
I have been able to duplicate this with hosts 57272 and 47319.
[/edit]
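
For anyone tracking their own hosts, the sequence observed on host 47322 can be tabulated with a short script. This is only a sketch: the event and state labels are transcribed from the list in this post, and the function names are made up for illustration. It says nothing about what the server actually does internally.

```python
from collections import defaultdict

# Observations transcribed from the host 47322 list in this post:
# (event, state of the host record on the website afterwards)
observations = [
    ("report results", "correct"),
    ("credit granted", "corrupt"),
    ("manual update", "correct"),
]

def states_by_event(obs):
    """Group the host-record states seen after each kind of event."""
    grouped = defaultdict(set)
    for event, state in obs:
        grouped[event].add(state)
    return dict(grouped)

def suspect_events(obs):
    """Return the events that were only ever followed by a corrupt record."""
    grouped = states_by_event(obs)
    return sorted(e for e, states in grouped.items() if states == {"corrupt"})

print(suspect_events(observations))
```

Appending more (event, state) pairs from other hosts to `observations` would show whether "credit granted" stays the only event that is always followed by corruption.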
ID: 10925

timethief
Joined: 22 Jul 05
Posts: 7
Credit: 67,923
RAC: 0
Message 10926 - Posted: 25 Oct 2005, 16:24:53 UTC

And another one, host 42203: the quota dropped to 2 WU per day ...
ID: 10926

Thierry Van Driessche
Joined: 1 Sep 04
Posts: 157
Credit: 82,604
RAC: 0
Message 10927 - Posted: 25 Oct 2005, 16:27:52 UTC
Last modified: 25 Oct 2005, 16:32:14 UTC

I have one host, ID 21726, whose last contact according to stdoutdae.txt was:

2005-10-24 15:56:11 [LHC@home] Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded

The last contact according to the website was 24 Oct 2005 13:56:06 UTC.

The last contact before the project first went down was:
2005-10-24 11:46:00 [LHC@home] Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded

BOINC version 5.2.2, but with boinc.exe replaced by truxsoft's version 5.3.1.

I have kept LHC@home suspended until now. I will try an update and see what happens.
ID: 10927

Michael Karlinsky
Joined: 18 Sep 04
Posts: 163
Credit: 1,682,370
RAC: 0
Message 10928 - Posted: 25 Oct 2005, 16:40:52 UTC

Mine too.

hostid: 24068 last contacted: 25 Oct 2005 4:47:37 UTC OS: Linux 2.4.19
hostid: 40264 last contacted: 25 Oct 2005 4:44:58 UTC OS: Linux 2.6.11

Both run BOINC 4.19

The third host, 21204, is OK. I was wondering why UL/DL was possible, although
the home page showed "shutdown for maintenance"....

HTH

Michael
Team Linux Users Everywhere
ID: 10928

Thierry Van Driessche
Joined: 1 Sep 04
Posts: 157
Credit: 82,604
RAC: 0
Message 10929 - Posted: 25 Oct 2005, 16:46:07 UTC
Last modified: 25 Oct 2005, 16:58:42 UTC

After doing the "update" in BOINC Manager:

2005-10-25 18:33:15 [LHC@home] New host venue: 721168
Oops, ??

Still the same host ID on the website, and with correct info.

After closing BOINC and starting over:

2005-10-25 18:40:44 [LHC@home] Computer ID: 21726; location: 721168; project prefs: default

Oops, location: 721168 ??

Edit1
I updated the LHC@home preferences. The default location was already set to "Home", but I chose "Home" once again as the default location. An update in BOINC Manager had no effect.

Edit2
I went to the details of my host, where the location was set to "---", chose "Home" as the location, and did an update in BOINC Manager. Result:
25/10/2005 18:53:01|LHC@home|Computer ID: 21726; location: home; project prefs: default
All is well now.
The earlier 'location: 721168' thus came from the fact that no location had previously been defined for my host in the host details.
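
To spot this kind of bogus venue in a client log without reading every line, something like the following could be used. It is a rough sketch, assuming the only legitimate venue values are the standard BOINC names (home, work, school) or an empty value; the regular expression and function name are my own and only cover the two message forms quoted in this post.

```python
import re

# Assumption: the standard BOINC venue names; an empty value means "no venue set".
KNOWN_VENUES = {"", "home", "work", "school"}

# Matches both message forms quoted in this post:
#   "... New host venue: 721168"
#   "... Computer ID: 21726; location: 721168; project prefs: default"
VENUE_RE = re.compile(r"(?:New host venue|location):\s*([^;\s]*)")

def suspicious_venues(lines):
    """Return (line, venue) pairs whose venue is not a known venue name."""
    hits = []
    for line in lines:
        m = VENUE_RE.search(line)
        if m and m.group(1).lower() not in KNOWN_VENUES:
            hits.append((line, m.group(1)))
    return hits

log = [
    "2005-10-25 18:33:15 [LHC@home] New host venue: 721168",
    "2005-10-25 18:40:44 [LHC@home] Computer ID: 21726; location: 721168; project prefs: default",
    "25/10/2005 18:53:01|LHC@home|Computer ID: 21726; location: home; project prefs: default",
]

for line, venue in suspicious_venues(log):
    print(venue, "<-", line)
```

With the three lines from this post, the first two are flagged and the last one (location: home) is not.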
ID: 10929

timethief
Joined: 22 Jul 05
Posts: 7
Credit: 67,923
RAC: 0
Message 10931 - Posted: 25 Oct 2005, 17:08:49 UTC - in response to Message 10921.  

Hey, all entries are back and look good! Thanks!

Another corrupted host: 44537

Last contact was at 02:11 UTC, to return one result. The result seems to be all right; only the system information of the host is a little damaged.

But... is it a problem? It seems to me that the data will be fixed automatically with the next connection, as happens when you make a system update on your host.

A bit more nasty: the quota dropped to 0 WU per day.


ID: 10931

Michael Karlinsky
Joined: 18 Sep 04
Posts: 163
Credit: 1,682,370
RAC: 0
Message 10935 - Posted: 25 Oct 2005, 18:14:47 UTC

Just want to confirm the info from this post.

After reporting some results from host 21204 (which was OK), its host
info got corrupted too.

I did a manual update afterwards and host info was restored.


HTH

Michael
Team Linux Users Everywhere
ID: 10935

TPR_Mojo
Joined: 1 Sep 04
Posts: 8
Credit: 349,947
RAC: 0
Message 10936 - Posted: 25 Oct 2005, 18:15:05 UTC
Last modified: 25 Oct 2005, 18:17:26 UTC

Host 68191 just returned a result (4068306) at 18:09 UTC from BOINC 5.2.4 (Linux) which corrupted the host record.

Result is fine, was nominated canonical result and granted credit.


ID: 10936

Mr.Pernod
Joined: 16 Jul 05
Posts: 65
Credit: 369,728
RAC: 0
Message 10937 - Posted: 25 Oct 2005, 18:16:53 UTC - in response to Message 10936.  
Last modified: 25 Oct 2005, 18:18:05 UTC

Host 68191 just returned a result (4068306) at 18:09 UTC from BOINC 5.2.4 (Linux) which corrupted the host record.


Let me guess: you also got credit for that result, the very second you reported it?
ID: 10937

TPR_Mojo
Joined: 1 Sep 04
Posts: 8
Credit: 349,947
RAC: 0
Message 10938 - Posted: 25 Oct 2005, 18:18:05 UTC - in response to Message 10937.  


let me guess, you also got credit for that result?


Yup, must edit quicker, my bad :)

ID: 10938

tilad-x
Joined: 15 Jul 05
Posts: 17
Credit: 16,521
RAC: 0
Message 10939 - Posted: 25 Oct 2005, 18:19:28 UTC
Last modified: 25 Oct 2005, 18:20:36 UTC

Both of my hosts have become somewhat corrupted.

Host name: REFLECTION
Host ID 43015 - last contact 14OCT2005 17:30:20 UTC
Host ID 67449 - last contact 25OCT2005 02:06:01 UTC

The other info seems correct

Host name: TWONEYVILLE
Host ID 39924 - last contact 14OCT2005 17:14:14 UTC
Host ID 67437 - last contact 25OCT2005 02:56:04 UTC

The info for 67437 is really messed up.

Both hosts are running BOINC v. 4.45. I was about to install 5.2.2 on one of the machines, but I think I will hold off until this gets sorted out.
ID: 10939

Mr.Pernod
Joined: 16 Jul 05
Posts: 65
Credit: 369,728
RAC: 0
Message 10941 - Posted: 25 Oct 2005, 18:20:29 UTC - in response to Message 10938.  


let me guess, you also got credit for that result?


Yup, must edit quicker, my bad :)

This seems to confirm my suspicion that the problem isn't in host-server communication but somewhere between the validator and the database (see some of my earlier posts).
I have already sent a mail with what I have found to Chrulle.

Results are still being validated and credit is given to the correct hosts, so I guess we'll just have to wait and see what happens next.
ID: 10941

TPR_Mojo
Joined: 1 Sep 04
Posts: 8
Credit: 349,947
RAC: 0
Message 10942 - Posted: 25 Oct 2005, 18:23:04 UTC - in response to Message 10941.  


This seems to confirm my suspicion that the problem isn't in host-server communication but somewhere between the validator and the database (see some of my earlier posts).
I have already sent a mail with what I have found to Chrulle.

Results are still being validated and credit is given to the correct hosts, so I guess we'll just have to wait and see what happens next.


Good job, I'm not too fussed really, only provided my dodgy hosts because Chrulle asked for them. Work is still being done, BOINC isn't crashing, so a wee bit of patience - hopefully they have enough information now to track the problem down.
ID: 10942

itenginerd
Joined: 29 Aug 05
Posts: 42
Credit: 27,102
RAC: 0
Message 10943 - Posted: 25 Oct 2005, 18:23:34 UTC

On my initial look at my hosts, nothing was awry. I manually updated to report and pull work, and that host went corrupt (I had apparently used my quota of 200 WU/day--I don't get close to that!). Another manual update brought things back to normal, and I was able to download work again.

A couple minutes later, I suspended Seti@Home, and LHC downloaded another few WUs. No corruption there. That sounds to me like the problem is only in the report uploading portion of the cycle. Or else it has to do with the manual update.

I've set No New Work (NNW) for now to see whether I can force BOINC to do nothing but upload results. We'll see what happens. I run core client (CC) 4.72.

(j)
James
ID: 10943

tilad-x
Joined: 15 Jul 05
Posts: 17
Credit: 16,521
RAC: 0
Message 10944 - Posted: 25 Oct 2005, 18:25:26 UTC - in response to Message 10939.  

Host name: TWONEYVILLE

10/24/2005 10:55:57 PM|LHC@home|Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi
10/24/2005 10:55:57 PM|LHC@home|Requesting 0 seconds of work, returning 1 results
10/24/2005 10:55:58 PM|LHC@home|Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
10/24/2005 10:55:58 PM|LHC@home|New host venue: 0

I like the new host venue of 0. *sarcasm off*

I'll get the info off my other host and post it here.
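
In case it helps others check their own logs, here is a rough sketch that looks for the pattern above: a scheduler request that returns results, followed by a "New host venue" message. It assumes the pipe-delimited "timestamp|project|message" layout shown in this post; the function name is made up, and other BOINC versions may format their logs differently.

```python
import re

def venue_changes_after_report(lines):
    """Return (timestamp, results_returned, new_venue) for each 'New host venue'
    message that follows a scheduler request in which results were returned."""
    found = []
    returned = 0  # results reported in the most recent scheduler request
    for line in lines:
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a "timestamp|project|message" line
        timestamp, _project, message = parts
        m = re.search(r"returning (\d+) results", message)
        if m:
            returned = int(m.group(1))
            continue
        m = re.match(r"New host venue: (.*)", message)
        if m and returned > 0:
            found.append((timestamp, returned, m.group(1).strip()))
            returned = 0
    return found

log = [
    "10/24/2005 10:55:57 PM|LHC@home|Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi",
    "10/24/2005 10:55:57 PM|LHC@home|Requesting 0 seconds of work, returning 1 results",
    "10/24/2005 10:55:58 PM|LHC@home|Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded",
    "10/24/2005 10:55:58 PM|LHC@home|New host venue: 0",
]

print(venue_changes_after_report(log))
```

A venue change that only ever shows up after results were returned would fit the idea that reporting results is what triggers the corruption.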
ID: 10944

Thierry Van Driessche
Joined: 1 Sep 04
Posts: 157
Credit: 82,604
RAC: 0
Message 10945 - Posted: 25 Oct 2005, 18:25:47 UTC - in response to Message 10929.  
Last modified: 25 Oct 2005, 18:29:41 UTC

After reporting the first result, info went corrupt again :-(
ID: 10945

Nothing But Idle Time
Joined: 30 Sep 05
Posts: 6
Credit: 9,260
RAC: 0
Message 10946 - Posted: 25 Oct 2005, 18:25:50 UTC - in response to Message 10929.  

@Thierry: I had a similar experience, except for the sequence of events. I downloaded a WU and got a message that the new venue was 7465.77. Being aware of this thread, I went to the host details and saw that a previously OK set of data was now corrupted. I changed the venue to home and did a BOINC update, which reset the venue. When the WU completed and was reported, the host details were again reset to gibberish and the venue was back to ---. I reset it again and am waiting to see whether the next report will corrupt it again. I'm using BOINC 4.45.
ID: 10946

Thierry Van Driessche
Joined: 1 Sep 04
Posts: 157
Credit: 82,604
RAC: 0
Message 10947 - Posted: 25 Oct 2005, 18:28:44 UTC - in response to Message 10945.  
Last modified: 25 Oct 2005, 18:29:27 UTC

After reporting the first result, info went corrupt again :-(

And after using the "update" button, info came back to normal.
ID: 10947