1) Message boards : Number crunching : Initial Replication (Message 17646)
Posted 31 Jul 2007 by John McLeod VII
Post:
Einstein can afford this kind of thing occasionally, LHC can't.


Thanks for the opinion but I'll wait for LHC to confirm that statement before I believe it.

LHC has people waiting on the results to get physical work done (aligning the magnets). Einstein does not. Since LHC has real world deadlines, LHC has a problem with maximum turn around time for the WUs, therefore a higher initial replication is required. Einstein and S@H on the other hand are trying to get through a huge pile of work, and that leads to an initial replication to match the minimum quorum.

This is documented in earlier threads if you care to go looking (try a couple of years ago).
2) Message boards : Number crunching : Please make me the Founder (Message 17316)
Posted 9 Jul 2007 by John McLeod VII
Post:
A function to allow automatic team transfer has been added to the latest version of the BOINC source.
3) Message boards : Number crunching : XML Stats Export (Message 17096)
Posted 26 Jun 2007 by John McLeod VII
Post:
'larry1186' sounds to me as if you're a common centipede, a.k.a. Lithobius forficatus, which only has up to 15 pairs of feet...

In which case, there are more projects than you have feet.
4) Message boards : Number crunching : Merge?? (Message 17045)
Posted 16 Jun 2007 by John McLeod VII
Post:
Thanks Jim, but I pretty much stay on top of keeping my managers updated. That particular box hasn't even done a WU yet and it's listed... I think, if I remember right, 187 times! Would it work if I detached that box and tried to reattach it? And do you know if I could take care of this through BAM, or is BAM the problem?
OK, the only way I can do it is to detach that box 187 times! I'll just wait for someone to fix the problem!



You may be a victim of the ghost host problem. I know it has been discussed in other threads. I'm not familiar with the problem other than to know that it exists. Sorry I can't be of more help.

Jim

The solution is to merge all of the hosts on the web site. Once you have one copy of it left you need to edit the client_state.xml file on that host and change the hostID to match the one remaining on the web. There are more detailed instructions here somewhere.

The problem is that they don't STAY fixed. If anything causes the host id to split again, it goes off into never never land again.

I also had a host with no credits yet and 200+ ghost-hosts. Merging them took some trouble, 'select all' didn't work but selecting say 30 at a time and then merging them went OK. And do remember to merge the old ones with the latest one, not the other way around (I found out the hard way)...
The host is now happily in the DB all by itself, just checked and still the ghost-hosts are gone. The host was effectively retired by the way (hostname changed after owner had W*dows reinstalled), I'm gonna delete it soon.

I just noted an interesting phenomenon. I seem to have stopped getting ghost clients about the end of May. I believe that there may be a bug fix in the most recent version of the client.

BTW, you can merge something over 100 at a time.
5) Message boards : Number crunching : Merge?? (Message 17029)
Posted 14 Jun 2007 by John McLeod VII
Post:
Thanks Jim, but I pretty much stay on top of keeping my managers updated. That particular box hasn't even done a WU yet and it's listed... I think, if I remember right, 187 times! Would it work if I detached that box and tried to reattach it? And do you know if I could take care of this through BAM, or is BAM the problem?
OK, the only way I can do it is to detach that box 187 times! I'll just wait for someone to fix the problem!



You may be a victim of the ghost host problem. I know it has been discussed in other threads. I'm not familiar with the problem other than to know that it exists. Sorry I can't be of more help.

Jim

The solution is to merge all of the hosts on the web site. Once you have one copy of it left you need to edit the client_state.xml file on that host and change the hostID to match the one remaining on the web. There are more detailed instructions here somewhere.

The problem is that they don't STAY fixed. If anything causes the host id to split again, it goes off into never never land again.
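
The client_state.xml edit described above might look roughly like the following Python sketch. It assumes that each <project> element in the file has <master_url> and <hostid> children; that layout is an assumption to check against your own file, not something stated in this thread. The function name, URLs and IDs are made up for illustration. Stop BOINC and back up the file before changing anything.

```python
import xml.etree.ElementTree as ET


def set_host_id(xml_text, master_url, new_host_id):
    """Return (updated_xml, number_of_projects_changed).

    Assumes <project> blocks hold <master_url> and <hostid> children.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for project in root.iter("project"):
        url = project.findtext("master_url", default="").strip()
        if url != master_url:
            continue
        node = project.find("hostid")
        if node is None:
            # No host ID recorded yet: create the element.
            node = ET.SubElement(project, "hostid")
        node.text = str(new_host_id)
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Made-up sample standing in for a real client_state.xml.
SAMPLE = """<client_state>
<project>
    <master_url>http://example.org/project_a/</master_url>
    <hostid>111</hostid>
</project>
<project>
    <master_url>http://example.org/project_b/</master_url>
    <hostid>222</hostid>
</project>
</client_state>"""

# 999 stands in for the one host ID left on the web site after merging.
updated, count = set_host_id(SAMPLE, "http://example.org/project_a/", 999)
```

Only the project whose URL matches is touched; the other project keeps its own host ID.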
6) Message boards : Number crunching : New Hosts are Multiplying ! (Message 16711)
Posted 14 Apr 2007 by John McLeod VII
Post:
There is a fix
49 copies of host #2 in 24 hours ... merged them all, but there has to be a fix for this somewhere!!


If they only STAYED fixed.
7) Message boards : Number crunching : Trojan used by dishonest BOINC cruncher (Message 16625)
Posted 26 Mar 2007 by John McLeod VII
Post:
BTW, from what I understand, and this is not well substantiated, it is over 20 people that have been banned. I believe many more have quit processing there because of this.
8) Message boards : Number crunching : Because you asked.... (Message 16567)
Posted 17 Mar 2007 by John McLeod VII
Post:
[...]prune out the millions of unwanted hosts from the db before the real migrate, perhaps only migrating those hostids that have ever submitted work.


I like that idea. For those hosts that never submitted work but still want to participate, they can manually add "themselves" back to the project post-migration.



I joined before Christmas, leave my laptop running and ready for any LHC work, but haven't received my first WU yet. The system checks regularly now, while I run other BOINC projects.
I'd hate to be deleted after patiently waiting and have to wait for some sort of post-migration period to be over before I can re-apply. Maybe there needs to be a list of ID's that are still active because within the last x months either they submitted WU's or they have been polling for WU's.


Just to clarify:

There are a huge number of ghost hosts in the system, due to a bug that makes / made some hosts forget their identity. The proposal is aimed at the most straightforward way of pruning them out of the database to make it more manageable.

The deletion of your host, if you have not got any work by then, would simply mean that the next time you tried for work, the host would be re-initialised. This would happen on the very next connection.

The date shown for that computer joining the project would be the date it was given the new identity, but you would still have your original joining date shown on the forum boards next to every post.

It is not a perfect solution, but in my opinion offers the most practical balance between taking up the admins time and bringing the database back to something manageable.

R~~

I have a slightly different solution that may generate a smaller table. Force a merge between all hosts that are normally allowed to merge that have the same name. This will get rid of more hosts as some of those host will still have WUs attached. Then auto delete any hosts that can be deleted that have not reported in the last 30 days or so that have no WUs attached. This should generate a fairly small table of hosts.
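
A minimal sketch of the merge-then-prune idea above, in Python. The Host record and its field names are hypothetical and do not reflect the real BOINC database schema, and "same owner and same name" stands in here for "normally allowed to merge", which is a simplification. Merged hosts keep the most recently seen ID, following the advice elsewhere on these boards to merge old hosts into the latest one.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Host:
    # Hypothetical record; not the real BOINC host table.
    host_id: int
    owner: str
    name: str
    last_contact: date
    workunits: int = 0


def merge_and_prune(hosts, today, max_idle_days=30):
    """Merge same-owner, same-name hosts, then drop idle hosts with no WUs."""
    merged = {}
    for h in hosts:
        key = (h.owner, h.name)
        keep = merged.get(key)
        if keep is None:
            merged[key] = Host(h.host_id, h.owner, h.name,
                               h.last_contact, h.workunits)
            continue
        # The survivor carries the work units of every merged copy.
        keep.workunits += h.workunits
        # Keep the identity of whichever copy reported most recently.
        if (h.last_contact, h.host_id) > (keep.last_contact, keep.host_id):
            keep.host_id, keep.last_contact = h.host_id, h.last_contact
    cutoff = today - timedelta(days=max_idle_days)
    return [h for h in merged.values()
            if h.workunits > 0 or h.last_contact >= cutoff]


today = date(2007, 3, 17)
hosts = [
    Host(1, "alice", "box", date(2007, 1, 1)),
    Host(2, "alice", "box", date(2007, 3, 10)),
    Host(3, "alice", "box", date(2007, 2, 1), workunits=2),
    Host(4, "bob", "old", date(2006, 12, 1)),
    Host(5, "carol", "laptop", date(2007, 3, 15)),
]
result = merge_and_prune(hosts, today)
```

In this made-up data the three "box" entries collapse into one host that still holds its work units, the long-idle host with no work is deleted, and the recently seen host with no work survives, which addresses the worry about patient new participants being pruned.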
9) Message boards : Number crunching : /stats/ empty (Message 15968)
Posted 2 Jan 2007 by John McLeod VII
Post:
Any idea on what the URL will change to once the project has moved to Queen Mary?
10) Message boards : Number crunching : Please make me the Founder (Message 15745)
Posted 30 Nov 2006 by John McLeod VII
Post:
... BOINC now has a feature for members of teams with inactive founders to request the foundership and have it transferred to them without admin intervention. However LHC will have to update their php code and their database tables ...

That was a recent change, and LHC is missing modifications from months ago - maybe even a year. Lots of things changed, wouldn't be easy to upgrade, I doubt they will do it soon.


Yes, the issue is that Chrulle tweaked the server code to provide variable deadlines, so that urgent work could be run ahead of routine work.

The project does not want to discard that new code of Chrulle's, and that means it is not as simple as installing the upgraded server from the install files.

How to avoid this in the future: make modifications on the code downloaded from CVS. Running cvs update on that directory later will merge server changes with local changes when required.

EDIT: waaait a minute... variable deadlines? Why does that need server tweaks? Different deadlines can be set to *each workunit* since I started looking at server stuff...

I believe that work that was never returned gets a shorter deadline when re-issued.
11) Message boards : Number crunching : New computer database entry created on each connect (Message 15424)
Posted 12 Nov 2006 by John McLeod VII
Post:

1. Stop BOINC.


Quick Tip:

To actually stop BOINC (as opposed to simply closing the BOINC Manager), go to Start -> Run. In the box type "net stop boinc" (without the quotes) then click ok. A box will appear and disappear.

To Start BOINC again, go to Start -> Run. In the box type "net start boinc" (without the quotes) then click ok. A box will appear and disappear.

Stop BOINC: Start -> Run; "net stop boinc" [ok]
Start BOINC: Start -> Run; "net start boinc" [ok]

This only works if you have installed as a service. Possibly only on Windows.
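
For anyone scripting this, the same stop/start sequence could be wrapped as in the Python sketch below. It only applies when BOINC is installed as a Windows service named "boinc", as in the quoted tip; the function names are made up, and running it usually needs administrator rights.

```python
import platform
import subprocess


def service_command(action, service="boinc"):
    """Build the 'net start' / 'net stop' command line as a list."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return ["net", action, service]


def restart_service(service="boinc"):
    """Stop and then start the service; Windows service installs only."""
    if platform.system() != "Windows":
        raise RuntimeError("'net start/stop' is a Windows command")
    for action in ("stop", "start"):
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(service_command(action, service), check=True)


stop_cmd = service_command("stop")
start_cmd = service_command("start")
```

Calling restart_service() between the two halves of an edit is not what you want; stop the service, make the change, then start it again.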
12) Message boards : Number crunching : O.K., I'm New! No Work?? (Message 14405)
Posted 23 Jul 2006 by John McLeod VII
Post:
Thanks very much for your reply mmciastro. I am actually attached to four projects. Any advice? <edit> Perhaps I have two boinc accounts??


Thank you Ziran for your suggestion. I will try that also.
Peter

The single unique thing that is necessary for project coordination is your email address. I suspect your email address for LHC is different than it is for the others. You can check this by going to "your account" on all projects. I suspect you have two LHC accounts (two different email addys), but have only attached under one of them. If you do have two accounts, there is no way to merge them, so you'll probably have to abandon the "peter gerling" account at LHC, Detach from LHC, then reattach using the "other" email addy (the one that matches the other projects). I know this isn't great news, but there it is.

tony


Hey Tony,
I took your advice and detached from LHC and then reattached to LHC. Got 5 WU's yesterday and now my stats show all four projects.
Thanks very much!
Peter



You cannot merge two accounts in the same project, but the separate projects can be successfully "merged" by setting the email addresses the same.
13) Message boards : Number crunching : I think we should restrict work units (Message 14404)
Posted 23 Jul 2006 by John McLeod VII
Post:
.... What happens when two particles traveling in opposite directions at nearly the speed of light hit each other? A collision at nearly twice the speed of light!
....
.

hmmm....

Sort of like setting your
" Connect to network about every..."
to 20 (twice the max allowed...)


....
....
I don't know what this "Cache" thing is....
....
.


It seems so!

Take a look at your general preferences for:
Connect to network about every
(determines size of work cache; maximum 10 days)

Bigger numbers allow you to get more work.

But, you should set your "Connect to network about every"
to no more than one-tenth.
That is the "fair" thing to do...




My machines that are always connected (usually) have a cache of 0.1 days. My machines that can only connect once a day have a cache setting of 1.0 days. If it were any less, they would not be able to crunch all of the time (they are attached to enough projects so that I don't really care which project they crunch for at any given time). The reason that I say usually 0.1 days is that I work on the CPU scheduler and work fetch algorithms - sometimes I need to test some boundary conditions (and I have not yet gotten an LHC result when the cache was set high on the dev machines).
14) Message boards : Number crunching : Why can't I get any work? (Message 14403)
Posted 23 Jul 2006 by John McLeod VII
Post:
Whether there will be in the near future enough of work?

The very nature of this project is to have work sometimes and not at other times.

Out of my 11 machines, I believe that I got about 4 results in the last few days (all returned).
15) Message boards : Number crunching : Please make me the Founder (Message 14402)
Posted 23 Jul 2006 by John McLeod VII
Post:
To the SysAdmin/Moderator or just a person with the necessary power.
The team that I have joined has no founder as he passed away on the 16/8/05. I would like to take the team over with the hope of increasing it when I get the Web page thing sorted out. I have asked this of all the projects in the team as I participate in all of them. I am the only other member of the team.
So I am asking to be made the team Founder.
Team name is TeamAUS
Original founder is Craig Zuvich
My name is Conan
Country is Australia.

Can you please make me the Founder of TeamAUS.

Thank you for your time.



Attention LHC@home,
Can my request to be the team Founder be followed up please as it has now been 15 days since I posted (see message id 14264). I would like to take this team somewhere but need to be the founder to do it.
Thank you in advance.

I believe that this may take a while, as I believe that they do not currently have a BOINC sysadmin.
16) Message boards : Number crunching : Not HAPPY people. (Message 13383)
Posted 15 Apr 2006 by John McLeod VII
Post:
Did JM7 get the signs wrong? Well no, because the sign depends on whether you read the table as being of debt owed *to* the project, or debt owed *by* the project.

I believe that the signs are reversed for the calculations. However, in my defense, Short Term Debt was in place before I started, and the signs for LTD match those for STD (imagine trying to explain that dichotomy). In actuality, it is a CPU balance instead of a CPU debt calculation. However, I am not about to try to fix it.
17) Message boards : Number crunching : I think we should restrict work units (Message 13382)
Posted 15 Apr 2006 by John McLeod VII
Post:
My cache is set to MAX on all 3 machines for LHC because it usually doesn't have work and it takes less than 2 days for it to run out. One machine is usually not connected (Alienware laptop), my old Dell is on crappy internet (semi-broadband cellphone modem competing with file sharing), and the other one is at work, so it's got a good connection. LHC is at 15% for me, but it's struggling to reach 10% of my total credit because it doesn't always have work. Oh yeah, one trick to force your BOINC to download the most possible LHC workunits is to suspend all other projects, update on LHC, then resume all the other projects. Sometimes in a one- or two-day window when LHC has work, BOINC doesn't want to download any. The suspend-download-resume trick works really well, and LHC's short deadlines mean they get crunched first, and I don't care if I can't complete all the QMC work on time.

On all other projects except the CPDN ones, only the Alienware laptop is set to MAX. I also shut down (suspended) QMC because they lie to BOINC about crunch times, making your BOINC crunch nearly 2 weeks straight in NDF mode even on a non-maxCache machine (their workunits take 40 hours, not 15 or 20). I might turn QMC back on in a few weeks, but if they send 8 40-hour work units again I will probably abort most of them and only crunch 2 or 3 of them.

Ufuids is having problems and is only serving a master's thesis. Their database crash made me lose out on credit for 12 WUs - 5+2+5 for each machine - and they take 15 hours each to crunch.


You only get one cache setting per venue. You do not get a separate cache setting per project - The General Settings are for all projects.

With the more recent versions of BOINC the problem with QMC would also resolve itself after the first batch of work.
18) Message boards : Number crunching : can't download (Message 12535)
Posted 28 Jan 2006 by John McLeod VII
Post:
Yes, still runs fine otherwise. And it does not seem to be every time the benchmarks run. So, I am still looking to find the exact sequence. But, at the moment, I hesitate to risk more than one system as I am not much into thrill rides... :)

Since we don't seem to be making any progress to getting the code checked in (or at least it was not checked in last I looked this afternoon ...), I don't suppose there is any hurry about anything.

If you have the time, maybe a compile without the debug code since it seems stable otherwise...

I think they also have a problem in the attach wizard as I now have a hammered account. What is worse it is with PPAH and they are about the most unresponsive project going. :(

Well, what is 171,000 cobblestones ...

The code has been checked in.

However, it includes a major bug submitted by someone else as a modification to the LTD calculations. Based on the code checked in, I generated a pair of scenarios where a work fetch from one project and a work complete from a second (requires a third project to make it interesting) were done in a different order, and the LTDs were thousands of seconds different.
19) Message boards : Number crunching : Download issues (Message 12524)
Posted 27 Jan 2006 by John McLeod VII
Post:
Hi all. Wondered if I could have your two cents... :D

Ok I've got an X2 4400+ on LHC.

It d/l WU's, processes, sends back.
ATM it won't accept any WU's.
I check the computer on my LHC profile, and there are loads of WU's that LHC says that it sent to the computer. But my computer hasn't got them.
As a result I think LHC won't send any more.

Anytime I send a request I get:
'not sending or accepting any new work'

I tried increasing my work cache - no effect.

Do I have to reset the project, or is this a DB issue?

BTW I only have 5 primegrid WU's left (5hrs computing time) so I have an 'empty cache' as it were.

BOINC will get into this situation if any of the deadlines are a bit tight. Won't download anything from anywhere. If it is just this project and you are attached to other projects, it is entirely possible that LHC needed some extra CPU time recently, and was thus barred from getting new work until it had paid back the time. My guess is that since the detach / attach worked, it was that LHC had used some extra CPU time and was barred from downloading work in order to meet the resource shares that you specified.
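
The "paid back the time" idea can be pictured with a much simplified CPU-balance model, sketched below in Python. This is not the actual BOINC scheduler or work-fetch code: the function names, the zero threshold, and the sign convention (positive means the project is owed CPU time) are all assumptions made for illustration.

```python
def update_balances(balances, shares, cpu_used):
    """Return new per-project balances after one accounting period.

    Each project is entitled to its resource-share fraction of the CPU
    time used in the period; the balance grows by entitlement minus use.
    """
    total_share = sum(shares.values())
    total_used = sum(cpu_used.values())
    new_balances = {}
    for name, share in shares.items():
        entitled = total_used * share / total_share
        used = cpu_used.get(name, 0.0)
        new_balances[name] = balances.get(name, 0.0) + entitled - used
    return new_balances


def may_fetch_work(balances, name):
    """Assumed rule: an overdrawn project gets no new work."""
    return balances[name] >= 0.0


shares = {"LHC": 50, "PrimeGrid": 50}

# Period 1: tight deadlines make LHC use more than its share.
after_rush = update_balances({}, shares, {"LHC": 30000, "PrimeGrid": 10000})

# Period 2: only the other project runs while LHC pays back the time.
after_payback = update_balances(after_rush, shares,
                                {"LHC": 0, "PrimeGrid": 20000})
```

With equal shares, LHC ends the first period 10,000 seconds overdrawn and is barred from fetching; after the second period both balances are back to zero and LHC may download work again.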
20) Message boards : Number crunching : can't download (Message 12521)
Posted 27 Jan 2006 by John McLeod VII
Post:
Yes, so far so good. Though I still have it only running on one 4 CPU system. The code that John did seems to be fine. Though I am having some other issues:

1) Attach wizard may have corrupted my Predictor@Home account to the point where my e-mail address does not exist and there is no way to recover the account ... 171,000 cobblestones down the tubes ... I have sent a note to PPAH, but, without an account I cannot post on the boards, and they have not acknowledged the post ... I did post a note about this to the dev forum.

2) There is a crash that seems to be consistent after benchmarks are run. Has happened on two different systems so far. I am waiting to see if it happens again on the Xeon ... and the benchmark numbers are wrong. Of course this is a compile with debug turned on so that may not be unexpected. Well, it did not crash on a forced run of the benchmarks ... but it sure does hammer the numbers! I can fix that though, I saved an old client state file and copied back the old numbers ...


Is it only after the benchmarks run? Will it work if the benchmarks do not run?




©2024 CERN