Message boards : Number crunching : Multiple Client Instances

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,046,382
RAC: 136,764
Message 30897 - Posted: 20 Jun 2017, 10:10:26 UTC

Referring to a discussion about multiple client instances that started here:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4297&postid=30603


During the setup of multiple client instances I noticed a pitfall that is not mentioned in any of the tutorials I read.

When an additional instance contacts the project server for the first time, its host parameters are compared against the host records in the server's DB. If the server decides that the host already exists, it merges the new instance into the existing host record, and both instances run under the same host ID. This leads to severe problems, e.g. WUs from instance 1 are cancelled when instance 2 connects to the server, and vice versa.


Workaround

To solve the problem above, the server has to be forced to generate a new host ID for every new instance.
This can be achieved by changing a major host parameter on the client before the first server contact.
My suggestion is to set <ncpus> in the file cc_config.xml to an unused value and then contact the server as often as necessary until it returns a new host ID. This should then be cross-checked on the project website.
Once this new host ID exists, it can be used in the same way as a real host, e.g. with its own venue and its own WUs.
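
As a rough sketch of the workaround, the cc_config.xml of the new instance could temporarily look like this. The value 3 is only a placeholder for any CPU count that none of your existing hosts reports; the surrounding <cc_config>/<options> elements are the usual layout of that file.

```xml
<cc_config>
  <options>
    <!-- Temporary placeholder value: pick a CPU count not reported by your other hosts -->
    <ncpus>3</ncpus>
  </options>
</cc_config>
```

The instance should be restarted so that it picks up the change before its first server contact. Once the new host ID shows up on the project website, the <ncpus> line can presumably be removed again.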
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 30901 - Posted: 20 Jun 2017, 11:59:47 UTC - in response to Message 30897.  

Hm, during races I use up to 20 instances per machine, but I have never seen this behaviour.

Maybe, you copied the BOINC-Data directory to the new instance ?

Or is it simply a matter of misreading? All instances will have the same machine name, and this can be very irritating. But they should all have a different ID.


Supporting BOINC, a great concept !
computezrmle
Message 30904 - Posted: 20 Jun 2017, 12:44:59 UTC - in response to Message 30901.  

...you copied the BOINC-Data directory to the new instance ?

Yes.

And after I noticed the described behavior I tried several measures:
- deleted the host ID from the new instance before I restarted it.
- connected a previously detached (fresh) and restarted instance.

In every case the old ID was restored by the server reply.

The first measure that worked was a temporary change of <ncpus>.


...All instances will have the same machine name, and this can be very irritating. But they should all have a different ID

Yes, now they have the same machine-name but individual IDs and individual venues to control which subproject they run.
And they keep the new ID although <ncpus> is now the same on every instance.
Yeti
Message 30905 - Posted: 20 Jun 2017, 12:51:06 UTC - in response to Message 30904.  

...you copied the BOINC-Data directory to the new instance ?

Yes.

Then this is the reason for the behaviour you described. Better not to reuse the data directory from an existing instance ...


Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 30906 - Posted: 20 Jun 2017, 13:07:35 UTC - in response to Message 30897.  

My suggestion is to set <ncpus> in the file cc_config.xml to an unused value and then contact the server as often as necessary until it returns a new host ID.

For me, setting
<suppress_net_info>1</suppress_net_info>
in cc_config.xml was enough to get new host IDs for the new instances.
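
In context, a minimal sketch of such a cc_config.xml might be the following. The flag tells the client not to report the host's IP address and domain name to project servers. The <cc_config>/<options> wrapper is the usual file layout, shown here on the assumption that the config is otherwise empty.

```xml
<cc_config>
  <options>
    <!-- Do not send this host's IP address and domain name to project servers -->
    <suppress_net_info>1</suppress_net_info>
  </options>
</cc_config>
```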
computezrmle
Message 30907 - Posted: 20 Jun 2017, 13:20:36 UTC

I also tried it with a (nearly) clean copy of the original data directory.
- LHC and all other projects were detached
- I checked whether there were remnants in client_state.xml -> none
- I deleted all files/dirs with references to LHC that were not automatically deleted


Anyway, thank you for scratching your head.
In the end the setup ran perfectly through a few cycles before the server outage and is still patiently asking for new work.

What could be done server-side is to increase the number of venues, so that more than 4 subprojects, e.g. sixtrack, can be run in such a setup.
Maybe the project admins will find time to comment on this request after the server problems are solved.


©2024 CERN