Message boards :
Number crunching :
Multiple Client Instances
Joined: 15 Jun 08 · Posts: 2411 · Credit: 226,361,961 · RAC: 132,068
Referring to a discussion about multiple client instances that started here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4297&postid=30603

During the setup of multiple client instances I noticed a pitfall that is not mentioned in any of the tutorials I read. When an additional instance contacts the project server for the first time, its host parameters are compared against the host records in the server's DB. If the server decides that the host already exists, it merges the new instance into the existing one and both run under the same host ID. This leads to severe problems, e.g. WUs from instance 1 are cancelled when instance 2 connects to the server, and vice versa.

Workaround

To solve the problem above, the server has to be forced to generate a new host ID for every new instance. This can be achieved by changing a major host parameter on the client before the first server contact. My suggestion is to set <ncpus> in the file cc_config.xml to an unused value and then contact the server as often as necessary until it returns a new host ID. This should then be cross-checked on the project website. Once this new host ID exists, it can be used in the same way as a real host, e.g. own venue, own WUs.
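For reference, the <ncpus> option goes in the <options> section of cc_config.xml in the instance's data directory; a minimal sketch (the value 3 is just an example of an "unused" CPU count, pick one that differs from what the server already knows for this machine, and revert it after the new host ID exists):

```xml
<cc_config>
  <options>
    <!-- Temporarily report a CPU count the server has not yet seen
         for this machine, so it creates a fresh host ID instead of
         merging the instance into an existing host record. -->
    <ncpus>3</ncpus>
  </options>
</cc_config>
```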
Joined: 2 Sep 04 · Posts: 453 · Credit: 193,569,815 · RAC: 10,128
Hm, in case of races I'm using up to 20 instances per machine, but I never saw this behaviour. Maybe you copied the BOINC data directory to the new instance? Or is it simply a misreading? All instances will have the same machine name and this can be very irritating, but they all should have a different ID. Supporting BOINC, a great concept!
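Running many instances on one machine usually means starting each client with its own data directory and its own GUI RPC port. A minimal sketch, assuming the `boinc` client binary is on the PATH; the directory layout under ~/boinc-instances and the port base 31415 are our own choices, and the script only prints the launch commands rather than starting the daemons:

```shell
#!/bin/sh
# Build the launch command for client instance $1, each with its own
# data directory and GUI RPC port. (Hypothetical layout; adjust to taste.)
launch_cmd() {
    i=$1
    dir="$HOME/boinc-instances/instance$i"
    port=$((31415 + i))
    # --allow_multiple_clients lets more than one client run at once;
    # --dir sets the instance's data directory;
    # --gui_rpc_port keeps the manager RPC ports apart.
    echo "boinc --allow_multiple_clients --dir $dir --gui_rpc_port $port --daemon"
}

# Print the commands for three instances instead of executing them.
for i in 1 2 3; do
    launch_cmd "$i"
done
```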
Joined: 15 Jun 08 · Posts: 2411 · Credit: 226,361,961 · RAC: 132,068
...you copied the BOINC-Data directory to the new instance ?

Yes. And after I noticed the described behaviour I tried several measures:
- deleted the host ID from the new instance before I restarted it
- connected a previously detached (fresh) and restarted instance

In each case the ID was restored by the server's reply. The first measure that worked was a temporary change of <ncpus>.

...All instances will have the same machine-name and this can be very irritating. But they all should have a different ID

Yes, now they have the same machine name but individual IDs and individual venues to control which subproject they run. And they keep the new ID although <ncpus> is now the same on every instance.
Joined: 14 Jan 10 · Posts: 1280 · Credit: 8,491,652 · RAC: 2,067
My suggestion is to set <ncpus> in the file cc_config.xml to an unused value and then contact the server as often as necessary until it returns a new host ID.

For me, setting <suppress_net_info>1</suppress_net_info> in cc_config.xml was enough to get new host IDs for the new instances.
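The alternative above also belongs in the <options> section of cc_config.xml; <suppress_net_info> stops the client from reporting the host's domain name and IP address, which changes the parameters the server uses to match the host. A minimal sketch:

```xml
<cc_config>
  <options>
    <!-- Don't send the host's domain name and IP address to project
         servers, so the new instance is not matched against an
         existing host record. -->
    <suppress_net_info>1</suppress_net_info>
  </options>
</cc_config>
```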
Joined: 15 Jun 08 · Posts: 2411 · Credit: 226,361,961 · RAC: 132,068
I also tried it with a (nearly) clean copy of the original data directory:
- LHC and all other projects were detached
- I checked if there were remains in the client_state.xml -> none
- I deleted all files/dirs with references to LHC that were not automatically deleted

Anyway, thank you for scratching your head. In the end the setup ran perfectly through a few cycles before the server outage and is still patiently asking for new work.

What could be done server side is to increase the number of venues, to run more than 4 subprojects, e.g. sixtrack, in such a setup. Maybe the project admins find time to comment on this request after the server problems are solved.
©2024 CERN