Thread 'Imbalance between Subprojects'

Author	Message
computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2745 Credit: 302,486,450 RAC: 75,419	Message 30542 - Posted: 29 May 2017, 8:50:19 UTC
I recently noticed a significant imbalance in the number of calculated WUs for each subproject on my hosts.
[pre]
          #    %
Theory   40   45
ATLAS    26   29
LHCb     16   18
CMS       7    8
Total    89  100
[/pre]
It seems that the server's scheduler does not send the WUs in random order. Any comments from the project people?
ID: 30542

nikogianna Joined: 30 Jan 17 Posts: 7 Credit: 132,213 RAC: 0	Message 30544 - Posted: 29 May 2017, 13:31:18 UTC - in response to Message 30542.
We are going to start investigating. This is not the intended behavior.
ID: 30544

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2745 Credit: 302,486,450 RAC: 75,419	Message 30547 - Posted: 29 May 2017, 16:13:40 UTC - in response to Message 30544.
[quote]We are going to start investigating. This is not the intended behavior.[/quote]
Thank you.
ID: 30547

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 939 Credit: 781,711,560 RAC: 76,983	Message 30553 - Posted: 29 May 2017, 20:47:41 UTC Last modified: 29 May 2017, 20:53:04 UTC
Mine is quite different.
[pre]
          #    %
Theory  121   51
ATLAS     7    3
LHCb     90   38
CMS      21    9
[/pre]
This is queued and running.
ID: 30553

PHILIPPE Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0	Message 30563 - Posted: 30 May 2017, 17:04:59 UTC - in response to Message 30553.
Thinking over this problem, what is the best option to balance WUs between sub-projects? There are different ways to manage it:
1°) by number of tasks validated for each sub-project
2°) by time elapsed for each sub-project
3°) by credits earned for each sub-project
4°) by number of WUs available for each sub-project
If you choose 1°) and the WUs have different durations, it can distort the intended effect: it would give an advantage to the sub-project with the longer WUs.
If you choose 2°) and the WUs have different durations, it can distort the intended effect: it would give an advantage to the sub-project with the shorter WUs.
If you choose 3°) and the WUs earn credit differently, it can distort the intended effect: it would give an advantage to the sub-project that grants less credit per validated WU.
If you choose 4°) and the WUs offered by the server are missing for a period, it can distort the intended effect: it would give an advantage to the sub-project that is most reliable in sending jobs (thinking of the week-end vmagent failure).
Finally, it is not so easy to satisfy everybody (sub-projects and volunteers). Maybe the harmonization of WU durations and credits granted across sub-projects should be improved to provide better satisfaction ...
ID: 30563

Jim1348 Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0	Message 30564 - Posted: 30 May 2017, 19:33:40 UTC - in response to Message 30563.
What is "balance", and why? It really depends on the science they want to do (and maybe their own political infighting). But it is not our problem insofar as I can see. Why should we want an equal number, however measured? Maybe some projects can wait for months while others are needed more urgently. Sub-atomic physics is not about social equality insofar as I know.
ID: 30564

PHILIPPE Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0	Message 30565 - Posted: 30 May 2017, 20:53:33 UTC - in response to Message 30564.
So, as you see it, only human arbitration by the manager of each sub-project can decide the share of computation allotted to their own sub-project during a defined period. If a volunteer wants to change this scientific choice, he has to select only the projects he wants to run. To sum up, a volunteer with all sub-projects chosen leaves the real choice to the CERN scientific team. If the volunteer wants to decide for himself, he can only remove, in a timely fashion, the sub-projects he no longer wants to run. It's another way of scheduling... Each part of the puzzle has a different responsibility: the scientists set the direction to go, and the volunteers provide the motor to advance. Everything is possible... But the solution chosen has to be accepted by most of us. Equality is a rather good concept, but so is efficiency. We are close to a political decision, which is difficult to understand when we don't have all the elements in our hands. Whatever happens, everyone can give his opinion and his feedback. I know people are interested in this topic.
ID: 30565

Jim1348 Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0	Message 30566 - Posted: 30 May 2017, 21:24:28 UTC - in response to Message 30565.
If you had read what I had written, you would know that I propose letting CERN handle it any way they want. But perhaps more cogently, I think that the crunchers on this project won't do better, but will just impose arbitrary criteria to suit their whims. It will just add an additional burden on the scheduling operation, however it is performed, and will have nothing to do with the physics at all.
ID: 30566

Yeti Volunteer moderator Joined: 2 Sep 04 Posts: 468 Credit: 224,921,321 RAC: 9,644	Message 30576 - Posted: 31 May 2017, 16:18:25 UTC - in response to Message 30544.
[quote]We are going to start investigating. This is not the intended behavior.[/quote]
I'm looking forward to what your investigations will bring to light ...
Supporting BOINC, a great concept !
ID: 30576

Toby Broom Volunteer moderator Joined: 27 Sep 08 Posts: 939 Credit: 781,711,560 RAC: 76,983	Message 30579 - Posted: 31 May 2017, 20:57:07 UTC
I agree with Jim, just let the scientists decide what's best.
ID: 30579

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 255,399 RAC: 48	Message 30582 - Posted: 1 Jun 2017, 12:44:33 UTC - in response to Message 30579. Last modified: 1 Jun 2017, 12:45:02 UTC
There are a number of different perspectives that the scheduling algorithm needs to consider. One is how you would like your host to be used. There is already the ability to select which applications you would like to run. The next step is how you would like those applications to share the host. This comes down to what rules are defined in a policy that the scheduler has to consider. The simplest rule is to select an application at random. Next, we can define any number of rules based on available metrics. However, as we try to optimize, both in terms of sharing the resources and making many people happy, complexity increases, so we have to get the trade-off right. One improvement could be adding a weighted value for the applications so that some would be preferred over others. Whether the metric used is tasks, wall time, CPU time, or credit can be discussed. But this will add complexity. My preference would be to triage applications as 'run', 'run if no other tasks available' and 'don't run'. Task selection within each category would be random.
ID: 30582
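
The triage described in this post could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's scheduler: the `PREFERENCES` table, the assignment of applications to categories, and the function `pick_application` are all hypothetical; only the three category names come from the post.

```python
import random

# Hypothetical per-volunteer preferences. The category names follow the post;
# which application sits in which category is made up for illustration.
PREFERENCES = {
    "ATLAS": "run",
    "Theory": "run",
    "CMS": "run if no other tasks available",
    "LHCb": "don't run",
}


def pick_application(preferences, available, rng=random):
    """Pick one application at random, preferring the 'run' category.

    `available` is the set of applications that currently have tasks.
    Applications marked "don't run" are never considered.
    """
    for category in ("run", "run if no other tasks available"):
        candidates = sorted(
            app
            for app, cat in preferences.items()
            if cat == category and app in available
        )
        if candidates:
            # Within a category the choice is purely random.
            return rng.choice(candidates)
    return None  # nothing eligible has work


if __name__ == "__main__":
    print(pick_application(PREFERENCES, {"ATLAS", "Theory", "CMS", "LHCb"}))
    print(pick_application(PREFERENCES, {"CMS", "LHCb"}))
```

The fallback category is only consulted when no 'run' application has tasks, which keeps the rule set as small as the post suggests.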

Yeti Volunteer moderator Joined: 2 Sep 04 Posts: 468 Credit: 224,921,321 RAC: 9,644	Message 30583 - Posted: 1 Jun 2017, 13:08:00 UTC - in response to Message 30582. Last modified: 1 Jun 2017, 13:08:33 UTC
[quote]My preference would be to triage applications as 'run', 'run if no other tasks available' and 'don't run'. Task selection within each category would be random.[/quote]
We already discussed in another thread that we need a project priority for the sub-projects. Example:
Prio 1: Atlas
Prio 1: Theory
Prio 2: CMS
Prio 3: LHCb
don't run: Sixtrack, Alice, ...
So, if the client requests work, it will get Atlas or Theory tasks as long as they are available. If they are not available, it will get CMS. If even CMS doesn't offer work, it gets LHCb.
Supporting BOINC, a great concept !
ID: 30583
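
As a rough sketch of the priority fallback in this example: the priority numbers below are taken from the post, while the function `next_subproject` and the `has_work` mapping are hypothetical names used only for illustration. Nothing here reflects how the BOINC server actually works.

```python
import random

# Priorities from the example in the post. Sub-projects marked "don't run"
# (Sixtrack, Alice, ...) are simply left out of the table.
PRIORITIES = {"Atlas": 1, "Theory": 1, "CMS": 2, "LHCb": 3}


def next_subproject(priorities, has_work, rng=random):
    """Return a sub-project from the best priority level that has work.

    `has_work` maps sub-project name -> bool (hypothetical server state).
    Ties within one priority level are broken at random.
    """
    eligible = [sp for sp in priorities if has_work.get(sp, False)]
    if not eligible:
        return None
    best = min(priorities[sp] for sp in eligible)
    return rng.choice(sorted(sp for sp in eligible if priorities[sp] == best))


if __name__ == "__main__":
    # Atlas and Theory are out of work, so the client falls back to CMS.
    print(next_subproject(PRIORITIES, {"CMS": True, "LHCb": True}))
```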

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2745 Credit: 302,486,450 RAC: 75,419	Message 30585 - Posted: 1 Jun 2017, 13:40:36 UTC - in response to Message 30582. Last modified: 1 Jun 2017, 13:42:53 UTC
When I started this thread I didn't expect that it would grow into a political discussion. My goal was to point out a possible technical problem, as I thought WUs from all selected subprojects would be distributed at random (in total as well as for single hosts). This question isn't yet answered, is it?
Laurence's suggestions don't work out of the box but would need some development work. I agree with him, especially with "...'run', 'run if no other tasks available' and 'don't run'...", but would keep it as simple as possible. The scheduler should work randomly based on #tasks or wall time. Other metrics are either too complex in error handling or are "moving targets" (credits). IIRC a similar suggestion has already been posted in the past.
<edit>@Yeti: sorry, saw your post too late :)</edit>
ID: 30585

computezrmle Volunteer moderator Volunteer developer Volunteer tester Help desk expert Joined: 15 Jun 08 Posts: 2745 Credit: 302,486,450 RAC: 75,419	Message 30586 - Posted: 1 Jun 2017, 14:00:46 UTC
Addendum: A random #tasks metric would probably prefer wall time controlled singlecore WUs (Theory, LHCb, CMS) over ATLAS. To avoid this, ATLAS WUs would have to be designed to run as long (12h as singlecore) as the other subprojects.
ID: 30586
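
A small calculation can make this addendum concrete. The 12 h figure for the single-core subprojects comes from the post; the 3 h ATLAS duration is purely an assumed placeholder, and `wall_time_share` is a hypothetical helper. The sketch only shows how an equal number of tasks translates into unequal wall time when task lengths differ.

```python
def wall_time_share(hours_per_task, tasks_each=1):
    """Fraction of total wall time per subproject when every subproject
    receives the same number of tasks (`tasks_each`)."""
    total = sum(hours * tasks_each for hours in hours_per_task.values())
    return {
        app: hours * tasks_each / total
        for app, hours in hours_per_task.items()
    }


# 12 h per task is the figure mentioned in the post for Theory, LHCb and CMS.
# The ATLAS value is an assumption chosen only to illustrate the effect.
HOURS = {"Theory": 12, "LHCb": 12, "CMS": 12, "ATLAS": 3}

shares = wall_time_share(HOURS)

if __name__ == "__main__":
    for app, share in shares.items():
        print(f"{app:<7}{share:6.1%}")
```

Under these assumed durations an equal task count leaves ATLAS with a much smaller slice of wall time than each of the other three, which is the distortion the post warns about.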

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 255,399 RAC: 48	Message 30587 - Posted: 1 Jun 2017, 14:10:30 UTC - in response to Message 30585.
[quote]When I started this thread I didn't expect that it would grow into a political discussion.[/quote]
It is not a political discussion, but it is one that can get difficult very quickly.
[quote]My goal was to point out a possible technical problem, as I thought WUs from all selected subprojects would be distributed at random (in total as well as for single hosts). This question isn't yet answered, is it?[/quote]
No. To us the scheduling is still black magic, but we are trying to understand how it works.
ID: 30587

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 255,399 RAC: 48	Message 30588 - Posted: 1 Jun 2017, 14:35:37 UTC - in response to Message 30586.
[quote]A random #tasks metric would probably prefer wall time controlled singlecore WUs (Theory, LHCb, CMS) over ATLAS. To avoid this, ATLAS WUs would have to be designed to run as long (12h as singlecore) as the other subprojects.[/quote]
Yes, there are always additional things to consider. We want to ensure not only that the resource is being shared fairly but also that each application gets its fair share.
ID: 30588

PHILIPPE Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0	Message 30589 - Posted: 1 Jun 2017, 17:40:05 UTC - in response to Message 30588.
My initial purpose was only to make everyone aware of the difficulty of a sharing algorithm. I am only a small cruncher and I don't want to impose anything. I'm not here for that. I just want to understand what is happening and discuss it... But reading your response, I tried to see how this problem has been solved elsewhere. For instance, WCG seems to use Laurence's proposal (choice of sub-projects, random selection among the selected sub-projects, and different profiles to suit different computers). Their solution works fine as far as I can see. I don't know about other big projects. Maybe someone can tell us about them.
----------------------------------------
But to explore all possible ways, without knowing the feasibility (I admit)... I wonder whether it would be possible to use BOINC's own features to do the sharing. The fact that all the sub-projects are now grouped makes this less easy to manage than in the past. But what if our web preferences offered a resource share for each sub-project and not only for the global project itself? Then BOINC would do the job. No more complexity and headache to find a solution. No further burden for the LHC team. A percentage would be given to each sub-project, for example: Atlas: 200, Theory: 100, CMS: 50, LHCb: 50, Alice: 0, Benchmark: 0. It wouldn't be exactly the same as Yeti's proposal, but it is close. And there would be a random fetch of WUs for the selected sub-projects, as in Jim's proposal. (Sorry to make you repeat yourself, Jim, but English is not my mother tongue.) The options to select applications would no longer be necessary.
I guess that this solution can't be implemented by the LHC team alone, but why don't you ask the BOINC developers to allow this feature in their software in the future... It may be interesting for other big projects too, which have no time to lose on sharing WUs. Imagine all the ... (a dreamer). What is important is to keep faith in ourselves: despite the fact that we don't have the same level of knowledge, we can be complementary (professionals and amateurs). Thanks for reading.
ID: 30589
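
The per-sub-project resource share proposed in this post amounts to a weighted random choice. The sketch below uses the example weights from the post; the functions `expected_fractions` and `pick_weighted` are hypothetical and say nothing about how BOINC itself implements resource shares. Note that the weights are relative values rather than true percentages (they sum to 400 here).

```python
import random

# Example weights from the post; a weight of 0 means "never fetch".
SHARES = {
    "Atlas": 200,
    "Theory": 100,
    "CMS": 50,
    "LHCb": 50,
    "Alice": 0,
    "Benchmark": 0,
}


def expected_fractions(shares):
    """Long-run fraction of fetched tasks per sub-project."""
    total = sum(shares.values())
    return {sp: weight / total for sp, weight in shares.items()}


def pick_weighted(shares, rng=random):
    """Draw one sub-project with probability proportional to its share."""
    names = list(shares)
    weights = [shares[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(expected_fractions(SHARES))
    print(pick_weighted(SHARES))
```

Whether the unit being shared is tasks, wall time, or something else is left open here, just as it is in the thread; this sketch counts tasks.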

Erich56 Joined: 18 Dec 15 Posts: 1980 Credit: 160,774,607 RAC: 40,183	Message 30590 - Posted: 2 Jun 2017, 6:20:36 UTC - in response to Message 30583.
[quote]We already discussed in another thread that we need a project priority for the sub-projects. Example: Prio 1: Atlas, Prio 1: Theory, Prio 2: CMS, Prio 3: LHCb, don't run: Sixtrack, Alice, ... So, if the client requests work, it will get Atlas or Theory tasks as long as they are available. If they are not available, it will get CMS. If even CMS doesn't offer work, it gets LHCb.[/quote]
@Laurence - is there a chance that this perfect suggestion will materialize? I know such a system from some other projects, and I guess it should not be too difficult to implement, right?
ID: 30590

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 255,399 RAC: 48	Message 30591 - Posted: 2 Jun 2017, 11:28:14 UTC - in response to Message 30589. Last modified: 2 Jun 2017, 11:29:30 UTC
[quote]What is important is to keep faith in ourselves: despite the fact that we don't have the same level of knowledge, we can be complementary (professionals and amateurs).[/quote]
On this topic your input is vital, as you are describing how you would like your resource to be shared. We then have to try to consolidate everyone's wishes into a set of simple rules that describes a policy for the scheduler to follow.
[quote]A percentage would be given to each sub-project[/quote]
A percentage is a way to give them relative weight. It just denominates the unit as 100 slices of the pie. The real question is: what is the pie? Wall time, CPU time, or number of tasks? And how do things like multi-core jobs affect this?
[quote]But reading your response, I tried to see how this problem has been solved elsewhere.[/quote]
You are right, this is a fairly standard problem and there are many existing solutions.
[quote]I guess that this solution can't be implemented by the LHC team alone, but why don't you ask the BOINC developers to allow this feature in their software in the future...[/quote]
If we can find a general solution, then it can be fed back upstream and benefit other projects. Or, if there is a project that already has a solution, they should do this and make it available.
ID: 30591

Laurence Project administrator Project developer Joined: 20 Jun 14 Posts: 431 Credit: 255,399 RAC: 48	Message 30592 - Posted: 2 Jun 2017, 11:30:21 UTC - in response to Message 30590. Last modified: 2 Jun 2017, 11:30:31 UTC
[quote]@Laurence - is there a chance that this perfect suggestion will materialize? I know such a system from some other projects, and I guess it should not be too difficult to implement, right?[/quote]
Yes, if we can agree on what is wanted, we can try to make it happen.
ID: 30592