Message boards : Number crunching : Imbalance between Subprojects

1 · 2 · 3 · Next

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,949
RAC: 137,890
Message 30542 - Posted: 29 May 2017, 8:50:19 UTC

I recently noticed a significant imbalance in the number of calculated WUs per subproject on my hosts.

          #      %
Theory   40     45
ATLAS    26     29
LHCb     16     18
CMS       7      8
Total    89    100
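The percentage column is just each subproject's count over the total; a quick Python check using the numbers from the table above:

```python
# Per-subproject counts of completed WUs, taken from the table above.
counts = {"Theory": 40, "ATLAS": 26, "LHCb": 16, "CMS": 7}

total = sum(counts.values())  # 89
for app, n in counts.items():
    # Rounded percentage of all completed WUs.
    print(f"{app:7s} {n:3d} {round(100 * n / total):4d}")
```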


It seems that the server's scheduler does not send the WUs in random order.
Any comments from the project people?
ID: 30542
nikogianna

Joined: 30 Jan 17
Posts: 7
Credit: 132,213
RAC: 0
Message 30544 - Posted: 29 May 2017, 13:31:18 UTC - in response to Message 30542.  

We are going to start investigating. This is not the intended behavior.
ID: 30544
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,949
RAC: 137,890
Message 30547 - Posted: 29 May 2017, 16:13:40 UTC - in response to Message 30544.  

We are going to start investigating. This is not the intended behavior.

Thank you.
ID: 30547
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,723,633
RAC: 234,266
Message 30553 - Posted: 29 May 2017, 20:47:41 UTC
Last modified: 29 May 2017, 20:53:04 UTC

Mine is quite different.

              #     %
    Theory  121    51
    ATLAS     7     3
    LHCb     90    38
    CMS      21     9



This counts both queued and running tasks.

ID: 30553
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 30563 - Posted: 30 May 2017, 17:04:59 UTC - in response to Message 30553.  

Thinking this problem over:

what is the best option for balancing WUs between sub-projects?

There are different ways to manage it:

1) by the number of tasks validated for each sub-project
2) by the time elapsed for each sub-project
3) by the credits earned for each sub-project
4) by the number of WUs available for each sub-project

If you choose 1) and the WUs have different durations, it can distort the intended effect:
it gives an advantage to the sub-project whose WUs run longer.
If you choose 2) and the WUs have different durations, it can distort the intended effect:
it gives an advantage to the sub-project whose WUs run shorter.
If you choose 3) and the WUs are credited differently, it can distort the intended effect:
it gives an advantage to the sub-project that grants the least credit per validated WU.
If you choose 4) and a sub-project's WUs are unavailable for a period, it can distort the intended effect:
it gives an advantage to the sub-project that is most reliable at sending jobs (think of the weekend VM agent failures).

Finally, it's not so easy to satisfy everybody (sub-projects and volunteers).

Maybe harmonizing the durations of the sub-projects' WUs and the credits granted would provide better satisfaction ...
ID: 30563
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 30564 - Posted: 30 May 2017, 19:33:40 UTC - in response to Message 30563.  

What is "balance", and why? It really depends on the science they want to do (and maybe their own political infighting). But it is not our problem insofar as I can see. Why should we want an equal number, however measured? Maybe some projects can wait for months while others are needed more urgently. Sub-atomic physics is not about social equality insofar as I know.
ID: 30564
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 30565 - Posted: 30 May 2017, 20:53:33 UTC - in response to Message 30564.  

So, the way you see it, only human arbitration by the manager of each sub-project can decide what share of the computation is allocated to that sub-project during a given period.
If a volunteer wants to override this scientific choice, he has to select only the sub-projects he wants to run.

To sum up: a volunteer who selects all sub-projects leaves the real choice to the CERN scientists. If the volunteer wants to decide for himself, all he can do is deselect, in a timely fashion, the sub-projects he no longer wants to run.

It's another way of scheduling ...

Each piece of the puzzle has a different responsibility:
the scientists set the direction, and the volunteers provide the engine to move forward.

Everything is possible ...
But the chosen solution has to be accepted by most of us.

Equality is a rather good concept, but so is efficiency.

We are close to a political decision, which is difficult to understand when we don't have all the elements in our hands.

Whatever happens, everyone can give their opinion and feedback.

I know people are interested in this topic.
ID: 30565
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 30566 - Posted: 30 May 2017, 21:24:28 UTC - in response to Message 30565.  

If you had read what I had written, you would know that I propose letting CERN handle it any way they want. But perhaps more cogently, I think that the crunchers on this project won't do better, but will just impose arbitrary criteria to suit their whims. It will just add an additional burden on the scheduling operation, however it is performed, and will have nothing to do with the physics at all.
ID: 30566
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 30576 - Posted: 31 May 2017, 16:18:25 UTC - in response to Message 30544.  

We are going to start investigating. This is not the intended behavior.

I'm looking forward to what your investigations will bring to light ...


Supporting BOINC, a great concept !
ID: 30576
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,723,633
RAC: 234,266
Message 30579 - Posted: 31 May 2017, 20:57:07 UTC

I agree with Jim, just let the scientist decide what's best.
ID: 30579
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 30582 - Posted: 1 Jun 2017, 12:44:33 UTC - in response to Message 30579.  
Last modified: 1 Jun 2017, 12:45:02 UTC

There are a number of different perspectives that the scheduling algorithm needs to consider. One is how you would like your host to be used. Already there is the ability to select which applications you would like to run. The next step is how you would like those applications to share the host. This comes down to what rules are defined in a policy that the scheduler has to consider. The simplest rule is to select an application at random. Next we can define any number of rules based on available metrics. However, as we try to optimize, both in terms of sharing the resources and making many people happy, this increases complexity so we have to get the trade-off right.

One improvement could be to add a weight for each application so that some would be preferred over others. Whether the metric used is tasks, wall time, CPU time, or credit can be discussed. But this will add complexity.
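As a rough illustration of the weighting idea (the weight values here are made up for the example, not anything the project has defined), the selection step could look like:

```python
import random

# Hypothetical per-application weights (illustrative values only).
weights = {"ATLAS": 2.0, "Theory": 2.0, "CMS": 1.0, "LHCb": 1.0}

def pick_app(available, rng=random):
    """Pick one application with work available, biased by its weight.

    Applications with no weight entry (or weight 0) are never picked.
    """
    apps = [a for a in available if weights.get(a, 0) > 0]
    if not apps:
        return None
    return rng.choices(apps, weights=[weights[a] for a in apps], k=1)[0]
```

With these example weights an ATLAS task would be handed out about twice as often as a CMS task, whatever metric the weights are eventually derived from.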

My preference would be to triage applications as 'run', 'run if no other tasks are available', and 'don't run'. Within each category, task selection would be random.
ID: 30582
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 30583 - Posted: 1 Jun 2017, 13:08:00 UTC - in response to Message 30582.  
Last modified: 1 Jun 2017, 13:08:33 UTC

My preference would be to triage applications as 'run', 'run if no other tasks are available', and 'don't run'. Within each category, task selection would be random.

We already discussed in another thread that we need a Project-Priority for the Sub-Projects.

Example:

Prio 1: Atlas
Prio 1: Theory
Prio 2: CMS
Prio 3: LHCb

don't run: Sixtrack, Alice, ...

So, if the client requests work, it will get ATLAS or Theory tasks as long as they are available. If they are not available, it will get CMS. And if even CMS doesn't offer work, it gets LHCb.
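That tier-based fallback is easy to express; a minimal sketch with the tiers from the example above (the code itself is hypothetical, not how the BOINC scheduler is implemented):

```python
import random

# Priority tiers from the example above; lower number = higher priority.
# Applications not listed here are treated as "don't run".
priority = {"ATLAS": 1, "Theory": 1, "CMS": 2, "LHCb": 3}

def fetch_work(apps_with_work, rng=random):
    """Serve work from the highest-priority tier that currently has tasks."""
    eligible = [a for a in apps_with_work if a in priority]
    if not eligible:
        return None  # nothing the volunteer allows is available
    best = min(priority[a] for a in eligible)
    # Random choice among the applications sharing the best tier.
    return rng.choice([a for a in eligible if priority[a] == best])
```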


Supporting BOINC, a great concept !
ID: 30583
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,949
RAC: 137,890
Message 30585 - Posted: 1 Jun 2017, 13:40:36 UTC - in response to Message 30582.  
Last modified: 1 Jun 2017, 13:42:53 UTC

When I started this thread I didn't expect that it would grow into a political discussion.
My goal was to point out a possible technical problem as I thought WUs from all selected subprojects would be distributed at random (in total as well as for single hosts).
This question isn't yet answered, is it?

Laurence's suggestions don't work out of the box but would need some development work.
I agree with him - especially with "...'run', 'run if no other tasks available' and 'don't run'..." - but would keep it as simple as possible.
The scheduler should select at random, based on #tasks or wall time.
Other metrics are either too complex in error handling or are "moving targets" (credits).

IIRC a similar suggestion has already been posted in the past.

<edit>@Yeti: sorry, saw your post too late :)</edit>
ID: 30585
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,949
RAC: 137,890
Message 30586 - Posted: 1 Jun 2017, 14:00:46 UTC

Addendum:

A random #tasks metric would probably favor the wall-time-controlled single-core WUs (Theory, LHCb, CMS) over ATLAS.
To avoid this, ATLAS WUs would have to be designed to run as long as the other subprojects' (12 h as single-core).
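To put numbers on that: with a per-task random pick, an application's expected wall-time share scales with its typical task length. A small sketch (the 4 h ATLAS runtime is a made-up figure for illustration):

```python
# Hypothetical typical single-core runtimes in hours (illustrative only).
runtime_h = {"Theory": 12, "LHCb": 12, "CMS": 12, "ATLAS": 4}

def wall_time_shares(pick_prob):
    """Expected long-run wall-time share per app, given per-task pick probabilities."""
    raw = {a: pick_prob[a] * runtime_h[a] for a in runtime_h}
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}

# Uniform per-task selection: ATLAS only gets 4/40 = 10% of the wall time.
uniform = {a: 0.25 for a in runtime_h}

# Weighting the pick by 1/runtime would equalize the wall-time shares instead.
inv = {a: 1 / runtime_h[a] for a in runtime_h}
norm = sum(inv.values())
inv = {a: w / norm for a, w in inv.items()}
```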
ID: 30586
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 30587 - Posted: 1 Jun 2017, 14:10:30 UTC - in response to Message 30585.  

When I started this thread I didn't expect that it would grow into a political discussion.


It is not political, but it is a question that can get difficult very quickly.


My goal was to point out a possible technical problem as I thought WUs from all selected subprojects would be distributed at random (in total as well as for single hosts).
This question isn't yet answered, is it?


No. To us the scheduling is still black magic, but we are trying to understand how it works.
ID: 30587
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 30588 - Posted: 1 Jun 2017, 14:35:37 UTC - in response to Message 30586.  


A random #tasks metric would probably favor the wall-time-controlled single-core WUs (Theory, LHCb, CMS) over ATLAS.
To avoid this, ATLAS WUs would have to be designed to run as long as the other subprojects' (12 h as single-core).


Yes, there are always additional things to consider. Not only do we want to ensure that the resource is shared fairly, but also that each application gets its fair share.
ID: 30588
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 30589 - Posted: 1 Jun 2017, 17:40:05 UTC - in response to Message 30588.  

My initial purpose was only to make everyone aware of the difficulty of a sharing algorithm.
I am only a small cruncher and I don't want to impose anything. I'm not here for that. I just want to understand what is happening and discuss it ...
But after reading your response, I tried to see how this problem has been solved elsewhere.
For instance, WCG seems to use Laurence's proposal (a choice of sub-projects, random selection among the selected sub-projects, and different profiles to suit different computers).
Their solution works fine as far as I can see.
I don't know about the other big projects.
Maybe someone can tell us about them.
----------------------------------------------------------------------------------------------------------------------------------------------------------------
But to explore all possible ways, without knowing the feasibility (I admit) ...
I wonder if it would be possible to use BOINC's own features to do the sharing.
The fact that all the sub-projects are now grouped makes this harder to manage than in the past.
But suppose it were possible to set, in our web preferences, the resource share for each sub-project and not only for the project as a whole.
Then BOINC itself would do the job. No more complexity and headaches finding a solution. No further burden for the LHC team.

A percentage would be given to each sub-project,
for example: ATLAS: 200, Theory: 100, CMS: 50, LHCb: 50, ALICE: 0, Benchmark: 0.
It wouldn't be exactly the same as Yeti's proposal, but it's close.
And there would be a random fetch of WUs among the selected sub-projects, as in Jim's proposal.
(Sorry to make you repeat yourself, Jim, but English is not my mother tongue.)

The options to select applications would no longer be necessary.

I guess this solution can't be implemented by the LHC team alone, but why not ask the BOINC developers to allow this feature in their software in the future ...

It may be interesting for other big projects too, which don't have time to lose on sharing WUs.

Imagine all the ... (a dreamer).

What matters is to keep faith in ourselves; despite the fact that we don't all have the same level of knowledge, we can be complementary (professionals and amateurs).

Thanks for reading.
ID: 30589
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,360,600
RAC: 101,747
Message 30590 - Posted: 2 Jun 2017, 6:20:36 UTC - in response to Message 30583.  

We already discussed in another thread that we need a Project-Priority for the Sub-Projects.

Example:

Prio 1: Atlas
Prio 1: Theory
Prio 2: CMS
Prio 3: LHCb

don't run: Sixtrack, Alice, ...

So, if the client requests work, it will get ATLAS or Theory tasks as long as they are available. If they are not available, it will get CMS. And if even CMS doesn't offer work, it gets LHCb.


@Laurence - is there a chance that this perfect suggestion will materialize?
I know such a system from some other projects, and I guess it should not be too difficult to implement, right?
ID: 30590
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 30591 - Posted: 2 Jun 2017, 11:28:14 UTC - in response to Message 30589.  
Last modified: 2 Jun 2017, 11:29:30 UTC

What matters is to keep faith in ourselves; despite the fact that we don't all have the same level of knowledge, we can be complementary (professionals and amateurs).


On this topic your input is vital, as you are describing how you would like your resource to be shared. We then have to try to consolidate everyone's wishes into a set of simple rules that describe a policy for the scheduler to follow.

A percentage would be given to each sub-project:

A percentage is a way to give them relative weight; it just denominates the unit as 100 slices of the pie. The real question is: what is the pie? Wall time, CPU time, number of tasks? And how do things like multi-core jobs affect this?
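A toy example of how the choice of pie changes the shares: one hypothetical 4-core ATLAS task against two single-core Theory tasks (all numbers invented for illustration):

```python
# (app, wall-clock hours, cores) for a small batch of finished tasks.
tasks = [
    ("ATLAS",  6, 4),  # one 4-core task
    ("Theory", 12, 1),
    ("Theory", 12, 1),
]

def shares(metric):
    """Share of the 'pie' per application, for a given per-task metric."""
    totals = {}
    for app, wall, cores in tasks:
        totals[app] = totals.get(app, 0) + metric(wall, cores)
    pie = sum(totals.values())
    return {a: v / pie for a, v in totals.items()}

by_count = shares(lambda wall, cores: 1)            # ATLAS 33%, Theory 67%
by_wall  = shares(lambda wall, cores: wall)         # ATLAS 20%, Theory 80%
by_cpu   = shares(lambda wall, cores: wall * cores) # ATLAS 50%, Theory 50%
```

The same batch of tasks gives ATLAS anywhere from a fifth to half of the pie depending on the metric, which is exactly why the unit has to be agreed on first.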

But after reading your response, I tried to see how this problem has been solved elsewhere.

You are right; this is a fairly standard problem and there are many existing solutions.


I guess this solution can't be implemented by the LHC team alone, but why not ask the BOINC developers to allow this feature in their software in the future ...


If we can find a general solution, it can be fed back upstream to benefit other projects; or, if a project already has a solution, they should make it available.
ID: 30591
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 30592 - Posted: 2 Jun 2017, 11:30:21 UTC - in response to Message 30590.  
Last modified: 2 Jun 2017, 11:30:31 UTC



@Laurence - is there a chance that this perfect suggestion will materialize?
I know such a system from some other projects, and I guess it should not be too difficult to implement, right?


Yes, if we can agree on what is wanted, we can try to make it happen.
ID: 30592



©2024 CERN