Message boards : ATLAS application : Only 6 concurrent tasks per computer?
Profile HerveUAE
Avatar

Send message
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 32370 - Posted: 9 Sep 2017, 15:21:09 UTC

Hi,
At the beginning of the week, there was a configuration where "reliable" hosts were given priority for dispatching ATLAS tasks. As a result, only one of my 3 hosts was given ATLAS tasks. This issue was discussed in this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4280&postid=32239#32239
Once the server was reconfigured, my 3 hosts went back to their routine: 24 ATLAS tasks pending, as per the LHC@home preferences.

However:
- Within 1 day, the fastest host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10455317, usually crunching nine 2-core tasks at the same time, could only have 6 pending tasks.
- This morning, the medium-speed host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10416269, usually crunching three 2-core tasks at the same time, could also only have 6 pending tasks.
- And the slowest host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10420599, usually crunching one 4-core task at a time, still has 9 pending tasks, but the number is gradually decreasing.

A similar observation was posted here as well: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4280&postid=32369#32369
So it seems to be an infrastructure issue. Could someone from the CERN team have a look?

Thanks,
Herve
We are the product of random evolution.
ID: 32370 · Report as offensive     Reply Quote
Profile Yeti
Volunteer moderator
Avatar

Send message
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 32371 - Posted: 9 Sep 2017, 15:31:48 UTC

This breaks the ATLAS / LHC configuration.

We were told that ATLAS would respect the LHC@home preferences, and I have set mine to keep 12 jobs in the local queue, but at the moment I see only 6 jobs in my local queues.

David, please fix this.


Supporting BOINC, a great concept !
ID: 32371 · Report as offensive     Reply Quote
David Cameron
Project administrator
Project developer
Project scientist

Send message
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32382 - Posted: 11 Sep 2017, 7:20:18 UTC

I've put back the old configuration. The limit was decreased to avoid too many tasks going past the deadline on slow hosts, but we didn't realise it would have this negative effect on good hosts. Sorry for the inconvenience.
ID: 32382 · Report as offensive     Reply Quote
Profile HerveUAE
Avatar

Send message
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 32389 - Posted: 11 Sep 2017, 15:24:03 UTC - in response to Message 32382.  

Thanks David,
All my hosts are back to their usual crunching routine. I will configure the slow host to have a smaller number of pending tasks so that all of them are processed within a day.
I am not sure what the best setting would be to make sure the ATLAS jobs are processed quickly by the community. Reduce the deadline even further?
Regards,
Herve
We are the product of random evolution.
ID: 32389 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 32390 - Posted: 11 Sep 2017, 17:03:04 UTC
Last modified: 11 Sep 2017, 17:03:30 UTC

I have "Store at least 0.5 days of work" and "0.01 days of additional work". I assume this would not overrun the 24-hour turnaround?

Assuming the FLOPS estimates are good.
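
A rough back-of-the-envelope sketch of what those two settings translate to, purely for illustration: the runtime per task and the number of parallel slots below are assumptions, not project figures, and BOINC makes this estimate itself from its own runtime ("FLOPS") estimates.

    # Rough estimate of how many ATLAS tasks a 0.5 + 0.01 day buffer corresponds to.
    # All figures are assumptions for illustration only.
    work_buf_min_days = 0.5          # "Store at least ... days of work"
    work_buf_additional_days = 0.01  # "Store up to an additional ... days of work"
    running_slots = 3                # ATLAS tasks the host runs in parallel (assumed)
    hours_per_task = 1.5             # assumed wall-clock runtime of one task

    buffered_hours = (work_buf_min_days + work_buf_additional_days) * 24
    tasks_on_host = running_slots * (1 + buffered_hours / hours_per_task)
    print(f"roughly {tasks_on_host:.0f} tasks on the host (running + waiting)")
    # ~3 * (1 + 12.24 / 1.5), i.e. about 27 here, before any server-side per-host cap.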
ID: 32390 · Report as offensive     Reply Quote
csbyseti

Send message
Joined: 6 Jul 17
Posts: 22
Credit: 29,430,354
RAC: 0
Message 32392 - Posted: 11 Sep 2017, 18:54:44 UTC - in response to Message 32390.  

Thanks, all hosts get the normal 24 tasks.

I have "Store at least 0.5 days of work" and "0.01 days of additional work". I assume this would not overrun the 24-hour turnaround?

Assuming the FLOPS estimates are good.


What does a 24-hour return time mean? Will LHC go to such a short return time?

With the current short WU runtime (1 - 1.5 hours), all hosts are able to return more than 24 WUs in one day.
ID: 32392 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 32394 - Posted: 12 Sep 2017, 6:05:10 UTC - in response to Message 32392.  
Last modified: 12 Sep 2017, 6:06:05 UTC

The 24 hours is the time to return the WU, so if you can run 1 at a time then you could have a buffer of 16-24 WUs and still be fine.

If you had n CPUs then you could do n times this.

Since ATLAS doesn't really use BOINC's task management, the most WUs you can have at any time is 24 anyway.

In theory the 0.5-day work buffer would store an additional 8-12 WUs, so you could exceed the 24-hour limit.

The max of 24 is only irritating to people with more than 24 cores, as they have more cores than the allowed WU limit; a better server setting would be to only allow 24 WUs per core per day.
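
As a sketch of that reasoning (the runtime and the 24-hour deadline are taken as assumptions from this thread, not confirmed project settings):

    # Can a host drain its local queue before an assumed 24-hour return time?
    running_slots = 1        # ATLAS tasks the host runs at the same time
    queued_tasks = 20        # tasks on the host (running + waiting)
    hours_per_task = 1.2     # assumed wall-clock runtime per task (1-1.5 h quoted above)
    deadline_hours = 24.0    # return time assumed in this post (corrected further down)

    batches = -(-queued_tasks // running_slots)   # ceiling division
    drain_time = batches * hours_per_task
    print(f"queue drains in ~{drain_time:.0f} h ->",
          "fine" if drain_time <= deadline_hours else "some tasks would miss the deadline")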
ID: 32394 · Report as offensive     Reply Quote
csbyseti

Send message
Joined: 6 Jul 17
Posts: 22
Credit: 29,430,354
RAC: 0
Message 32402 - Posted: 12 Sep 2017, 20:08:30 UTC - in response to Message 32394.  
Last modified: 12 Sep 2017, 20:44:44 UTC

The 24 hours is the time to return the WU, so if you can run 1 at a time then you could have a buffer of 16-24 WUs and still be fine.

If you had n CPUs then you could do n times this.

Since ATLAS doesn't really use BOINC's task management, the most WUs you can have at any time is 24 anyway.

In theory the 0.5-day work buffer would store an additional 8-12 WUs, so you could exceed the 24-hour limit.

The max of 24 is only irritating to people with more than 24 cores, as they have more cores than the allowed WU limit; a better server setting would be to only allow 24 WUs per core per day.


Sorry Toby Broom, but I don't understand what you mean, especially the first sentence.
For BOINC users, the time to return is the deadline; if you want a shorter return time, you have to shorten the deadline.
If the CPU power is not enough to crunch the latest WU in the buffer, then the user has to reduce the queue depth.
ID: 32402 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 32406 - Posted: 12 Sep 2017, 21:29:23 UTC - in response to Message 32402.  

I thought the return deadline was 24 hours, but I misread, sorry.

The WU buffer does not work correctly for ATLAS; as you said, the user has to manage it manually. Normally BOINC does it for you.
ID: 32406 · Report as offensive     Reply Quote
David Cameron
Project administrator
Project developer
Project scientist

Send message
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32410 - Posted: 13 Sep 2017, 7:54:37 UTC - in response to Message 32406.  

The WU buffer does not work correctly for ATLAS; as you said, the user has to manage it manually. Normally BOINC does it for you.


Can you explain what you mean here? We obviously want to make things as easy as possible for volunteers so we should try to fix this issue.
ID: 32410 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 32417 - Posted: 13 Sep 2017, 19:17:59 UTC - in response to Message 32410.  
Last modified: 13 Sep 2017, 19:20:19 UTC

Using the web settings you have the Max # jobs setting; the options are Unlimited and 1-24.

If you choose Unlimited, then you get 1 WU at a time; the other projects here allow BOINC to manage the setting based on your computing preferences.

If you choose 1-24, then BOINC will take 1-24 WUs. For example, if you choose 24 and you have a 12-core CPU, then you will have 24 WUs with 12 running. It will then be 1 in, 1 out as you process work.

I guess the 24 limit isn't so constraining given the RAM usage. For example, on my computer with 32 cores I could run 32, but with 128 GB of RAM I already use about 115 GB, so I couldn't run more than 24 anyway. You can also work around it by using more cores per task or multiple instances of BOINC.
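
As a rough sanity check of that RAM argument (the per-task memory figure below is only inferred from the numbers quoted in this post; it is not an official ATLAS requirement):

    # How many concurrent ATLAS tasks fit in RAM on this machine?
    total_ram_gb = 128.0
    os_reserve_gb = 8.0             # headroom for the OS and other work (assumed)
    ram_per_task_gb = 115.0 / 24    # ~4.8 GB, inferred from "ca 115 GB" for 24 tasks
    cores = 32                      # assuming single-core tasks here

    ram_limited = int((total_ram_gb - os_reserve_gb) / ram_per_task_gb)
    print("max concurrent tasks ~", min(ram_limited, cores))
    # With these assumptions: ~25, i.e. close to the 24-task cap anyway.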

For me, the part I would prefer is to have one set of preferences for all my computers; now I have to have one for the ones running ATLAS and another for the other projects.

I think the original reason for the job limit was to stop ATLAS from using all the cores and all the memory on new users' machines, so you would want to be careful before making any changes. Maybe you could make the default web setting 1 job per core, and make the Unlimited setting work as it does for the other sub-projects?
ID: 32417 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 34316 - Posted: 8 Feb 2018, 18:06:03 UTC
Last modified: 8 Feb 2018, 18:11:26 UTC

I could only get 3 WUs per host with the Unlimited setting; setting it back to 8 allowed 8.

Before, it was 24. Is there a plan to allow 24 again?
ID: 34316 · Report as offensive     Reply Quote
Erich56

Send message
Joined: 18 Dec 15
Posts: 1686
Credit: 100,348,426
RAC: 101,752
Message 34319 - Posted: 9 Feb 2018, 6:01:47 UTC - in response to Message 34316.  

I could only get 3 WUs per host with the Unlimited setting; setting it back to 8 allowed 8.
About the same is happening here. Strange.
Any explanation from the experts?
ID: 34319 · Report as offensive     Reply Quote
maeax

Send message
Joined: 2 May 07
Posts: 2071
Credit: 156,087,548
RAC: 104,107
Message 34325 - Posted: 9 Feb 2018, 9:27:24 UTC

What is the rule of 10k concurrent ATLAS tasks about?
For example, Gridcoin teams with thousands of computers, or clusters of CERN computers running the native app?
Many computers of normal volunteers can't reach the limit for a day's work defined in their own preferences.
ID: 34325 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 34348 - Posted: 9 Feb 2018, 22:03:38 UTC - in response to Message 34325.  

Maybe the 10k limit is there to prevent backend issues like we saw recently?

The BOINC scheduling will work out how many tasks to buffer so that the limits can be reached.
ID: 34348 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 34354 - Posted: 10 Feb 2018, 11:39:40 UTC

I will just leave it on 3. I wanted to contribute more to ATLAS, but I get more out of my computer with settings that are favourable for the other projects.
ID: 34354 · Report as offensive     Reply Quote
Erich56

Send message
Joined: 18 Dec 15
Posts: 1686
Credit: 100,348,426
RAC: 101,752
Message 34356 - Posted: 10 Feb 2018, 12:31:30 UTC - in response to Message 34348.  

Maybe the 10k limit is there to prevent backend issues like we saw recently?
Most probably so.
ID: 34356 · Report as offensive     Reply Quote
