
Only 6 concurrent tasks per computer?



Message boards : ATLAS application : Only 6 concurrent tasks per computer?

HerveUAE
Joined: 18 Dec 16
Posts: 101
Credit: 5,269,439
RAC: 25,064
Message 32370 - Posted: 9 Sep 2017, 15:21:09 UTC

Hi,
At the beginning of the week, there was a configuration where "reliable" hosts were given priority for dispatching ATLAS tasks. As a result, only one of my 3 hosts was given ATLAS tasks. This issue was discussed in this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4280&postid=32239#32239
Once the server was reconfigured, my 3 hosts went back to their routine: 24 ATLAS tasks pending, as per the LHC@home preferences.

However:
- Within 1 day, the fastest host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10455317, usually crunching 9 2-core tasks at the same time, could only have 6 pending tasks.
- This morning, the medium-speed host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10416269, usually crunching 3 2-core tasks at the same time, could also only have 6 pending tasks.
- And the slowest host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10420599, usually crunching 1 4-core task at a time, still has 9 pending tasks, but the number of pending tasks is gradually decreasing.

A similar observation was posted here as well: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4280&postid=32369#32369
So it seems to be an infrastructure issue. Could someone from the CERN team have a look?

Thanks,
Herve
____________
We are the product of random evolution.

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 281
Credit: 41,059,103
RAC: 50,618
Message 32371 - Posted: 9 Sep 2017, 15:31:48 UTC

This is a break of the ATLAS/LHC configuration.

We were told that ATLAS would respect the LHC preferences, and I have set mine to have 12 jobs in the local queue, but actually I see only 6 jobs in my local queues.

David, please fix this.
____________


Supporting BOINC, a great concept !

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 124
Credit: 2,875,749
RAC: 10,318
Message 32382 - Posted: 11 Sep 2017, 7:20:18 UTC

I've put back the old configuration. The limit was decreased to avoid too many tasks going past the deadline on slow hosts, but we didn't realise it would have this negative effect for good hosts. Sorry for the inconvenience.

HerveUAE
Joined: 18 Dec 16
Posts: 101
Credit: 5,269,439
RAC: 25,064
Message 32389 - Posted: 11 Sep 2017, 15:24:03 UTC - in response to Message 32382.

Thanks David,
All my hosts are back to their usual crunching routine. I will configure the slow host to have a smaller number of pending tasks so that all of them are processed within a day.
I am not sure what the best setting would be to make sure the ATLAS jobs are processed quickly by the community. Reduce the deadline even further?
Regards,
Herve
____________
We are the product of random evolution.

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,287,504
RAC: 112,390
Message 32390 - Posted: 11 Sep 2017, 17:03:04 UTC
Last modified: 11 Sep 2017, 17:03:30 UTC

I have "Store at least 0.5 days of work" and "0.01 days of additional work". I assume this would not overrun the 24 hr turnaround?

Assuming the FLOPS estimates are good.

csbyseti
Joined: 6 Jul 17
Posts: 3
Credit: 695,922
RAC: 24,828
Message 32392 - Posted: 11 Sep 2017, 18:54:44 UTC - in response to Message 32390.

Thanks, all hosts get the normal 24 tasks.

> I have "Store at least 0.5 days of work" and "0.01 days of additional work". I assume this would not overrun the 24 hr turnaround?
>
> Assuming the FLOPS estimates are good.


What does a 24 h return time mean? Will LHC go to such a short return time?

With the current short WU runtime (1-1.5 hours), all hosts are able to return more than 24 WUs in one day.
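As a back-of-envelope check of that claim (the 1-1.5 hour runtimes are the rough figures quoted in this thread, not official numbers), the daily throughput of one running slot is just 24 hours divided by the runtime:

```python
# Rough WU throughput per day for one running task slot.
# runtime_hours is an assumed figure (1-1.5 h quoted in this thread).
def wus_per_day(runtime_hours: float, slots: int = 1) -> float:
    """How many WUs a host could return in 24 hours."""
    return slots * 24.0 / runtime_hours

print(wus_per_day(1.0))  # 24.0 WUs/day at a 1 h runtime
print(wus_per_day(1.5))  # 16.0 WUs/day at a 1.5 h runtime
```

So even a single running slot can return 16-24 WUs per day, consistent with the point above.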

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,287,504
RAC: 112,390
Message 32394 - Posted: 12 Sep 2017, 6:05:10 UTC - in response to Message 32392.
Last modified: 12 Sep 2017, 6:06:05 UTC

The 24 hrs is the time to return the WU, so if you can run 1 at a time then you could have a buffer of 16-24 WUs and still be fine.

If you had n CPUs then you could do n times this.

Since ATLAS doesn't really use BOINC's task management, the most WUs you can have at any time is 24 anyway.

In theory the 0.5 work buffer would store an additional 8-12 WUs, so you could exceed the 24 hr limit.

The max of 24 is only irritating to people with more than 24 cores, as they have more cores than the allowed WU limit; a better server setting would be to allow only 24 WUs/core/day.
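The buffer arithmetic above can be sketched like this (a rough estimate only; the 1-1.5 h runtimes are the figures quoted earlier in the thread, and the function name is mine):

```python
# How many extra WUs a "Store at least N days of work" buffer holds,
# given an assumed per-WU runtime. Not an official BOINC formula.
def buffered_wus(buffer_days: float, runtime_hours: float, slots: int = 1) -> int:
    """Extra WUs needed to fill buffer_days of work per running slot."""
    return int(buffer_days * 24.0 / runtime_hours) * slots

# A 0.5-day buffer at 1-1.5 h per WU stores roughly 8-12 extra WUs:
print(buffered_wus(0.5, 1.5))  # 8
print(buffered_wus(0.5, 1.0))  # 12
```

That matches the "additional 8-12 WUs" estimate above for a single running slot.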

csbyseti
Joined: 6 Jul 17
Posts: 3
Credit: 695,922
RAC: 24,828
Message 32402 - Posted: 12 Sep 2017, 20:08:30 UTC - in response to Message 32394.
Last modified: 12 Sep 2017, 20:44:44 UTC

> The 24 hrs is the time to return the WU, so if you can run 1 at a time then you could have a buffer of 16-24 WUs and still be fine.
>
> If you had n CPUs then you could do n times this.
>
> Since ATLAS doesn't really use BOINC's task management, the most WUs you can have at any time is 24 anyway.
>
> In theory the 0.5 work buffer would store an additional 8-12 WUs, so you could exceed the 24 hr limit.
>
> The max of 24 is only irritating to people with more than 24 cores, as they have more cores than the allowed WU limit; a better server setting would be to allow only 24 WUs/core/day.


Sorry Toby Broom, but I don't understand what you mean, especially the first sentence.
For BOINC users, the time to return is the deadline; if you want a shorter return time, you have to shorten the deadline.
If the CPU power is not able to crunch the last WU in the buffer in time, then the user has to reduce the queue depth.

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,287,504
RAC: 112,390
Message 32406 - Posted: 12 Sep 2017, 21:29:23 UTC - in response to Message 32402.

I thought the return deadline was 24 hrs, but I misread, sorry.

The WU buffer does not work correctly for ATLAS; as you said, the user has to manage it manually. Normally BOINC does it for you.

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 124
Credit: 2,875,749
RAC: 10,318
Message 32410 - Posted: 13 Sep 2017, 7:54:37 UTC - in response to Message 32406.

> The WU buffer does not work correctly for ATLAS; as you said, the user has to manage it manually. Normally BOINC does it for you.


Can you explain what you mean here? We obviously want to make things as easy as possible for volunteers so we should try to fix this issue.

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,287,504
RAC: 112,390
Message 32417 - Posted: 13 Sep 2017, 19:17:59 UTC - in response to Message 32410.
Last modified: 13 Sep 2017, 19:20:19 UTC

Using the web settings you have the Max # jobs setting; the options are Unlimited and 1-24.

If you choose Unlimited then you get 1 WU at a time; the other projects here allow BOINC to manage the setting based on your computing preferences.

If you choose 1-24 then BOINC will take 1-24 WUs. For example, if you choose 24 and you have a 12-core CPU, then you will have 24 WUs with 12 running. It will then be 1 in, 1 out as you process work.

I guess the 24 limit isn't so constraining given the RAM usage. For example, on my computer with 32 cores I could run 32, but with 128 GB of RAM I already use ca. 115 GB, so I couldn't run more than 24 anyway. You can also work around it by using more cores per task or multiple instances of BOINC.

For me, the part I would prefer is to have one set of preferences for all my computers; now I have to have one for the ones running ATLAS and another for the other projects.

I think the original reason for the job limit was to stop ATLAS using all the cores and all the memory for new users, so you would want to be careful before making any changes. Maybe you could make the default web setting 1 job/1 core, and make the Unlimited setting work as per the other sub-projects?
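The Max # jobs behaviour described in this thread can be sketched roughly as follows (the function and parameter names are mine, not BOINC's; this is just the running-vs-pending arithmetic, assuming the host has enough RAM):

```python
# Sketch of the "Max # jobs" behaviour described in the thread:
# the server sends up to max_jobs WUs, and the number that actually
# run at once is limited by cores / cores-per-task.
def atlas_queue(max_jobs: int, cores: int, cores_per_task: int = 1):
    """Return (running, pending) WUs for a host. Names are illustrative."""
    running = min(cores // cores_per_task, max_jobs)
    pending = max_jobs - running
    return running, pending

# 12-core host, Max # jobs = 24, single-core tasks: 24 WUs, 12 running.
print(atlas_queue(24, 12))     # (12, 12)
# Same host with 2-core ATLAS tasks: only 6 run at once, 18 wait.
print(atlas_queue(24, 12, 2))  # (6, 18)
```

This also shows why the limit of 24 only pinches hosts with more than 24 cores: pending drops to zero before the cores run out.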
