Message boards : ATLAS application : Only 6 concurrent tasks per computer?
Profile HerveUAE
Avatar

Send message
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 32370 - Posted: 9 Sep 2017, 15:21:09 UTC

Hi,
At the beginning of the week, there was a configuration where "reliable" hosts were given priority for dispatching ATLAS tasks. As a result, only one of my 3 hosts was given ATLAS tasks. This issue was discussed in this thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4280&postid=32239#32239
Once the server was reconfigured, my 3 hosts went back to their routine: 24 ATLAS tasks pending, as per the LHC@home preferences.

However:
- Within 1 day, the fastest host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10455317, usually crunching nine 2-core tasks at the same time, could only have 6 pending tasks.
- This morning, the medium-speed host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10416269, usually crunching three 2-core tasks at the same time, could also only have 6 pending tasks.
- And the slowest host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10420599, usually crunching one 4-core task at a time, still has 9 pending tasks, but the number is gradually decreasing.

A similar observation was posted here as well: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4280&postid=32369#32369
So it seems to be an infrastructure issue. Could someone from the CERN team have a look?

Thanks,
Herve
We are the product of random evolution.
ID: 32370 · Report as offensive     Reply Quote
Profile Yeti
Volunteer moderator
Avatar

Send message
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 32371 - Posted: 9 Sep 2017, 15:31:48 UTC

This breaks the ATLAS / LHC configuration.

We were told that ATLAS would respect the LHC@home preferences, and I have set mine to keep 12 jobs in the local queue, but at the moment I see only 6 jobs in my local queues.

David, please fix this.


Supporting BOINC, a great concept !
ID: 32371 · Report as offensive     Reply Quote
David Cameron
Project administrator
Project developer
Project scientist

Send message
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32382 - Posted: 11 Sep 2017, 7:20:18 UTC

I've put back the old configuration. The limit was decreased to avoid too many tasks going past the deadline on slow hosts, but we didn't realise it would have this negative effect on good hosts. Sorry for the inconvenience.
ID: 32382 · Report as offensive     Reply Quote
Profile HerveUAE
Avatar

Send message
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 32389 - Posted: 11 Sep 2017, 15:24:03 UTC - in response to Message 32382.  

Thanks David,
All my hosts are back to their usual crunching routine. I will configure the slow host to have a smaller number of pending tasks so that all of them are processed within a day.
I am not sure what the best setting would be to make sure the ATLAS jobs are processed quickly by the community. Reduce the deadline even further?
Regards,
Herve
We are the product of random evolution.
ID: 32389 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 32390 - Posted: 11 Sep 2017, 17:03:04 UTC
Last modified: 11 Sep 2017, 17:03:30 UTC

I have "Store at least 0.5 days of work" and "0.01 days of additional work". I assume this would not overrun the 24-hour turnaround?

Assuming the FLOPS estimates are good.
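
A rough back-of-the-envelope sketch of what those two settings translate to, purely for illustration: the runtime per task and the number of parallel slots below are assumptions, not project figures, and BOINC makes this estimate itself from its own runtime ("FLOPS") estimates.

    # Rough estimate of how many ATLAS tasks a 0.5 + 0.01 day buffer corresponds to.
    # All figures are assumptions for illustration only.
    work_buf_min_days = 0.5          # "Store at least ... days of work"
    work_buf_additional_days = 0.01  # "Store up to an additional ... days of work"
    running_slots = 3                # ATLAS tasks the host runs in parallel (assumed)
    hours_per_task = 1.5             # assumed wall-clock runtime of one task

    buffered_hours = (work_buf_min_days + work_buf_additional_days) * 24
    tasks_on_host = running_slots * (1 + buffered_hours / hours_per_task)
    print(f"roughly {tasks_on_host:.0f} tasks on the host (running + waiting)")
    # ~3 * (1 + 12.24 / 1.5), i.e. about 27 here, before any server-side per-host cap.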
ID: 32390 · Report as offensive     Reply Quote
csbyseti

Send message
Joined: 6 Jul 17
Posts: 22
Credit: 29,430,354
RAC: 0
Message 32392 - Posted: 11 Sep 2017, 18:54:44 UTC - in response to Message 32390.  

Thanks, all hosts get the normal 24 tasks.

I have "Store at least 0.5 days of work" and "0.01 days of additional work". I assume this would not overrun the 24-hour turnaround?

Assuming the FLOPS estimates are good.


What does a 24-hour return time mean? Will LHC go to such a short return time?

With the current short WU runtime (1 - 1.5 hours), all hosts are able to return more than 24 WUs in one day.
ID: 32392 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 32394 - Posted: 12 Sep 2017, 6:05:10 UTC - in response to Message 32392.  
Last modified: 12 Sep 2017, 6:06:05 UTC

The 24 hours is the time to return the WU, so if you can run 1 at a time then you could have a buffer of 16-24 WUs and still be fine.

If you had n CPUs then you could do n times this.

Since ATLAS doesn't really use BOINC's task management, the most WUs you can have at any time is 24 anyway.

In theory the 0.5-day work buffer would store an additional 8-12 WUs, so you could exceed the 24-hour limit.

The max of 24 is only irritating to people with more than 24 cores, as they have more cores than the allowed WU limit; a better server setting would be to only allow 24 WUs per core per day.
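
As a sketch of that reasoning (the runtime and the 24-hour deadline are taken as assumptions from this thread, not confirmed project settings):

    # Can a host drain its local queue before an assumed 24-hour return time?
    running_slots = 1        # ATLAS tasks the host runs at the same time
    queued_tasks = 20        # tasks on the host (running + waiting)
    hours_per_task = 1.2     # assumed wall-clock runtime per task (1-1.5 h quoted above)
    deadline_hours = 24.0    # return time assumed in this post (corrected further down)

    batches = -(-queued_tasks // running_slots)   # ceiling division
    drain_time = batches * hours_per_task
    print(f"queue drains in ~{drain_time:.0f} h ->",
          "fine" if drain_time <= deadline_hours else "some tasks would miss the deadline")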
ID: 32394 · Report as offensive     Reply Quote
csbyseti

Send message
Joined: 6 Jul 17
Posts: 22
Credit: 29,430,354
RAC: 0
Message 32402 - Posted: 12 Sep 2017, 20:08:30 UTC - in response to Message 32394.  
Last modified: 12 Sep 2017, 20:44:44 UTC

The 24 hours is the time to return the WU, so if you can run 1 at a time then you could have a buffer of 16-24 WUs and still be fine.

If you had n CPUs then you could do n times this.

Since ATLAS doesn't really use BOINC's task management, the most WUs you can have at any time is 24 anyway.

In theory the 0.5-day work buffer would store an additional 8-12 WUs, so you could exceed the 24-hour limit.

The max of 24 is only irritating to people with more than 24 cores, as they have more cores than the allowed WU limit; a better server setting would be to only allow 24 WUs per core per day.


Sorry Toby Broom, but I don't understand what you mean, especially the first sentence.
For BOINC users, the time to return is the deadline; if you want a shorter return time, you have to shorten the deadline.
If the CPU power is not enough to crunch the latest WU in the buffer, then the user has to reduce the queue depth.
ID: 32402 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 32406 - Posted: 12 Sep 2017, 21:29:23 UTC - in response to Message 32402.  

I thought the return deadline was 24 hours, but I misread, sorry.

The WU buffer does not work correctly for ATLAS; as you said, the user has to manage it manually. Normally BOINC does it for you.
ID: 32406 · Report as offensive     Reply Quote
David Cameron
Project administrator
Project developer
Project scientist

Send message
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32410 - Posted: 13 Sep 2017, 7:54:37 UTC - in response to Message 32406.  

The WU buffer does not work correctly for ATLAS; as you said, the user has to manage it manually. Normally BOINC does it for you.


Can you explain what you mean here? We obviously want to make things as easy as possible for volunteers so we should try to fix this issue.
ID: 32410 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 32417 - Posted: 13 Sep 2017, 19:17:59 UTC - in response to Message 32410.  
Last modified: 13 Sep 2017, 19:20:19 UTC

Using the web settings you have the Max # jobs setting; the options are Unlimited and 1-24.

If you choose Unlimited, then you get 1 WU at a time; the other projects here allow BOINC to manage the setting based on your computing preferences.

If you choose 1-24, then BOINC will take 1-24 WUs. For example, if you choose 24 and you have a 12-core CPU, then you will have 24 WUs with 12 running. It will then be 1 in, 1 out as you process work.

I guess the 24 limit isn't so constraining given the RAM usage. For example, on my computer with 32 cores I could run 32, but with 128 GB of RAM I already use about 115 GB, so I couldn't run more than 24 anyway. You can also work around it by using more cores per task or multiple instances of BOINC.
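
As a rough sanity check of that RAM argument (the per-task memory figure below is only inferred from the numbers quoted in this post; it is not an official ATLAS requirement):

    # How many concurrent ATLAS tasks fit in RAM on this machine?
    total_ram_gb = 128.0
    os_reserve_gb = 8.0             # headroom for the OS and other work (assumed)
    ram_per_task_gb = 115.0 / 24    # ~4.8 GB, inferred from "ca 115 GB" for 24 tasks
    cores = 32                      # assuming single-core tasks here

    ram_limited = int((total_ram_gb - os_reserve_gb) / ram_per_task_gb)
    print("max concurrent tasks ~", min(ram_limited, cores))
    # With these assumptions: ~25, i.e. close to the 24-task cap anyway.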

For me, the part I would prefer is to have one set of preferences for all my computers; now I have to have one for the ones running ATLAS and another for the other projects.

I think the original reason for the job limit was to stop ATLAS from using all the cores and all the memory on new users' machines, so you would want to be careful before making any changes. Maybe you could make the default web setting 1 job per core, and make the Unlimited setting work as it does for the other sub-projects?
ID: 32417 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 34316 - Posted: 8 Feb 2018, 18:06:03 UTC
Last modified: 8 Feb 2018, 18:11:26 UTC

I could only get 3 WUs per host with the Unlimited setting; setting it back to 8 allowed 8.

Before, it was 24. Is there a plan to allow 24 again?
ID: 34316 · Report as offensive     Reply Quote
Erich56

Send message
Joined: 18 Dec 15
Posts: 1686
Credit: 100,348,426
RAC: 101,752
Message 34319 - Posted: 9 Feb 2018, 6:01:47 UTC - in response to Message 34316.  

I could only get 3 WUs per host with the Unlimited setting; setting it back to 8 allowed 8.
About the same is happening here. Strange.
Any explanation from the experts?
ID: 34319 · Report as offensive     Reply Quote
maeax

Send message
Joined: 2 May 07
Posts: 2071
Credit: 156,087,548
RAC: 104,107
Message 34325 - Posted: 9 Feb 2018, 9:27:24 UTC

What is the rule of 10k concurrent ATLAS tasks about?
For example, Gridcoin teams with thousands of computers, or clusters of CERN computers running the native app?
Many computers of normal volunteers can't reach the limit for a day's work defined in their own preferences.
ID: 34325 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 34348 - Posted: 9 Feb 2018, 22:03:38 UTC - in response to Message 34325.  

Maybe the 10k limit is there to prevent backend issues like we saw recently?

The BOINC scheduling will work out how many tasks to buffer so that the limits can be reached.
ID: 34348 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 798
Credit: 644,700,113
RAC: 234,699
Message 34354 - Posted: 10 Feb 2018, 11:39:40 UTC

I will just leave it on 3. I wanted to contribute more to ATLAS, but I get more out of my computer with settings that are favourable for the other projects.
ID: 34354 · Report as offensive     Reply Quote
Erich56

Send message
Joined: 18 Dec 15
Posts: 1686
Credit: 100,348,426
RAC: 101,752
Message 34356 - Posted: 10 Feb 2018, 12:31:30 UTC - in response to Message 34348.  

Maybe the 10k limit is there to prevent backend issues like we saw recently?
Most probably so.
ID: 34356 · Report as offensive     Reply Quote
