Message boards :
ATLAS application :
Only 6 concurrent tasks per computer?
Author | Message |
---|---|
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
Hi,

At the beginning of the week, there was a configuration where "reliable" hosts were given priority for dispatching ATLAS tasks. As a result, only one of my 3 hosts was given ATLAS tasks. This issue was discussed in this thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4280&postid=32239#32239

Once the server was reconfigured, my 3 hosts went back to their routine: 24 ATLAS tasks pending, as per the LHC@home preferences. However:
- Within 1 day, the fastest host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10455317, which usually crunches 9 2-core tasks at the same time, could only have 6 pending tasks.
- This morning, the medium-speed host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10416269, which usually crunches 3 2-core tasks at the same time, could also only have 6 pending tasks.
- And the slowest host https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10420599, which usually crunches 1 4-core task at a time, still has 9 pending tasks, but the number of pending tasks is gradually decreasing.

A similar observation was posted here as well: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4280&postid=32369#32369

So it seems to be an infrastructure issue. Could someone from the CERN team have a look?

Thanks, Herve

We are the product of random evolution. |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 8 |
I've put back the old configuration. The limit was decreased in order to avoid too many tasks going past the deadline on slow hosts, but we didn't realise it would have this negative effect on good hosts. Sorry for the inconvenience. |
Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
Thanks David, all my hosts are back to their usual crunching routine. I will configure the slow host with a smaller number of pending tasks so that all of them get processed within a day. I am not sure what the best setting would be to make sure the ATLAS jobs are processed quickly by the community. Reduce the deadline even further? Regards, Herve

We are the product of random evolution. |
Joined: 27 Sep 08 Posts: 753 Credit: 573,573,966 RAC: 248,789 |
I have "Store at least 0.5 days of work" and "0.01 days of additional work". I assume this would not overrun the 24 hr turnaround, assuming the FLOPS estimates are good? |
Joined: 6 Jul 17 Posts: 22 Credit: 29,430,354 RAC: 0 |
Thanks, all hosts get the normal 24 tasks.

I have "Store at least 0.5 days of work" and "0.01 days of additional work". I assume this would not overrun the 24 hr turnaround?

What does a 24 h return time mean? Will LHC go to such a short return time? With the current short WU runtime (1 - 1.5 hours), all hosts are able to return more than 24 WUs in one day. |
Joined: 27 Sep 08 Posts: 753 Credit: 573,573,966 RAC: 248,789 |
The 24 hrs is the time to return the WU, so if you can run 1 at a time then you could have a buffer of 16-24 WUs and still be fine. If you had n CPUs then you could do n times this. Since ATLAS doesn't really use BOINC's task management, the most WUs you can have at any time is 24 anyway. In theory the 0.5-day work buffer would store an additional 8-12 WUs, so you could exceed the 24 hr limit. The max of 24 is only irritating to people with more than 24 cores, as they have more cores than the allowed WU limit; a better server setting would be to only allow 24 WU/core/day. |
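The buffer arithmetic above can be written out as a quick back-of-the-envelope check (a minimal sketch; the runtimes and window are illustrative assumptions from this thread, not values read from the BOINC scheduler):

```python
# Rough check of how many ATLAS WUs a host can return within a
# turnaround window, and how many a work buffer holds. All numbers
# are illustrative assumptions taken from the discussion above.

def wus_per_window(window_h, runtime_h, concurrent):
    """Max WUs a host can finish in window_h hours, running
    `concurrent` tasks at a time, each taking runtime_h hours."""
    return concurrent * int(window_h // runtime_h)

def buffered_wus(buffer_days, runtime_h, concurrent):
    """Approximate number of WUs a work buffer of buffer_days holds."""
    return int(buffer_days * 24 / runtime_h * concurrent)

# One task at a time, 1 - 1.5 h per WU, 24 h window:
print(wus_per_window(24, 1.5, 1))   # 16
print(wus_per_window(24, 1.0, 1))   # 24

# A 0.5-day buffer at the same runtimes:
print(buffered_wus(0.5, 1.5, 1))    # 8
print(buffered_wus(0.5, 1.0, 1))    # 12
```

This reproduces the "buffer of 16-24 WUs" and "additional 8-12 WUs" ranges quoted in the post for the 1 - 1.5 hour runtimes mentioned earlier in the thread.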
Joined: 6 Jul 17 Posts: 22 Credit: 29,430,354 RAC: 0 |
The 24 hrs is the time to return the WU, so if you can run 1 at a time then you could have a buffer of 16-24 WUs and still be fine.

Sorry Toby Broom, but I don't understand what you mean, especially the first sentence. For BOINC users, the time to return is the deadline; if you want a shorter return time, you have to shorten the deadline. If the CPU power is not enough to crunch the latest WU in the buffer, the user has to reduce the queue depth. |
Joined: 27 Sep 08 Posts: 753 Credit: 573,573,966 RAC: 248,789 |
I thought the return deadline was 24 hrs, but I misread, sorry. The WU buffer does not work correctly for ATLAS; as you said, the user has to manage it manually. Normally BOINC does it for you. |
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 8 |
The WU buffer does not work correctly for ATLAS, as you said the user has to manage it manually. Normally BOINC does it for you.

Can you explain what you mean here? We obviously want to make things as easy as possible for volunteers, so we should try to fix this issue. |
Joined: 27 Sep 08 Posts: 753 Credit: 573,573,966 RAC: 248,789 |
Using the web settings you have the "Max # jobs" setting; the options are Unlimited and 1-24. If you choose Unlimited then you get 1 WU at a time; the other projects here allow BOINC to manage the setting based on your computing preferences. If you choose 1-24 then BOINC will take 1-24 WUs. For example, if you choose 24 and you have a 12-core CPU, then you will have 24 WUs with 12 running. It will then be 1 in, 1 out as you process work.

I guess the 24 limit isn't so constraining given the RAM usage. For example, on my one computer with 32 cores I could run 32, but with 128 GB of RAM I use ca. 115 GB, so I couldn't run more than 24 anyway. You can also work around it by using more cores per task or multiple instances of BOINC.

For me, the part I would prefer is to have 1 preference for all my computers; now I have to have one for the hosts running ATLAS and another for the other projects.

I think the original reason for the job limit was to stop ATLAS using all the cores and all the memory for new users, so you would want to be careful before making any changes. Maybe you could make the default web setting 1 job/1 core, and make the Unlimited setting work as per the other sub-projects? |
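The "Max # jobs" behaviour described above (12 of 24 WUs running on a 12-core host, the rest queued "1 in, 1 out") can be sketched as a tiny calculation. The helper below is hypothetical, purely to illustrate the split; it is not LHC@home server code:

```python
# Illustrative split of downloaded WUs into running vs. pending,
# given the per-host "Max # jobs" limit and the core count.
# Hypothetical helper, not part of BOINC or the LHC@home server.

def split_tasks(max_jobs, cores, cores_per_task=1):
    """Return (running, pending) for a host holding max_jobs WUs."""
    slots = cores // cores_per_task      # tasks that can run at once
    running = min(max_jobs, slots)
    pending = max_jobs - running         # queued, processed 1 in 1 out
    return running, pending

# The example above: limit 24 on a 12-core CPU (1 core per task):
print(split_tasks(24, 12))   # (12, 12) -> 12 running, 12 waiting
```

With 32 cores and the same limit of 24, `split_tasks(24, 32)` gives `(24, 0)`: every downloaded WU runs at once and the limit, not the hardware, is the bottleneck, which is the complaint about hosts with more than 24 cores.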
Joined: 27 Sep 08 Posts: 753 Credit: 573,573,966 RAC: 248,789 |
I could only get 3 WUs/host with the Unlimited setting; setting back to 8 allowed 8. Before, it was 24; is there a plan to allow 24 again? |
Joined: 18 Dec 15 Posts: 1571 Credit: 68,513,780 RAC: 172,407 |
I could only get 3 WU/Host with the unlimited settings, setting back to 8 allowed 8.

About the same is happening here. Strange. Any explanation from the experts? |
Joined: 2 May 07 Posts: 1761 Credit: 136,505,471 RAC: 12,796 |
What's the rule behind 10k ATLAS tasks running concurrently? For example, Gridcoin teams with thousands of computers, or clusters of CERN computers with the native app? Many computers from normal volunteers can't reach the limit for a day's work defined in their own preferences. |
Joined: 27 Sep 08 Posts: 753 Credit: 573,573,966 RAC: 248,789 |
Maybe the 10k is there to limit backend issues like we saw recently? The BOINC scheduling will work out how many tasks to buffer so that the limits can be reached. |
Joined: 27 Sep 08 Posts: 753 Credit: 573,573,966 RAC: 248,789 |
I will just leave it on 3. I wanted to contribute more to ATLAS, but I can get more out of my computer with the settings that are favourable for the other projects. |
Joined: 18 Dec 15 Posts: 1571 Credit: 68,513,780 RAC: 172,407 |
Maybe the 10k is there to limit backend issues like we saw recently?

Most probably so. |
©2023 CERN