Message boards : Number crunching : Specific Job amount per Task as Configuration Option
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 32875 - Posted: 22 Oct 2017, 15:23:21 UTC

Hello!

Just an idea to give users a bit more control over WU/task duration:

As far as I understand, Theory, CMS and LHCb tasks start a VM, and a couple of jobs (Condor, WMAgent) are then processed within that VM (I hope that is roughly correct).

How about adding a configuration option to the LHC preferences page (similar to the one for how many CPU cores should be used, etc.) to choose the number of these jobs that one task processes? This would give users control over how long one task needs to complete (with some variation due to different job types):

Those who have dedicated PCs for crunching, for example, could set a high number of jobs, which would lead to higher efficiency (because the VM start, stop, etc. won't happen as often).
Those who want to spend less CPU time on BOINC would also benefit from this option (due to shorter task duration), and it could also help to avoid the "suspend of tasks error".
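
To put a rough number on the efficiency argument, here is a minimal sketch. The function name and the example figures (90-minute jobs, 15 minutes of VM start/stop overhead per task) are assumptions chosen for illustration only, not measured values from the project:

```python
def vm_efficiency(jobs_per_task: int, job_minutes: float, overhead_minutes: float) -> float:
    """Fraction of a task's wall time spent on jobs rather than on VM start/stop.

    All inputs are hypothetical; the fixed overhead is paid once per task.
    """
    if jobs_per_task < 1:
        raise ValueError("jobs_per_task must be at least 1")
    useful = jobs_per_task * job_minutes
    return useful / (useful + overhead_minutes)


if __name__ == "__main__":
    # Compare a few settings with illustrative numbers.
    for n in (1, 4, 8):
        eff = vm_efficiency(n, job_minutes=90, overhead_minutes=15)
        print(f"{n} job(s) per task: {eff:.1%} of wall time is useful work")
```

The more jobs share one VM start/stop, the smaller the share of time lost to overhead, but the longer a single task runs.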

I don't know how difficult this would be to implement (if it is even possible); it's just an idea.

It would be nice to hear your thoughts on this feature.
ID: 32875
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 32877 - Posted: 22 Oct 2017, 15:56:59 UTC - in response to Message 32875.  

To flesh out your understanding a bit:
CMS@Home, which in some ways was a prototype for some of the other projects, runs "tasks" under BOINC. In the background is an HTCondor server which is provisioned by a WMAgent work-flow-management system. A WMAgent controller (in this case me, tho' Federica occasionally submits jobs too) sets up a batch of "jobs" which WMAgent then submits to the HTCondor server.
The BOINC tasks poll the HTCondor server for jobs, and run them to completion. If a task has been running for less than 12 hours, it then asks for another job, and so on. There is a hard cut-off at 18 hours (IIRC) to avoid runaway jobs, because we aim for jobs that take less than ~2 hours and return less than ~100 MiB.
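
The polling rule described above can be sketched as a small simulation. This is only an illustration, not the actual CMS@Home code: the function name is made up, the 12 h and 18 h figures come from the description (the 18 h one is "IIRC"), and it assumes a job still running at the hard cut-off is terminated and not counted.

```python
SOFT_LIMIT_H = 12.0  # below this, the task asks for another job
HARD_LIMIT_H = 18.0  # hard cut-off against runaway jobs (IIRC)


def run_task(job_durations_h, soft=SOFT_LIMIT_H, hard=HARD_LIMIT_H):
    """Simulate one BOINC task; returns (jobs_completed, elapsed_hours)."""
    elapsed = 0.0
    completed = 0
    for duration in job_durations_h:
        if elapsed >= soft:
            # Task has run long enough: do not fetch another job.
            break
        if elapsed + duration > hard:
            # Assumption: the job is cut off at the hard limit and lost.
            elapsed = hard
            break
        elapsed += duration
        completed += 1
    return completed, elapsed


if __name__ == "__main__":
    print(run_task([2.0] * 10))   # many ~2 hour jobs
    print(run_task([11.0, 10.0])) # second job runs into the hard cut-off
```

With jobs of about 2 hours, a task stops asking for work once it passes the 12 hour mark, so the 18 hour limit only matters for jobs that run far longer than intended.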

In the -dev project we are trying to perfect VM tasks which run more than one job. So there you can, for example, set your computing preferences on an 8-core machine to run two tasks, each running four jobs simultaneously. I must admit that, due to various constraints, I find it most efficient to run two-core tasks there.
ID: 32877
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 32881 - Posted: 23 Oct 2017, 7:44:20 UTC

Thanks for the information, Ivan!

...The BOINC tasks poll the HTCondor server for jobs, and run them to completion. If a task has been running for less than 12 hours, it then asks for another job, and so on...

Why 12 hours? Why not make the number of jobs a BOINC task gets from HTCondor user-configurable, to get the benefits described?
ID: 32881
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 32882 - Posted: 23 Oct 2017, 9:35:55 UTC - in response to Message 32881.  

Thanks for the information, Ivan!

...The BOINC tasks poll the HTCondor server for jobs, and run them to completion. If a task has been running for less than 12 hours, it then asks for another job, and so on...

Why 12 hours? Why not make the number of jobs a BOINC task gets from HTCondor user-configurable, to get the benefits described?

Good question! I guess at the time it was seen as the optimum; perhaps it's different now. Our recovery from shut-down tasks is still not 100% satisfactory, in my experience -- even less so if the computer is just switched off without BOINC being properly stopped and time given for the VMs to save their state to disk... So we need to strike a balance between optimum performance and the chance of tasks being lost through premature termination. Allowing the volunteers to decide how long this should be is a laudable goal, but unfortunately we can't rely on all users to be knowledgeable enough to configure this satisfactorily. Perhaps a default with an option to over-ride?
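
A "default with an option to over-ride" could look roughly like the sketch below. Nothing like this exists in the project preferences as far as this thread says; the function name, the idea of clamping, and the 1 hour minimum are all assumptions for illustration. Only the 12 h default and the 18 h hard limit come from the discussion above.

```python
DEFAULT_SOFT_LIMIT_H = 12.0  # current behaviour described in this thread
HARD_LIMIT_H = 18.0          # hard cut-off (IIRC)
MIN_SOFT_LIMIT_H = 1.0       # hypothetical lower bound, not a project value


def effective_soft_limit(user_value=None):
    """Return the run time (hours) below which a task asks for another job.

    No user setting means the project default; a user setting is clamped
    to a safe range so that a bad value cannot break the task.
    """
    if user_value is None:
        return DEFAULT_SOFT_LIMIT_H
    return max(MIN_SOFT_LIMIT_H, min(float(user_value), HARD_LIMIT_H))


if __name__ == "__main__":
    print(effective_soft_limit())     # no override set
    print(effective_soft_limit(4))    # volunteer wants shorter tasks
    print(effective_soft_limit(100))  # out-of-range value is clamped
```

Clamping keeps the setting safe for volunteers who are unsure what to enter, while the default covers everyone who never touches it.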
ID: 32882
gyllic

Joined: 9 Dec 14
Posts: 202
Credit: 2,533,875
RAC: 0
Message 32884 - Posted: 23 Oct 2017, 15:53:55 UTC - in response to Message 32882.  

...Allowing the volunteers to decide how long this should be is a laudable goal, but unfortunately we can't rely on all users to be knowledgeable enough to configure this satisfactorily.

Yes, good point!

Perhaps a default with an option to over-ride?

This would be a good solution!
ID: 32884



©2024 CERN