CMS@Home up again
Author | Message |
---|---|
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,737,455 RAC: 1,317 |
OK, jobs are available again. Sorry for the long delay. Remember, I'm only the front-man for a larger crew, so any downstream delays percolate up to my response. Hopefully this will remain good for some time, but I still don't understand why the condor server occasionally refuses to send out jobs in a timely manner. |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,823,975 RAC: 15,956 |
Do you know what version of HTCondor is being used? https://research.cs.wisc.edu/htcondor/ |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
All of my CMS just started erroring out. And it is not just the short ones with no work, but some have been running for 2 1/2 hours. Something needs an upgrade again. |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,737,455 RAC: 1,317 |
"Do you know what version of HTCondor is being used?" If I do condor_q -v in the VM I get: $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $ $CondorPlatform: x86_64_RedHat6 $ but that's not necessarily what's running on vocms0267.cern.ch. I've asked Federica to check for me. |
Send message Joined: 29 Aug 05 Posts: 1060 Credit: 7,737,455 RAC: 1,317 |
"Do you know what version of HTCondor is being used?" And the answer: [drumroll] cmst1@vocms0267:/data/srv/wmagent/current $ condor_version $CondorVersion: 8.6.8 Nov 13 2017 BuildID: 424045 $ $CondorPlatform: x86_64_RedHat7 $ [/drumroll] I wonder if that's optimal?... |
Send message Joined: 24 Oct 04 Posts: 1173 Credit: 54,823,975 RAC: 15,956 |
"Do you know what version of HTCondor is being used?" Well, thanks for checking that, Ivan. It sure is older than I expected; I thought they would keep that up to date on the server. https://research.cs.wisc.edu/htcondor/downloads/ |
Send message Joined: 12 Aug 06 Posts: 429 Credit: 10,589,655 RAC: 2,832 |
Most of my CMS tasks are causing errors again. I'm assuming this is a CERN fault and not my doing. Please let me know if I can adjust anything at this end. Running the latest BOINC and VirtualBox under Windows 10. And I'm still not getting Atlas or Theory tasks, despite there being more of those showing as available on the server status page. I'm only being given CMS. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"And I'm still not getting Atlas or Theory tasks, despite there being more of those showing as available on the server status page. I'm only being given CMS." Ivan explained this. When CMS goes out, it takes the other ones with it too. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5298&postid=41543#41543 I think if the LHC staff were paid by the number of BOINC units run, they would think of another way of doing it. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,850,736 RAC: 37,972 |
Ivan wrote about quotas. These are set for each app version independently of other app versions. Just check your computer details page and follow the link to "Application details: Show". This means if you have CMS and ATLAS enabled and CMS fails until your computer's quota is down to 0, then it can still get ATLAS (if available). At the moment CMS has stopped generating more subtasks in order to find out what causes the errors in the job submission chain. See: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5309&postid=41635 |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"This means if you have CMS and ATLAS enabled and CMS fails until your computer's quota is down to 0 then it can still get ATLAS (if available)." Maybe I am just unlucky. But for some time (including the present), whenever CMS fails then I can't get more of anything else. I am out of native ATLAS at the moment, even though they have all completed successfully. EDIT: Running only native ATLAS (without CMS) usually works for a while. I detached a few hours ago, but reattached and will try again. |
Send message Joined: 12 Aug 06 Posts: 429 Credit: 10,589,655 RAC: 2,832 |
I'd like to know how LHC servers (and other projects) decide which subproject to give you. If I have them all selected, I could understand getting the one with the biggest queue, or maybe first in first out, but with LHC at the moment, I got given loads of CMS and no Atlas or Theory, despite CMS having the fewest jobs available of the three. Maybe they prioritize one over the other? Maybe if very few people have CMS enabled, those that do just get CMS? |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,850,736 RAC: 37,972 |
The search function might be your friend since this has been explained a couple of times in this message board. The server fills its ready-to-send queue from a couple of upstream processes, each representing one of the subprojects. The server's shared memory then holds a list of "results" in random order. In addition, large projects like LHC@home spread the load over a couple of servers which are contacted in random order (DNS-based load balancing). Your client generates a request to get x seconds of work and the server that answers your request will send you the first n "results" from its shared memory list. - n is calculated based on the sum of the estimated runtimes. - server-side quotas will be respected. - results from deselected subprojects will be skipped*). Under certain circumstances this leads to a situation where one of the servers has no tasks (= results) from your active subprojects in its queue and you will get a "no tasks available" message although the server status page shows lots of available tasks. *) This might lead to the situation that the next client who has this subproject checked will get the skipped "results". |
Send message Joined: 12 Aug 06 Posts: 429 Credit: 10,589,655 RAC: 2,832 |
"The search function might be your friend since this has been explained a couple of times in this message board." I suspect that last point is why I only get CMS. A lot of folk have probably turned CMS off due to the problems, and I can see there are fewer "users in last 24 hours" on the server status page. Hence CMS is probably the first 50 tasks in the queues. I ain't turning my CMS off. I don't care if I get no credit; if it helps them sort out the problems, my computer will try to do them. Some of them work. |
Send message Joined: 28 Sep 04 Posts: 728 Credit: 49,041,081 RAC: 27,114 |
Today I have been getting only Theory on my main cruncher. It has now been accepting work from all subprojects except sixtrack. Yesterday it had all subprojects selected and it got only sixtrack tasks. So no CMS for me although it has also been selected for a couple of days now. |
Send message Joined: 12 Aug 06 Posts: 429 Credit: 10,589,655 RAC: 2,832 |
"Today I have been getting only Theory on my main cruncher. It has now been accepting work from all subprojects except sixtrack. Yesterday it had all subprojects selected and it got only sixtrack tasks. So no CMS for me although it has also been selected for a couple of days now." Seems like the server is picking favourites :-) Sixtrack is very short of tasks, server status usually shows 0 available. One of my computers managed to grab 15 of them last night, but that's all. Sixtrack is the only one that will work without a virtual machine, so anyone can do it, including mobile phones (which I have two of), and my three antique computers with very old processors and small RAM. |
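
The queue behaviour explained earlier in the thread (the answering server walks its list of "results", sends the first ones that fit the request, respects quotas, and skips deselected subprojects) can be illustrated with a small Python sketch. This is not the BOINC scheduler code: `Result`, `pick_results`, the per-subproject quota dictionary and all runtimes are hypothetical names and numbers made up for this example, and the queue order is fixed here rather than random.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One entry in the (hypothetical) ready-to-send list."""
    subproject: str
    est_runtime_s: float


def pick_results(queue, requested_s, enabled, quota_left):
    """Walk the queue in order and return (sent, remaining).

    queue       -- list of Result, in the order the server holds them
    requested_s -- seconds of work the client asked for
    enabled     -- set of subprojects the user has selected
    quota_left  -- assumed remaining quota per subproject for this host
    """
    sent, remaining = [], []
    granted_s = 0.0
    quota = dict(quota_left)  # copy so the caller's dict is not modified
    for result in queue:
        request_filled = granted_s >= requested_s
        deselected = result.subproject not in enabled
        out_of_quota = quota.get(result.subproject, 0) <= 0
        if request_filled or deselected or out_of_quota:
            # Skipped results stay in the list for the next client.
            remaining.append(result)
            continue
        sent.append(result)
        granted_s += result.est_runtime_s
        quota[result.subproject] -= 1
    return sent, remaining


if __name__ == "__main__":
    queue = [
        Result("CMS", 3600),
        Result("CMS", 3600),
        Result("Theory", 7200),
        Result("ATLAS", 10000),
    ]
    # A client with CMS deselected asking for 5000 s of work.
    sent, remaining = pick_results(
        queue, 5000, {"Theory", "ATLAS"}, {"Theory": 5, "ATLAS": 5}
    )
    print([r.subproject for r in sent])
    print([r.subproject for r in remaining])
```

In this toy model, a server whose list happens to hold only results from subprojects you have deselected returns nothing, which corresponds to the "no tasks available" message even though the status page shows work for other subprojects.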
©2024 CERN