Message boards :
Number crunching :
I am sent just ATLAS tasks
Joined: 14 Oct 11 Posts: 2 Credit: 941,643 RAC: 0

Hi. Although I have all the experiments checked, I am only getting ATLAS tasks. That's OK with me, but is that a potential issue with the scheduler?

Jim
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0

That is the same with me on one machine where I have all the tasks selected. On another machine, I do not have ATLAS selected and am getting CMS, LHCb and Theory just fine. So I think they are just putting a higher priority on ATLAS now, probably in preparation for the upgrade of their accelerator.
Joined: 29 Aug 05 Posts: 998 Credit: 6,264,307 RAC: 71

> That is the same with me on one machine where I have all the tasks selected. On another machine, I do not have ATLAS selected and am getting CMS, LHCb and Theory just fine. So I think they are just putting a higher priority on ATLAS now, probably in preparation for the upgrade of their accelerator.

I'm not aware of any priority being placed on any one application. However, BOINC priorities are arcane and often counterintuitive. :-(

BTW, which upgrade are you talking about? The one for ~2019 or the one for ~2027? I'm heavily involved in the design studies for the second, despite the fact that experts tell me I'll be blind by then -- and almost certainly retired or dead!
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0

> BTW, which upgrade are you talking about? The one for ~2019 or the one for ~2027? I'm heavily involved in the design studies for the second, despite the fact that experts tell me I'll be blind by then -- and almost certainly retired or dead!

It must be the one for 2019, though I remembered it as being in 2018. I assumed it was the one for which SixTrack was given so much work, but I have not seen that actually stated yet. (If you are not here, the one for 2027 won't go anyway.)
Joined: 29 Aug 05 Posts: 998 Credit: 6,264,307 RAC: 71

> It must be the one for 2019, though I remembered it as being in 2018.

I think it was supposed to be 2018, but the schedule has slipped by a year.

> (If you are not here, the one for 2027 won't go anyway.)
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0

It has happened also on a second Ubuntu machine that I just set up for LHC. If I include ATLAS in the selection list, it takes over; I have to exclude it to get the other work. My impression of the BOINC scheduler is that it does not exercise such a fine degree of control, but that the LHC project selects the particular jobs run. At least that is the way it is done on WCG with their various projects. But LHC is a much more complicated system, and only the experts know how it works, not I.
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

Tried to run a mix of ATLAS, Theory, CMS and LHCb on 32 GB with 32 threads, and LHC@Home sends down ~18-24 ATLAS tasks and nothing else. BOINC then starts the 4 allowed by the app_config and leaves 28 threads idle, as there are no Theory/CMS/LHCb tasks to run. Asking for more work gets the "queue full, no tasks sent" error.

Solution 0: Set up a HOME preference set for ATLAS-only jobs and a WORK preference set for the other jobs.

Solution 0a: Assign the computer to the HOME preferences, then download a work load of ATLAS VMs. Delete half the work units downloaded. Then move the computer to the WORK preferences and update LHC@Home in BOINC, and it will download the other half of the work queue as other WUs. This, irritatingly, means micro-managing the work unit queue daily.

Solution 0b: Install BOINC in two folders and run two copies of BOINC. Give one installation the WORK preferences for non-ATLAS work and the other the HOME preferences for ATLAS, and make sure to manage the app_configs to fit the RAM limitations. Leave enough RAM for the OS and support apps, or end up with swap-file thrashing and eventually pushing the reset/off button for a hard restart!
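For reference, the per-application cap mentioned above lives in an app_config.xml file in the LHC@home project directory. A minimal sketch, with the caveat that the app names and limits here are assumptions -- check the names against the project's applications page before using them:

```xml
<!-- app_config.xml sketch: limit concurrent tasks per application.
     App names and the limits shown are assumptions, not verified values. -->
<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>4</max_concurrent>
  </app>
  <app>
    <name>Theory</name>
    <max_concurrent>8</max_concurrent>
  </app>
</app_config>
```

After editing the file, re-read the config in BOINC Manager (Options > Read config files) or restart the client for it to take effect.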
Joined: 2 May 07 Posts: 2071 Credit: 156,150,336 RAC: 105,728

Why use LHC this way, Marmot? Home, School and Work preference sets are a good solution, but it is best to use only one of them per computer. It's better to run ATLAS alone on the PC.
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

> Why use LHC this way, Marmot?

No other project of the 35 I've tried forces a user to limit their machine to a single type of the project's work units. All default to every work unit except test units.

Using only ATLAS WUs leads to too much time where the CPUs lie idle in the startup phase, since RAM size limits the number of WUs. Idle time is lost work, so Theory needs to be running at the same time to ensure that there is NEVER a wasted CPU cycle. Also, not every computer will have the RAM to run more than one ATLAS, so the other cores need to run Theory, SixTrack or some other BOINC project, or we just aren't getting the best use of our BOINC machines.

If people want to run multiple types of work from LHC@Home at a time, why prevent or discourage it?
Joined: 2 May 07 Posts: 2071 Credit: 156,150,336 RAC: 105,728

Every project is special. ATLAS now has a Linux native app and therefore a high priority; you can see a lot of work running in this app at the moment. There is a whitepaper from David Cameron on reducing the wasted time; you can find the link in another thread. For Windows, you need to find a way to let your computers run the projects as well as possible.
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0

I am running only ATLAS tasks (not native) on my 2 Linux boxes, with SuSE Leap 42.2 and 42.3. I am running SETI@home, SETI Beta and Einstein@home, both CPU and GPU, on my fastest machine, a Windows 10 PC with a GTX 1050 Ti graphics board.

Tullio
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

> Every project is special.

Here is some of the semantic confusion when discussing things on this particular forum. LHC@Home is a BOINC project. ATLAS is an LHC@Home work unit type; it is NOT a BOINC project any longer. Once ATLAS was merged from its own BOINC server, it became a work unit within the LHC@Home project. The same goes for Theory and CMS: they are work units, not BOINC projects. Of course they are projects in the sense of funding and staff, but that confuses the discussion when trying to troubleshoot issues on the LHC@Home 'number crunching' forums.
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

> I am running only ATLAS tasks (not native) on my 2 Linux boxes, with SuSE Leap 42.2 and 42.3.

If there were an option to run a 32-core ATLAS, then my servers would run ATLAS only, but the maximum is 24 cores, and that was getting odd crashes on the test machine. There is not enough RAM (I'm not spending hundreds of dollars more on RAM for just this project; other projects are fine) for 2x 16-core ATLAS, so the machines would have to run a mix of ATLAS and Theory, but the server won't distribute that properly.

And back to the initial solution I posted above if you want to run a mix of Theory, CMS, LHCb and ATLAS: multiple BOINC installs.
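On Linux, a second client instance does not strictly need a second install: the client has an --allow_multiple_clients flag. A command sketch, where the data directory ~/boinc2, the RPC port 31417, and the <account_key> placeholder are all my own arbitrary choices:

```
# Sketch only: start a second BOINC client with its own data directory.
# ~/boinc2 and port 31417 are arbitrary assumptions, not required values.
mkdir -p ~/boinc2
boinc --allow_multiple_clients --dir ~/boinc2 --gui_rpc_port 31417 &

# Manage the second instance with boinccmd, pointing at its RPC port
boinccmd --host localhost:31417 --project_attach https://lhcathome.cern.ch/lhcathome <account_key>
```

Each instance then gets its own venue (HOME vs WORK) and its own app_config, which is the effect the two-folder approach above is after.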
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0

My Windows 10 PC has 22 GB RAM and runs two-core ATLAS tasks. My Linux boxes both have 8 GB RAM and run one-core ATLAS tasks, and SixTrack tasks when available. My Windows PC connects to Condor, but Condor exits without doing any job. Why?

Tullio
Joined: 2 May 07 Posts: 2071 Credit: 156,150,336 RAC: 105,728

Sometimes Condor has a problem connecting to our PCs, and the tasks end with errors. It can also be that there are no tasks available on the project side (CMS, LHCb, Theory). ATLAS runs without Condor!
Joined: 20 Jun 14 Posts: 372 Credit: 238,712 RAC: 0

> My Windows PC connects to Condor, but Condor exits without doing any job. Why?

There is a connection problem, but it is not clear what it is, as you can ping the server.
Joined: 20 Jun 14 Posts: 372 Credit: 238,712 RAC: 0

Can you connect to vccondorce02.cern.ch on port 9618 from your Windows machine? If you open the following link in your browser, you should get a connection reset error: http://vccondorce02.cern.ch:9618
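The port check above can also be scripted instead of done in a browser. A minimal sketch in Python (the host and port come from the post; the function name is my own): a connection that opens, even if the server resets it immediately afterwards, means the port is reachable, while a timeout suggests a firewall is dropping the traffic.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure...
        return False

if __name__ == "__main__":
    # Endpoint from the post above; prints True if port 9618 is reachable
    print(can_connect("vccondorce02.cern.ch", 9618))
```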
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0

> Can you connect to vccondorce02.cern.ch on port 9618 from your Windows machine? If you open the following link in your browser, you should get a connection reset error.

What I get on the Windows PC is "Job finished in slot1 with unknown exit code - LHC@home - CERN".
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

> Can you connect to vccondorce02.cern.ch on port 9618 from your Windows machine? If you open the following link in your browser, you should get a connection reset error.

In the router's QoS feature, I set the VCCondor IP address (128.142.142.167:9618 -- maybe there is another IP, but it's not in my internet session list), TCP, connecting to any of the BOINC machines' internal IPs, to lowest priority, so that any initial handshaking packets get priority over the other (30-90) WUs currently communicating with 128.142.142.167:9618. BOINC and other traffic also gets priority.

That was 3 hours ago and all the errors have stopped so far, but the real test will be one of my servers cold-starting 32 Theory WUs. That always ends in a communication disaster; the newest machine errored out 97 or more Theory WUs before it finally reached equilibrium.

It would be nice if the retry count or polling period for the initial handshaking communication with Condor were a bit higher/longer for cold-starting a server...
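The longer retry behaviour wished for here is essentially exponential backoff: each failed handshake attempt waits twice as long before retrying, which spreads out the stampede when dozens of VMs cold-start at once. A minimal sketch in Python -- the function and parameter names are my own, and this is not how the Condor client inside the VM is actually configured:

```python
import time

def retry_with_backoff(attempt, max_tries: int = 6, base_delay: float = 1.0) -> bool:
    """Call attempt() until it returns True, sleeping base_delay * 2**i
    after the i-th failure. Returns True on success, False after max_tries."""
    for i in range(max_tries):
        if attempt():
            return True
        if i < max_tries - 1:
            time.sleep(base_delay * (2 ** i))
    return False
```

With max_tries=6 and base_delay=1.0 this waits up to 1+2+4+8+16 = 31 seconds in total before giving up, instead of failing the work unit on the first refused handshake.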
Joined: 18 Dec 15 Posts: 1686 Credit: 100,400,922 RAC: 102,318

In a recent posting in another thread, it became clear that there seems to be some kind of problem with the Condor server; however, so far no one has found out exactly what it is (at least, that's how I understand it).
©2024 CERN