Message boards :
Number crunching :
I am sent just ATLAS tasks
Joined: 14 Oct 11 Posts: 2 Credit: 941,643 RAC: 0

Hi. Although I have all the experiments checked, I am only getting ATLAS tasks. That's OK with me, but is that a potential issue with the scheduler?

Jim
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0

That is the same with me on one machine where I have all the tasks selected. On another machine, I do not have ATLAS selected and am getting CMS, LHCb and Theory just fine. So I think they are just putting a higher priority on ATLAS now, probably in preparation for the upgrade of their accelerator.
Joined: 29 Aug 05 Posts: 998 Credit: 6,264,307 RAC: 71

> That is the same with me on one machine where I have all the tasks selected. On another machine, I do not have ATLAS selected and am getting CMS, LHCb and Theory just fine. So I think they are just putting a higher priority on ATLAS now, probably in preparation for the upgrade of their accelerator.

I'm not aware of any priority being placed on any one application. However, BOINC priorities are arcane and often counterintuitive. :-(

BTW, which upgrade are you talking about? The one for ~2019 or the one for ~2027? I'm heavily involved in the design studies for the second, despite the fact that experts tell me I'll be blind by then -- and almost certainly retired or dead!
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0

> BTW, which upgrade are you talking about? The one for ~2019 or the one for ~2027? I'm heavily involved in the design studies for the second, despite the fact that experts tell me I'll be blind by then -- and almost certainly retired or dead!

It must be the one for 2019, though I remembered it as being in 2018. I assumed it was the one for which SixTrack was given so much work, but I have not seen that actually stated yet. (If you are not here, the one for 2027 won't go anyway.)
Joined: 29 Aug 05 Posts: 998 Credit: 6,264,307 RAC: 71

> It must be the one for 2019, though I remembered it as being in 2018.

I think it was supposed to be 2018, but the schedule has slipped by a year.

> (If you are not here, the one for 2027 won't go anyway.)
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0

It has happened also on a second Ubuntu machine that I just set up for LHC. If I include ATLAS in the selection list, it takes over; I have to exclude it to get the other work. My impression of the BOINC scheduler is that it does not exercise such a fine degree of control, but that the LHC project selects the particular jobs run. At least that is the way it is done on WCG with their various projects. But LHC is a much more complicated system, and only the experts know how it works, not I.
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

Tried to run a mix of ATLAS, Theory, CMS and LHCb on 32 GB with 32 threads, and LHC@Home sends down ~18-24 ATLAS tasks and nothing else. BOINC then starts the 4 allowed by the app_config and leaves 28 threads idle, as there are no Theory/CMS/LHCb tasks to run. Asking for more work gets the "queue full, no tasks sent" error.

Solution 0: Set up a HOME preference set for ATLAS-only jobs and a WORK preference set for the other jobs.

Solution 0a: Assign the computer to the HOME preferences, then download a work load of ATLAS VMs. Delete half the work units downloaded. Then move the computer to the WORK preferences and update LHC@Home in BOINC, and it will download the other half of the work queue as other WUs. This, irritatingly, means micro-managing the work unit queue daily.

Solution 0b: Install BOINC in two folders and run two copies of BOINC. Give one installation the WORK preferences for non-ATLAS work and the other the HOME preferences for ATLAS, and make sure to manage the app_configs to fit the RAM limitations. Leave enough RAM for the OS and support apps, or end up with swap-file thrashing and eventually pushing the reset/off button for a hard restart!
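For reference, the per-application cap mentioned above lives in an app_config.xml file in the LHC@home project directory. A minimal sketch, with the caveat that the app names and limits here are assumptions -- check the names against the project's applications page before using them:

```xml
<!-- app_config.xml sketch: limit concurrent tasks per application.
     App names and the limits shown are assumptions, not verified values. -->
<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>4</max_concurrent>
  </app>
  <app>
    <name>Theory</name>
    <max_concurrent>8</max_concurrent>
  </app>
</app_config>
```

After editing the file, re-read the config in BOINC Manager (Options > Read config files) or restart the client for it to take effect.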
Joined: 2 May 07 Posts: 2071 Credit: 156,150,336 RAC: 105,728

Why use LHC this way, Marmot? Home, School and Work preference sets are a good solution, but it is best to use only one of them per computer. It's better to run ATLAS alone on the PC.
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

> Why use LHC this way, Marmot?

No other project of the 35 I've tried forces a user to limit their machine to a single type of the project's work units. All default to every work unit except test units.

Using only ATLAS WUs leads to too much time where the CPUs lie idle in the startup phase, since RAM size limits the number of WUs. Idle time is lost work, so Theory needs to be running at the same time to ensure that there is NEVER a wasted CPU cycle. Also, not every computer will have the RAM to run more than one ATLAS, so the other cores need to run Theory, SixTrack or some other BOINC project, or we just aren't getting the best use of our BOINC machines.

If people want to run multiple types of work from LHC@Home at a time, why prevent or discourage it?
Joined: 2 May 07 Posts: 2071 Credit: 156,150,336 RAC: 105,728

Every project is special. ATLAS now has a Linux native app and therefore a high priority; you can see a lot of work running in this app at the moment. There is a whitepaper from David Cameron on reducing the wasted time; you can find the link in another thread. For Windows, you need to find a way to let your computers run the projects as well as possible.
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0

I am running only ATLAS tasks (not native) on my 2 Linux boxes, with SuSE Leap 42.2 and 42.3. I am running SETI@home, SETI Beta and Einstein@home, both CPU and GPU, on my fastest machine, a Windows 10 PC with a GTX 1050 Ti graphics board.

Tullio
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

> Every project is special.

Here is some of the semantic confusion when discussing things on this particular forum. LHC@Home is a BOINC project. ATLAS is an LHC@Home work unit type; it is NOT a BOINC project any longer. Once ATLAS was merged from its own BOINC server, it became a work unit within the LHC@Home project. The same goes for Theory and CMS: they are work units, not BOINC projects. Of course they are projects in the sense of funding and staff, but that confuses the discussion when trying to troubleshoot issues on the LHC@Home 'number crunching' forums.
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

> I am running only ATLAS tasks (not native) on my 2 Linux boxes, with SuSE Leap 42.2 and 42.3.

If there were an option to run a 32-core ATLAS, then my servers would run ATLAS only, but the maximum is 24 cores, and that was getting odd crashes on the test machine. There is not enough RAM (I'm not spending hundreds of dollars more on RAM for just this project; other projects are fine) for 2x 16-core ATLAS, so the machines would have to run a mix of ATLAS and Theory, but the server won't distribute that properly.

And back to the initial solution I posted above if you want to run a mix of Theory, CMS, LHCb and ATLAS: multiple BOINC installs.
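On Linux, a second client instance does not strictly need a second install: the client has an --allow_multiple_clients flag. A command sketch, where the data directory ~/boinc2, the RPC port 31417, and the <account_key> placeholder are all my own arbitrary choices:

```
# Sketch only: start a second BOINC client with its own data directory.
# ~/boinc2 and port 31417 are arbitrary assumptions, not required values.
mkdir -p ~/boinc2
boinc --allow_multiple_clients --dir ~/boinc2 --gui_rpc_port 31417 &

# Manage the second instance with boinccmd, pointing at its RPC port
boinccmd --host localhost:31417 --project_attach https://lhcathome.cern.ch/lhcathome <account_key>
```

Each instance then gets its own venue (HOME vs WORK) and its own app_config, which is the effect the two-folder approach above is after.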
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0

My Windows 10 PC has 22 GB RAM and runs two-core ATLAS tasks. My Linux boxes both have 8 GB RAM and run one-core ATLAS tasks, and SixTrack tasks when available. My Windows PC connects to Condor, but Condor exits without doing any job. Why?

Tullio
Joined: 2 May 07 Posts: 2071 Credit: 156,150,336 RAC: 105,728

Sometimes Condor has a problem connecting to our PCs, and the tasks end with errors. It can also be that there are no tasks available on the project side (CMS, LHCb, Theory). ATLAS runs without Condor!
Joined: 20 Jun 14 Posts: 372 Credit: 238,712 RAC: 0

> My Windows PC connects to Condor, but Condor exits without doing any job. Why?

There is a connection problem, but it is not clear what it is, as you can ping the server.
Joined: 20 Jun 14 Posts: 372 Credit: 238,712 RAC: 0

Can you connect to vccondorce02.cern.ch on port 9618 from your Windows machine? If you open the following link in your browser, you should get a connection reset error: http://vccondorce02.cern.ch:9618
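The port check above can also be scripted instead of done in a browser. A minimal sketch in Python (the host and port come from the post; the function name is my own): a connection that opens, even if the server resets it immediately afterwards, means the port is reachable, while a timeout suggests a firewall is dropping the traffic.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure...
        return False

if __name__ == "__main__":
    # Endpoint from the post above; prints True if port 9618 is reachable
    print(can_connect("vccondorce02.cern.ch", 9618))
```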
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0

> Can you connect to vccondorce02.cern.ch on port 9618 from your Windows machine? If you open the following link in your browser, you should get a connection reset error.

What I get on the Windows PC is "Job finished in slot1 with unknown exit code - LHC@home - CERN".
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0

> Can you connect to vccondorce02.cern.ch on port 9618 from your Windows machine? If you open the following link in your browser, you should get a connection reset error.

In the router's QoS feature, I set the VCCondor IP address (128.142.142.167:9618 -- maybe there is another IP, but it's not in my internet session list), TCP, connecting to any of the BOINC machines' internal IPs, to lowest priority, so that any initial handshaking packets get priority over the other (30-90) WUs currently communicating with 128.142.142.167:9618. BOINC and other traffic also gets priority.

That was 3 hours ago and all the errors have stopped so far, but the real test will be one of my servers cold-starting 32 Theory WUs. That always ends in a communication disaster; the newest machine errored out 97 or more Theory WUs before it finally reached equilibrium.

It would be nice if the retry count or polling period for the initial handshaking communication with Condor were a bit higher/longer for cold-starting a server...
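The longer retry behaviour wished for here is essentially exponential backoff: each failed handshake attempt waits twice as long before retrying, which spreads out the stampede when dozens of VMs cold-start at once. A minimal sketch in Python -- the function and parameter names are my own, and this is not how the Condor client inside the VM is actually configured:

```python
import time

def retry_with_backoff(attempt, max_tries: int = 6, base_delay: float = 1.0) -> bool:
    """Call attempt() until it returns True, sleeping base_delay * 2**i
    after the i-th failure. Returns True on success, False after max_tries."""
    for i in range(max_tries):
        if attempt():
            return True
        if i < max_tries - 1:
            time.sleep(base_delay * (2 ** i))
    return False
```

With max_tries=6 and base_delay=1.0 this waits up to 1+2+4+8+16 = 31 seconds in total before giving up, instead of failing the work unit on the first refused handshake.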
Joined: 18 Dec 15 Posts: 1686 Credit: 100,400,922 RAC: 102,318

In a recent posting in another thread, it became clear that there seems to be some kind of problem with the Condor server; however, so far no one has found out exactly what it is (at least, that's how I understand it).
©2024 CERN