Message boards : Number crunching : I am sent just ATLAS tasks

jimwilkins
Joined: 14 Oct 11
Posts: 2
Credit: 941,643
RAC: 0
Message 32888 - Posted: 23 Oct 2017, 21:09:06 UTC

Hi,

Although I have all the experiments checked, I am only getting ATLAS tasks. That's OK with me, but is that a potential issue with the scheduler?

Jim
ID: 32888
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 32889 - Posted: 23 Oct 2017, 21:13:30 UTC - in response to Message 32888.  

That is the same with me on one machine where I have all the tasks selected. On another machine, I do not have ATLAS selected and am getting CMS, LHCb and Theory just fine. So I think they are just putting a higher priority on ATLAS now, probably in preparation for the upgrade of their accelerator.
ID: 32889
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 32890 - Posted: 23 Oct 2017, 21:27:55 UTC - in response to Message 32889.  

That is the same with me on one machine where I have all the tasks selected. On another machine, I do not have ATLAS selected and am getting CMS, LHCb and Theory just fine. So I think they are just putting a higher priority on ATLAS now, probably in preparation for the upgrade of their accelerator.

I'm not aware of any priority being placed on any one application. However, BOINC priorities are arcane and often counterintuitive. :-(
BTW, which upgrade are you talking about? The one for ~2019 or the one for ~2027? I'm heavily involved in the design studies for the second, despite the fact that experts tell me I'll be blind by then -- and almost certainly retired or dead!
ID: 32890
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 32891 - Posted: 23 Oct 2017, 22:29:12 UTC - in response to Message 32890.  

BTW, which upgrade are you talking about? The one for ~2019 or the one for ~2027? I'm heavily involved in the design studies for the second, despite the fact that experts tell me I'll be blind by then -- and almost certainly retired or dead!

It must be the one for 2019, though I remembered it as being in 2018. I assumed it was the one they were giving so much work to SixTrack for, but have not seen that actually stated yet.

(If you are not here, the one for 2027 won't go anyway.)
ID: 32891
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 32895 - Posted: 24 Oct 2017, 8:01:06 UTC - in response to Message 32891.  

BTW, which upgrade are you talking about? The one for ~2019 or the one for ~2027? I'm heavily involved in the design studies for the second, despite the fact that experts tell me I'll be blind by then -- and almost certainly retired or dead!

It must be the one for 2019, though I remembered it as being in 2018. I assumed it was the one they were giving so much work to SixTrack for, but have not seen that actually stated yet.

I think it was supposed to be 2018, but the schedule has slipped by a year.

(If you are not here, the one for 2027 won't go anyway.)

ID: 32895
Jim1348
Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 32901 - Posted: 25 Oct 2017, 5:18:28 UTC

It has happened also on a second Ubuntu machine that I just set up for LHC. If I include ATLAS in the selection list, that takes over. I have to exclude it to get the other work. My impression of the BOINC scheduler is that it does not exercise such a fine degree of control, but that the LHC project selects the particular jobs run. At least that is the way it is done on WCG with their various projects. But LHC is a much more complicated system, and only the experts know how it works, not I.
ID: 32901
marmot
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 32940 - Posted: 31 Oct 2017, 7:15:54 UTC

Tried to run a mix of ATLAS, Theory, CMS and LHCb on 32 GB with 32 threads, and LHC@Home sends down ~18-24 ATLAS tasks and nothing else. BOINC then starts the 4 allowed by the app_config and leaves 28 threads idle, as there are no Theory/CMS/LHCb tasks to run. Asking for more work gets the "queue full, no tasks sent" error.

Solution 0: Set up a HOME preference for ATLAS-only jobs and a WORK preference for the other jobs.

Solution 0a: Assign the computer to the HOME preferences, then download a workload of ATLAS VM tasks. Delete half of the downloaded work units. Then move the computer to the WORK preferences and update LHC@Home in BOINC; it will then download other WUs to fill the other half of the work queue. This, irritatingly, means micro-managing the work unit queue daily.

Solution 0b:
Install BOINC in two folders and run two copies of BOINC. Give one installation the WORK preferences for non-ATLAS and the other installation the HOME preferences for ATLAS, and make sure to manage the app_configs to fit the RAM limitations. Leave enough RAM for the OS and support apps, or you will end up thrashing the swap file and eventually pushing the reset/off button for a hard restart!
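
For the app_config part of Solution 0b, here is a minimal sketch in Python that writes out a BOINC app_config.xml capping the number of concurrent tasks per application. The app names ("ATLAS", "Theory") and the limits of 4 and 28 are assumptions for illustration only; check the actual app names on your own host and pick limits that fit your RAM.

```python
from xml.sax.saxutils import escape


def build_app_config(limits):
    """Return app_config.xml text that caps concurrent tasks per app.

    `limits` maps an application name to its max_concurrent value.
    """
    lines = ["<app_config>"]
    for app_name, max_concurrent in limits.items():
        lines += [
            "  <app>",
            f"    <name>{escape(app_name)}</name>",
            f"    <max_concurrent>{int(max_concurrent)}</max_concurrent>",
            "  </app>",
        ]
    lines.append("</app_config>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed app names and limits; adjust both for your machine.
    print(build_app_config({"ATLAS": 4, "Theory": 28}))
```

The output would go into the LHC@Home project folder of the relevant BOINC installation, one file per installation if you run two copies.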
ID: 32940
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,085,087
RAC: 104,468
Message 32945 - Posted: 31 Oct 2017, 7:52:56 UTC

Why use LHC this way, Marmot?

Home, School and Work is a good solution,
but it is useful to run only one of them on one computer.
It's better to run ATLAS alone on the PC.
ID: 32945
marmot
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 32962 - Posted: 2 Nov 2017, 0:04:53 UTC - in response to Message 32945.  

Why use LHC this way, Marmot?

Home, School and Work is a good solution,
but it is useful to run only one of them on one computer.
It's better to run ATLAS alone on the PC.



No other projects of the 35 I've tried force a user to limit their machine to a single version of the projects' work units. All default to every work unit except test units.

Using only ATLAS WU's leads to too much time where the CPU's lie idle in the startup phase since RAM size limits number of WU's. Idle time is lost work, so Theory needs to be running at the same time to assure me that there is NEVER a wasted CPU cycle.

Also, not every computer will have the RAM to run more than 1 ATLAS, and so the other cores need to run Theory, SixTrack or some other BOINC project, or we just aren't getting the best use of our BOINC machines.
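
To put rough numbers on that, here is a small Python sketch of the arithmetic: how many ATLAS tasks fit in RAM, and how many threads are left over for something else. The per-task RAM figure, the cores per task and the RAM reserve are all assumptions you would have to replace with what you see on your own machine, and the sketch ignores the RAM the other tasks need.

```python
def plan_mix(total_ram_gb, threads, atlas_ram_gb, atlas_cores, reserve_gb=2.0):
    """Return (atlas_tasks, free_threads) for one machine.

    atlas_ram_gb and atlas_cores describe one ATLAS task (assumed values).
    reserve_gb is RAM kept back for the OS and support apps.
    """
    usable_ram = max(total_ram_gb - reserve_gb, 0)
    by_ram = int(usable_ram // atlas_ram_gb)   # limit imposed by memory
    by_cpu = threads // atlas_cores            # limit imposed by threads
    atlas_tasks = min(by_ram, by_cpu)
    free_threads = threads - atlas_tasks * atlas_cores
    return atlas_tasks, free_threads


if __name__ == "__main__":
    # Hypothetical figures: 32 GB, 32 threads, 4 GB per single-core ATLAS task.
    print(plan_mix(32, 32, atlas_ram_gb=4.0, atlas_cores=1, reserve_gb=4.0))
```

Whatever comes back as free threads is what would sit idle unless Theory, SixTrack or another project fills it.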

If people want to run multiple versions of work from LHC@Home at a time, why prevent or discourage it?
ID: 32962
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,085,087
RAC: 104,468
Message 32965 - Posted: 2 Nov 2017, 7:16:32 UTC

Every project is special.
ATLAS is now a Linux native APP project and therefore has high priority.

You can see a lot of work running in this app at the moment.

There is a white paper from David Cameron on reducing the wasted time.
You can find the link in one of the threads.

On Windows you need to find a way to keep these projects running on your computers as far as possible.
ID: 32965
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 32967 - Posted: 2 Nov 2017, 18:16:37 UTC

I am running only Atlas tasks (not native) on my 2 Linux boxes, with SuSE Leap 42.2 and 42.3. I am running SETI@home, SETI Beta and Einstein@home, both CPU and GPU on my fastest machine, a Windows 10 PC with a GTX 1050 Ti graphic board.
Tullio
ID: 32967
marmot
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 33023 - Posted: 8 Nov 2017, 22:06:01 UTC - in response to Message 32965.  

Every Project is special.
ATLAS is now a Linux native APP project


Here is some of the semantic confusion when discussing things on this particular forum.

LHC@Home is a BOINC project

ATLAS is an LHC@Home work unit; it is NOT a BOINC project any longer. Once ATLAS was merged from its own BOINC server, it became a work unit within the LHC@Home project.

Same for Theory and CMS. They are work units and not BOINC projects.

Of course they are projects in the sense of funding and work staff but that confuses the discussion when trying to troubleshoot issues on the LHC@Home BOINC project 'number crunching' forums.
ID: 33023
marmot
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 33025 - Posted: 8 Nov 2017, 22:11:11 UTC - in response to Message 32967.  

I am running only Atlas tasks (not native) on my 2 Linux boxes, with SuSE Leap 42.2 and 42.3. I am running SETI@home, SETI Beta and Einstein@home, both CPU and GPU on my fastest machine, a Windows 10 PC with a GTX 1050 Ti graphic board.
Tullio


If there were an option to run a 32-core ATLAS then my servers would run ATLAS only, but the maximum is 24 cores and that was getting odd crashes on the test machine.
There is not enough RAM for 2x 16-core ATLAS (not spending $100's more on RAM for just this project; other projects are fine), so the machines would have to run a mix of ATLAS and Theory, but the server won't distribute that properly. So it's back to the initial solution I posted above if you want to run a mix of Theory, CMS, LHCb and ATLAS:
multiple BOINC installs.
ID: 33025
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 33027 - Posted: 9 Nov 2017, 8:41:29 UTC - in response to Message 33025.  
Last modified: 9 Nov 2017, 8:43:04 UTC

My Windows 10 PC has 22 GB RAM and runs two-core Atlas tasks. My Linux boxes both have 8 GB RAM and run one-core Atlas tasks, and SixTrack tasks when available. My Windows PC connects to Condor but Condor exits without doing any job. Why?
Tullio
ID: 33027
maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,085,087
RAC: 104,468
Message 33028 - Posted: 9 Nov 2017, 9:50:54 UTC

Sometimes Condor has a problem connecting to our PCs and the task ends with errors.
It can also be that there are no tasks available in the project (CMS, LHCb, Theory).
ATLAS runs without Condor!
ID: 33028
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 33030 - Posted: 9 Nov 2017, 16:04:11 UTC - in response to Message 33027.  

My Windows 10 PC has 22 GB RAM and runs two-core Atlas tasks. My Linux boxes both have 8 GB RAM and run one-core Atlas tasks, and SixTrack tasks when available. My Windows PC connects to Condor but Condor exits without doing any job. Why?
Tullio


There is a connection problem but it is not clear what it is as you can ping the server.
ID: 33030
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 372
Credit: 238,712
RAC: 0
Message 33032 - Posted: 9 Nov 2017, 16:30:58 UTC - in response to Message 33030.  
Last modified: 9 Nov 2017, 16:31:07 UTC

Can you connect to vccondorce02.cern.ch on port 9618 from your Windows machine? If you open the following link in your browser you should get a connection reset error

http://vccondorce02.cern.ch:9618
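
If a browser test is awkward, the same check can be sketched in a few lines of Python. This only tests whether a plain TCP connection to the host and port can be opened; it does not speak the Condor protocol, so a True result says the port is reachable and nothing more. The function name and the timeout are illustrative choices.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Running `can_connect("vccondorce02.cern.ch", 9618)` on the Windows machine should correspond to the browser test: if the browser would report a connection reset, the TCP connection itself was opened.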
ID: 33032
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 33034 - Posted: 9 Nov 2017, 18:29:02 UTC - in response to Message 33032.  

Can you connect to vccondorce02.cern.ch on port 9618 from your Windows machine? If you open the following link in your browser you should get a connection reset error

http://vccondorce02.cern.ch:9618

What I get on the Windows PC is
Job finished in slot1 with unknown exit code-LHC@home-CERN
ID: 33034
marmot
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 33049 - Posted: 12 Nov 2017, 2:03:25 UTC - in response to Message 33032.  
Last modified: 12 Nov 2017, 2:04:19 UTC

Can you connect to vccondorce02.cern.ch on port 9618 from your Windows machine? If you open the following link in your browser you should get a connection reset error

http://vccondorce02.cern.ch:9618


In the router's QoS feature, I set TCP traffic between the VCCondor IP address (128.142.142.167:9618; maybe there is another IP, but it's not in my internet session list) and any of the BOINC machines' internal IPs to lowest priority, so that any other initial handshaking packets get priority over the (30-90) WUs currently communicating with 128.142.142.167:9618. BOINC and other traffic also gets priority.

That was 3 hours ago and all the errors have stopped, so far, but the real test will be one of my servers cold starting up 32 Theory WU's. That always ends in a communication disaster and the newest machine errored out 97 or more Theory WU's before it finally reached equilibrium.

It would be nice if the retry count or polling period for the initial handshaking communication with Condor was a bit higher/longer for cold starting a server....
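
To illustrate what I mean by a higher retry count and a longer polling period, here is a generic Python sketch of retrying with exponential backoff. It is purely hypothetical: I do not know how the VM's Condor client actually retries, and the function name, the retry count and the delays are made up for the example.

```python
import time


def retry_with_backoff(operation, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last error if every attempt fails.
    """
    for attempt in range(retries):
        try:
            return operation()
        except OSError:
            if attempt == retries - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

With 32 tasks cold starting at once, spreading the retries out like this would give the later handshakes a chance to get through instead of erroring out straight away.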
ID: 33049
Erich56
Joined: 18 Dec 15
Posts: 1686
Credit: 100,342,948
RAC: 101,785
Message 33051 - Posted: 12 Nov 2017, 6:53:47 UTC

In a recent posting in another thread, it became clear that there seems to be some kind of problem with the Condor server; however, so far nobody has been able to find out what exactly it is (that's at least what I understand).
ID: 33051


©2024 CERN