Message boards : ATLAS application : 2-core tasks with process "athena.py" running 4 times

HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 8,342,299
RAC: 4,382
Message 32501 - Posted: 22 Sep 2017, 16:09:16 UTC

Hi,
A few times I have seen ATLAS tasks where athena.py is running 4 times inside the VM instead of 2. I have configured both "LHC Preferences" and "app_config.xml" to use only 2 cores, but from time to time I see tasks where:
- "Alt-F2" in the console shows "Event nr. 1" 4 times, and the same for every other event.
- "Alt-F3" in the console shows 4 "athena.py" processes, each running at 50% (expected, since the VM only has 2 cores allocated to it).
The result is a task that takes twice as long to finish.
Is this a known issue? If yes, what can be done to avoid it?
If not, how can I help investigate it further?
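The arithmetic behind "each running at 50%" and "twice the time" can be sketched as follows. This is only an illustration: it assumes the athena.py processes are CPU-bound, that the VM shares its cores evenly between them, and that every duplicate repeats the full workload. The function names are made up for this example.

```python
def per_process_cpu_share(vm_cores: int, running_processes: int) -> float:
    """Fraction of one core each process gets if the cores are shared evenly."""
    return min(1.0, vm_cores / running_processes)


def slowdown_factor(vm_cores: int, running_processes: int) -> float:
    """How much longer the task takes compared with one process per core."""
    return max(1.0, running_processes / vm_cores)


# The case described above: a 2-core VM with 4 athena.py processes.
share = per_process_cpu_share(vm_cores=2, running_processes=4)
slowdown = slowdown_factor(vm_cores=2, running_processes=4)
print(f"CPU share per process: {share:.0%}, slowdown: {slowdown:.1f}x")
```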
Regards,
Herve
We are the product of random evolution.
ID: 32501
HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 8,342,299
RAC: 4,382
Message 32503 - Posted: 23 Sep 2017, 3:58:37 UTC

ID: 32503
HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 8,342,299
RAC: 4,382
Message 32511 - Posted: 23 Sep 2017, 19:13:19 UTC

This task is supposed to run with 3 cores only, but actually has "athena.py" running 6 times within the VM:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=157814626
ID: 32511
HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 8,342,299
RAC: 4,382
Message 32513 - Posted: 25 Sep 2017, 1:34:38 UTC - in response to Message 32511.  

Another 3-core task with "athena.py" running 6 times within the VM:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=157867841
And this 3-core task has "athena.py" running 9 times within the VM:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=157867240
ID: 32513
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 183
Credit: 4,459,939
RAC: 9,649
Message 32517 - Posted: 25 Sep 2017, 13:04:00 UTC - in response to Message 32513.  

These tasks all have duplicated (or triplicated) log messages so I wonder if the task is running twice (or three times) in parallel inside the same VM. I never got to the bottom of why we sometimes get these duplicate log messages.

The line

Guest Log: ATHENA_PROC_NUMBER=3

shows what is passed to the task to set the number of cores used.
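One way to check a task log for this line, and to see whether it shows up more than once, is sketched below. This is a hedged example, not a project tool: the only thing taken from the thread is the text "Guest Log: ATHENA_PROC_NUMBER=3". The sample log is hypothetical (it simply repeats that line), and it is an assumption that a duplicated task would repeat this particular line.

```python
import re
from collections import Counter

# Matches the line quoted above anywhere in a log line, so a leading
# timestamp or other prefix (an assumption about real logs) is tolerated.
PROC_LINE = re.compile(r"Guest Log: ATHENA_PROC_NUMBER=(\d+)")


def summarize_log(log_text: str):
    """Return (core count passed to the task, number of times the line appears)."""
    values = [int(v) for v in PROC_LINE.findall(log_text)]
    if not values:
        return None, 0
    cores, occurrences = Counter(values).most_common(1)[0]
    return cores, occurrences


# Hypothetical sample: the quoted line repeated twice.
sample_log = (
    "Guest Log: ATHENA_PROC_NUMBER=3\n"
    "Guest Log: ATHENA_PROC_NUMBER=3\n"
)

cores, occurrences = summarize_log(sample_log)
print(f"cores passed to task: {cores}, line seen {occurrences} time(s)")
```

If the line appears more than once for a single task, that would be consistent with the duplicated log messages described above, though it would not by itself prove that the task is running twice.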
ID: 32517
HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 8,342,299
RAC: 4,382
Message 32522 - Posted: 26 Sep 2017, 2:39:24 UTC

Thanks David,
I think I have had this issue for quite some time, but only recently linked it to the duplicated or triplicated athena.py processes.
Are the results of the task valid for you guys when this is happening?

I just checked one of my computers. Out of 6 running tasks, 3 have triplicated log messages, 2 have duplicated log messages and one is OK. So right now this computer is spending the CPU time of 14 ATLAS tasks to actually run only 6. It would be great if the computer's crunching capacity could be put to better use.
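The figure of 14 can be reproduced with a short calculation. This is a rough sketch that assumes every duplicate consumes as much CPU time as the original task, which the thread does not confirm; the function name is made up for this example.

```python
# Duplication factor per running task, taken from the counts above:
# 3 tasks triplicated, 2 tasks duplicated, 1 task running normally.
duplication_factors = [3, 3, 3, 2, 2, 1]


def capacity_summary(factors):
    """Return (task-equivalents of CPU spent, useful tasks, fraction wasted)."""
    spent = sum(factors)
    useful = len(factors)
    wasted_fraction = (spent - useful) / spent
    return spent, useful, wasted_fraction


spent, useful, wasted = capacity_summary(duplication_factors)
print(f"CPU time spent: {spent} task-equivalents for {useful} useful tasks")
print(f"Share of capacity lost to duplicates: {wasted:.0%}")
```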

Is there anything I can do to investigate?
ID: 32522
Jonathan
Joined: 25 Sep 17
Posts: 15
Credit: 118,839
RAC: 2
Message 32530 - Posted: 26 Sep 2017, 21:28:55 UTC

Another possibly odd thing in the logs is the memory assigned to the VM. It looks like 9 GB is assigned for a 3-processor work unit. My 4-processor work units end up with 6.2 GB for the virtual machine. This follows the formula 2.6 GB + (0.9 GB × number of processors).
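As a quick check of that formula, here is a minimal sketch. It works in MB to avoid floating-point rounding and assumes 1 GB = 1000 MB; the constant and function names are made up for this example, and the formula itself is only the one inferred in this post, not an official project value.

```python
BASE_MB = 2600      # fixed part of the formula (2.6 GB)
PER_CORE_MB = 900   # per-processor part of the formula (0.9 GB)


def stock_vm_memory_mb(cores: int) -> int:
    """VM memory in MB according to the formula quoted in this post."""
    return BASE_MB + PER_CORE_MB * cores


for cores in (3, 4):
    print(f"{cores} processors -> {stock_vm_memory_mb(cores)} MB")

# Compare with the 9 GB seen in the log of the 3-processor work unit.
assigned_mb = 9000
extra_mb = assigned_mb - stock_vm_memory_mb(3)
print(f"9 GB is {extra_mb} MB above the formula for 3 processors")
```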
ID: 32530
HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 8,342,299
RAC: 4,382
Message 32541 - Posted: 27 Sep 2017, 18:02:59 UTC - in response to Message 32530.  

Another possibly odd thing in the logs is the memory assigned to the VM. It looks like 9 GB is assigned for a 3-processor work unit. My 4-processor work units end up with 6.2 GB for the virtual machine. This follows the formula 2.6 GB + (0.9 GB × number of processors).

That configuration is intentional, because I recently had a few tasks that did not get through the startup phase with 7 GB.
Could it be that, because I give it 9 GB, the ATLAS process ends up running multiple times since it finds a lot of memory available?
ID: 32541
Harri Liljeroos
Joined: 28 Sep 04
Posts: 278
Credit: 9,384,751
RAC: 13,921
Message 32542 - Posted: 27 Sep 2017, 20:05:48 UTC
Last modified: 27 Sep 2017, 20:06:54 UTC

I have a three-core task running, and in top I occasionally see a fourth athena.py running. It uses very little CPU time (< 2%) and doesn't seem to run any jobs, as job numbers appear only three times on the Alt+F2 display. I have also given the task "a little extra memory" (5400 MB) via app_config.xml.
ID: 32542
Jonathan
Joined: 25 Sep 17
Posts: 15
Credit: 118,839
RAC: 2
Message 32548 - Posted: 27 Sep 2017, 23:02:00 UTC - in response to Message 32542.  

I think you would both be fine going back to the stock settings for the ATLAS tasks. It looks like the VM memory was bumped up from 0.8 GB to 0.9 GB per core in the calculation.
ID: 32548

©2018 CERN