Message boards : ATLAS application : Console monitoring

gyllic
Joined: 9 Dec 14
Posts: 198
Credit: 2,442,533
RAC: 1,234
Message 29613 - Posted: 25 Mar 2017, 10:33:36 UTC - in response to Message 29612.  
Last modified: 25 Mar 2017, 10:35:00 UTC

Erich56 wrote:
The console does not always seem to work:

I currently have a task running (1-core), and I have opened its console a few times since it started some 36 hours ago (BOINC shows about 88% progress). Every time, the console showed what it was supposed to show.
Now, suddenly, the console is completely black. Not a single figure or letter.
What does this mean? Is the task broken? Is the console broken? Is the VM broken?

Try clicking into the console and pressing Enter once; you may then see some output. I'm not sure this works, though.
ID: 29613
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 395
Credit: 78,858,844
RAC: 102,255
Message 29617 - Posted: 25 Mar 2017, 11:09:48 UTC - in response to Message 29612.  

Erich56 wrote:
What does this mean? Is the task broken? Is the console broken? Is the VM broken?

Go to my Checklist V3 and check item 16, scenario E.


Supporting BOINC, a great concept !
ID: 29617
Erich56
Joined: 18 Dec 15
Posts: 1079
Credit: 18,191,082
RAC: 49,883
Message 29628 - Posted: 25 Mar 2017, 15:24:57 UTC - in response to Message 29617.  

Thanks, Yeti, for your advice.

However, when I came back home later on, I saw that the task had finished properly. So maybe only the console display had a problem, or something like that.
ID: 29628
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 395
Credit: 78,858,844
RAC: 102,255
Message 29662 - Posted: 26 Mar 2017, 21:35:07 UTC

I'm running single-core, so why does each event number appear several times?
ID: 29662
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 251
Credit: 8,229,477
RAC: 8,707
Message 29665 - Posted: 27 Mar 2017, 7:32:03 UTC - in response to Message 29662.  

The output is added to the previous output so that's why you see repetition (notice that the timestamps are the same). I will try to flush the screen each time before printing the output.
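A minimal sketch of the "clear, then redraw" idea, assuming the monitor rebuilds the whole screen from the matching log lines on every refresh. `ESC[2J` (erase display) and `ESC[H` (cursor home) are standard VT100/xterm control sequences that the Linux text console understands; the helper name `render_frame` and the de-duplication step are illustrative assumptions, not the project's actual script.

```python
def render_frame(log_lines):
    """Build one console frame (hypothetical helper).

    Clears the screen, then lists each 'Event nr' line exactly once,
    sorted as plain strings (like the `sort` in the original cron job).
    """
    clear = "\x1b[2J\x1b[H"  # ANSI: erase whole display, move cursor to top-left
    events = sorted({line.rstrip("\n") for line in log_lines if "Event nr" in line})
    return clear + "\n".join(events) + "\n"


if __name__ == "__main__":
    # Hypothetical log lines; the real AthenaMP wording may differ.
    sample = ["12:00 Event nr. 1 took 90 s\n", "12:00 Event nr. 1 took 90 s\n", "other\n"]
    print(repr(render_frame(sample)))
```

Because the frame always starts with a clear sequence, writing it again on the next refresh replaces the old output instead of appending to it.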
ID: 29665
computezrmle
Joined: 15 Jun 08
Posts: 1060
Credit: 46,031,261
RAC: 136,051
Message 29697 - Posted: 28 Mar 2017, 12:19:20 UTC

@David Cameron

The console output works, and it is much better than having nothing.

Can you add the total number of WU events?
Perhaps like this:
... Event nr. 5/100 took ...
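A small sketch of the suggested "n/total" format, assuming the total number of events in the work unit is known when the line is printed (for example from the job definition). The function name and the exact "took ... s" wording are hypothetical.

```python
def format_progress(event_nr, total_events, seconds):
    """Format a progress line such as 'Event nr. 5/100 took 93.2 s'.

    total_events is assumed to be known up front; the wording of the
    real AthenaMP log line is not reproduced here.
    """
    if not 1 <= event_nr <= total_events:
        raise ValueError("event_nr must be between 1 and total_events")
    return f"Event nr. {event_nr}/{total_events} took {seconds:.1f} s"


if __name__ == "__main__":
    print(format_progress(5, 100, 93.2))  # prints Event nr. 5/100 took 93.2 s
```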
ID: 29697
MPI für Physik
Joined: 20 Mar 15
Posts: 2
Credit: 143,597,659
RAC: 45,265
Message 29709 - Posted: 29 Mar 2017, 15:34:06 UTC

It seems that the new information output also produces a lot of mails.
Every time an event is processed, you run a grep over the event logs, but the path is wrong, so the postmaster sends a mail every time.


Subject: Cron <root@localhost> grep -h "Event nr" /home/atlas01/RunAtlas/Panda_Pilot_*/PandaJob_*/athenaMP-workers-EVNTtoHITS-sim/worker_*/AthenaMP.log|sort > /dev/tty2
grep: /home/atlas01/RunAtlas/Panda_Pilot_*/PandaJob_*/athenaMP-workers-EVNTtoHITS-sim/worker_*/AthenaMP.log: No such file or directory

It would be great if that could be fixed!
ID: 29709
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 251
Credit: 8,229,477
RAC: 8,707
Message 29721 - Posted: 30 Mar 2017, 8:59:56 UTC - in response to Message 29709.  

This is fixed now (the fix will be propagated to new tasks in a few hours).

The errors should only occur at the start of the task, before the log file is created. Did you see them during the whole task?
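Why the mails appear: cron mails whatever a job writes to stdout or stderr, and the quoted cron line redirects only stdout to /dev/tty2. When the glob matches no file yet, grep prints "No such file or directory" on stderr, and that text becomes the mail. In the shell version, adding `2>/dev/null` would silence it. A minimal sketch of the same idea in Python, which expands the pattern first and returns nothing when no log exists yet; the path pattern is copied from the cron line, while the function name is hypothetical.

```python
import glob

# Pattern copied from the cron line quoted above.
LOG_GLOB = ("/home/atlas01/RunAtlas/Panda_Pilot_*/PandaJob_*/"
            "athenaMP-workers-EVNTtoHITS-sim/worker_*/AthenaMP.log")


def collect_event_lines(pattern=LOG_GLOB):
    """Return sorted 'Event nr' lines from every log matching pattern.

    If no log exists yet (e.g. at the start of a task), return an empty
    list without writing anything to stderr, so cron has nothing to mail.
    """
    lines = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, errors="replace") as fh:
                lines.extend(line.rstrip("\n") for line in fh if "Event nr" in line)
        except OSError:
            continue  # log vanished between glob() and open(); skip quietly
    return sorted(lines)
```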
ID: 29721
MPI für Physik
Joined: 20 Mar 15
Posts: 2
Credit: 143,597,659
RAC: 45,265
Message 29723 - Posted: 30 Mar 2017, 12:29:14 UTC - in response to Message 29721.  

Thank you David!

Unfortunately I saw it during the whole task, on every PC, so there was a huge number of e-mails.
I will check if it is fine now.
ID: 29723
Harri Liljeroos
Joined: 28 Sep 04
Posts: 357
Credit: 17,464,923
RAC: 31,398
Message 29724 - Posted: 30 Mar 2017, 14:28:29 UTC

The TOP (Alt+F3) does not work anymore. It stopped working when I changed to running with a single core.
ID: 29724
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 395
Credit: 78,858,844
RAC: 102,255
Message 29725 - Posted: 30 Mar 2017, 14:44:01 UTC - in response to Message 29724.  

The TOP (Alt+F3) does not work anymore. It stopped working when I changed to running with a single core.

Are you sure? TOP wasn't available in ATLAS until now.


ID: 29725
Harri Liljeroos
Joined: 28 Sep 04
Posts: 357
Credit: 17,464,923
RAC: 31,398
Message 29729 - Posted: 30 Mar 2017, 17:38:45 UTC - in response to Message 29725.  

Yes, it was working about a week ago when it was announced, but not at the moment. With Alt+F3 I now get the same screen as with Alt+F1.
ID: 29729
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 251
Credit: 8,229,477
RAC: 8,707
Message 29731 - Posted: 30 Mar 2017, 19:49:14 UTC - in response to Message 29729.  

Well, it was half-working a week ago, but my latest attempts to make it work fully stopped it from working at all.

The problem seems to be running a persistent command with sudo (root permission is needed to write to the console) inside a script that runs as a normal user. It works for other LHC projects because, as I understand it, they run their bootstrap scripts as root. I will keep trying to find a way to make it work.
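A sketch of the failure mode, assuming the console device is /dev/tty2 as in the earlier cron line. Virtual console devices are usually owned by root, so an unprivileged open for writing fails. The hypothetical helper below only makes that failure quiet (returning False instead of raising); it does not solve the privilege problem described above.

```python
def write_console(frame, device="/dev/tty2"):
    """Try to write one frame to a console device (hypothetical helper).

    Returns True on success. Returns False if the device is missing,
    the current user lacks permission (typical for a non-root user),
    or the device refuses the write; OSError covers all of these.
    """
    try:
        with open(device, "w") as console:
            console.write(frame)
        return True
    except OSError:
        return False
```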
ID: 29731
Timo425
Joined: 28 Sep 17
Posts: 4
Credit: 451,660
RAC: 0
Message 33052 - Posted: 12 Nov 2017, 13:30:08 UTC
Last modified: 12 Nov 2017, 13:33:18 UTC

I'm trying to figure out why my LHC tasks on Linux seem to run so slowly.
ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas) has been running for almost 13 hours now and is currently at 99.994%. However, it is progressing very slowly at this point.
Alt+F2 does nothing in the VM console; only Alt+F1 and Alt+F6 switch to other information. I'm using Ubuntu with 16 GB RAM and an i5-6500 CPU. My CPU usage limit is currently set to 99%; 3 cores are used by LHC and 1 core is used by 3 Einstein WUs at the same time. Also, LHC's priority is half of normal.
The task manager shows 5 GB of 15.6 GB in use, and 1 core is running at 100% while the other three usually idle at around 10-25%.
ID: 33052
Timo425
Joined: 28 Sep 17
Posts: 4
Credit: 451,660
RAC: 0
Message 33057 - Posted: 13 Nov 2017, 10:44:16 UTC
Last modified: 13 Nov 2017, 10:49:34 UTC

It seems I can't edit my own previous post?
Anyway, I cancelled my previous task and started a new one. Now Alt+F2 and Alt+F3 work in the VM console, and it seems the cores are only being used at around 0.3%. This new task has been running for 14 hours now, and its CPU time is only 33 minutes. It looks like this happens to every ATLAS WU I get.
I will look around the forum for a solution, if there is one.
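One way to put a number on "the cores are barely used" is to divide CPU time by wall-clock time multiplied by the number of cores the task was given. A minimal sketch using the figures from this post (33 min CPU time after 14 h, 3 cores); the function name is hypothetical, and BOINC's own CPU-time accounting for a VM task may differ in detail.

```python
def cpu_efficiency(cpu_seconds, wall_seconds, ncores):
    """Fraction of the allotted core-time actually spent computing."""
    if wall_seconds <= 0 or ncores <= 0:
        raise ValueError("wall_seconds and ncores must be positive")
    return cpu_seconds / (wall_seconds * ncores)


if __name__ == "__main__":
    # 33 min CPU time after 14 h wall time on 3 cores (figures from the post).
    eff = cpu_efficiency(33 * 60, 14 * 3600, 3)
    print(f"{eff:.1%}")  # prints 1.3%
```

About 1.3% of the available core-time is far from the "3 cores crunching at 100%" reported later in this thread after the project reset.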

EDIT: I have already added an app_config.xml for the task; it currently tells it to use 3 cores, with the RAM limit set to 8 GB. Any other suggestions for the XML?
app_config.xml:
<?xml version="1.0"?>
<app_config>
<project_max_concurrent>3</project_max_concurrent>
<app>
<name>ATLAS</name>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>ATLAS</app_name>
<avg_ncpus>3.000000</avg_ncpus>
<plan_class>vbox64_mt_mcore_atlas</plan_class>
<cmdline>--memory_size_mb 8000</cmdline>
</app_version>
</app_config>
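To avoid hand-editing mistakes when changing the core count or memory, the same file can be generated from a few parameters. A sketch with Python's xml.etree.ElementTree; the element names and values are copied from the file above, the function name is hypothetical, and nothing here claims 8000 MB is the right amount for 3 cores.

```python
import xml.etree.ElementTree as ET


def make_app_config(ncpus, memory_mb, max_concurrent=1, project_max_concurrent=3):
    """Build an app_config.xml string with the same elements as the post."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(project_max_concurrent)

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "ATLAS"
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    ver = ET.SubElement(root, "app_version")
    ET.SubElement(ver, "app_name").text = "ATLAS"
    ET.SubElement(ver, "avg_ncpus").text = f"{ncpus:.6f}"
    ET.SubElement(ver, "plan_class").text = "vbox64_mt_mcore_atlas"
    ET.SubElement(ver, "cmdline").text = f"--memory_size_mb {memory_mb}"

    # encoding="unicode" returns a str without an XML declaration, so add one.
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    print(make_app_config(3, 8000))
```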
ID: 33057
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 395
Credit: 78,858,844
RAC: 102,255
Message 33058 - Posted: 13 Nov 2017, 10:45:47 UTC - in response to Message 33057.  
Last modified: 13 Nov 2017, 10:46:10 UTC

I will try to look around the forum for a solution, if there is one..

Take a walk through my checklist: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359#29359


ID: 33058
Timo425
Joined: 28 Sep 17
Posts: 4
Credit: 451,660
RAC: 0
Message 33059 - Posted: 13 Nov 2017, 14:31:00 UTC
Last modified: 13 Nov 2017, 14:34:10 UTC

Yeti, thanks! Resetting the LHC@home project seemed to do the trick: Alt+F3 showed little activity in the first 10 minutes, but then 3 cores started crunching hard at 100%. :) Hopefully it will stay that way when I resume my other projects as well; for now I will try to run a few ATLAS WUs in a row.
There is a lot going on in Alt+F2, too.
ID: 33059


©2019 CERN