Message boards : ATLAS application : Missing Output at Console 2

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,401,087
RAC: 123,599
Message 31867 - Posted: 7 Aug 2017, 10:46:40 UTC - in response to Message 31833.  

Quote:
8 on my main computer are fine; they have the scrolling list of events.

Mine are still without console output (except the first line).
ID: 31867
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,464,258
RAC: 5,837
Message 31868 - Posted: 7 Aug 2017, 10:58:10 UTC

As far as I understand, it depends on the kind of input files that our LHC WUs are generated from. Maybe David has not finished his fix for this yet.

So it need not be critical if the output is missing. If you want to check whether your WU is healthy, open "Properties" and compare "Elapsed time" with "CPU time".
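
A minimal sketch of that comparison, assuming you copy both values by hand from the task's Properties dialog in HH:MM:SS form. The helper names and the 0.5 threshold are illustrative assumptions, not project values. For a multi-core task the CPU time is summed over all cores, so the ratio is divided by the core count.

```python
def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(elapsed: str, cpu: str, ncpus: int = 1) -> float:
    """Return CPU time divided by (elapsed time * number of cores)."""
    elapsed_s = parse_hms(elapsed)
    if elapsed_s == 0:
        return 0.0
    return parse_hms(cpu) / (elapsed_s * ncpus)


# Hypothetical values: 2 h elapsed and 1 h 50 min CPU time, single-core task.
ratio = cpu_efficiency("02:00:00", "01:50:00")
print(f"CPU efficiency: {ratio:.0%}")
if ratio < 0.5:  # assumed threshold, purely illustrative
    print("CPU time lags far behind elapsed time; the task may be stalled.")
```

A ratio close to 1 suggests the task is actually computing; a ratio that keeps falling while elapsed time grows is the sign to look closer.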


Supporting BOINC, a great concept !
ID: 31868
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,401,087
RAC: 123,599
Message 31869 - Posted: 7 Aug 2017, 11:31:30 UTC - in response to Message 31868.  

Yeti wrote:
As far as I understand, it depends on the kind of input files that our LHC WUs are generated from. Maybe David has not finished his fix for this yet.

So it need not be critical if the output is missing. If you want to check whether your WU is healthy, open "Properties" and compare "Elapsed time" with "CPU time".

My results seem to be valid.
So the main objective of the WUs is fulfilled.

The missing console output is nothing to worry about.
My comment was only meant as a hint that some hosts still have problems with it while others, e.g. Toby Broom's, don't.
ID: 31869
Erich56
Joined: 18 Dec 15
Posts: 1687
Credit: 102,987,306
RAC: 125,849
Message 31908 - Posted: 10 Aug 2017, 18:10:22 UTC

A minute ago, I retried the VM console - and, surprise, this time it worked :-)

Thanks to whoever got the problem solved!
ID: 31908
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,401,087
RAC: 123,599
Message 32042 - Posted: 22 Aug 2017, 9:38:45 UTC - in response to Message 31908.  

Erich56 wrote:
A minute ago, I retried the VM console - and, surprise, this time it worked :-)

Thanks to whoever got the problem solved!

Is it really solved?
On my hosts the output is printed only right before the end of the WU.
ID: 32042
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,401,087
RAC: 123,599
Message 32157 - Posted: 31 Aug 2017, 7:48:31 UTC - in response to Message 32042.  

Erich56 wrote:
A minute ago, I retried the VM console - and, surprise, this time it worked :-)

Thanks to whoever got the problem solved!

computezrmle wrote:
Is it really solved?
On my hosts the output is printed only right before the end of the WU.

Still not solved.
And what about the plans to activate a top console?
This would be very helpful to identify error conditions, e.g. not enough RAM.
ID: 32157
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32171 - Posted: 1 Sep 2017, 12:59:59 UTC - in response to Message 32157.  

I think there might be a problem with reporting the events in single-core tasks, and from a brief look at your results I see only single-core tasks. In this case the logs we extract the information from are structured slightly differently. I'll try running some single-core tasks to investigate.

In the meantime I think I have finally got top working in console 3, can others confirm that it works for them? If so I will post a new thread to celebrate!
ID: 32171
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,401,087
RAC: 123,599
Message 32173 - Posted: 1 Sep 2017, 14:03:44 UTC - in response to Message 32171.  

David Cameron wrote:
... finally got top working in console 3, can others confirm that it works ...

At least not for this WU from today:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=154071874
ID: 32173
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,528,244
RAC: 15,546
Message 32175 - Posted: 1 Sep 2017, 14:53:02 UTC - in response to Message 32173.  

David Cameron wrote:
... finally got top working in console 3, can others confirm that it works ...

computezrmle wrote:
At least not for this WU from today:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=154071874

I see this one with working top: https://lhcathome.cern.ch/lhcathome/result.php?resultid=154068006
But it is not acting like top on LHCb tasks (which updates the screen in place about every second); instead it scrolls a new screenful of text every second or so. Anyway, it gives you the information about memory and CPU usage as it should.
ID: 32175
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,401,087
RAC: 123,599
Message 32241 - Posted: 5 Sep 2017, 7:56:44 UTC - in response to Message 32175.  

Harri Liljeroos wrote:
I see this one with working top: https://lhcathome.cern.ch/lhcathome/result.php?resultid=154068006
But it is not acting like top on LHCb tasks (which updates the screen in place about every second); instead it scrolls a new screenful of text every second or so. Anyway, it gives you the information about memory and CPU usage as it should.

I can also confirm that the top output on console 3 works like Harri described it.
ID: 32241
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32268 - Posted: 5 Sep 2017, 14:53:21 UTC - in response to Message 32175.  
Last modified: 5 Sep 2017, 14:57:28 UTC

Harri Liljeroos wrote:
But it is not acting like top on LHCb tasks (which updates the screen in place about every second); instead it scrolls a new screenful of text every second or so. Anyway, it gives you the information about memory and CPU usage as it should.


Can you share a screenshot of your console? I could not manage to get a persistent top process running in a tty so I simply run it once every 5 seconds and send the first 24 lines of output to the tty. On my rdesktop (linux) it looks ok because the console is 24 lines high so I'd like to see if it looks bad on Windows.

EDIT: this is how it looks for me:

[Screenshot of the top output on console 3; image not preserved]

It shows nicely the single-core VM problem with 8 processes using 12.5% CPU each :)
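
The approach described in this post (run top once every 5 seconds and send the first 24 lines to the tty) could be sketched roughly as follows. This is not the project's actual script: the device path `/dev/tty3`, the function names, and the clear-screen escape sequence are assumptions made for illustration. Clearing the screen before each snapshot is an addition here, not something the post describes; it would make the display update in place instead of scrolling.

```python
import subprocess
import time


def first_lines(text: str, max_lines: int) -> str:
    """Keep only the first max_lines lines of a block of text."""
    return "\n".join(text.splitlines()[:max_lines])


def top_snapshot(max_lines: int = 24) -> str:
    """Run top once in batch mode and return the first max_lines lines."""
    result = subprocess.run(
        ["top", "-b", "-n", "1"],
        capture_output=True,
        text=True,
        check=True,
    )
    return first_lines(result.stdout, max_lines)


def run(tty_path: str = "/dev/tty3", interval: float = 5.0) -> None:
    """Write a fresh top snapshot to the given tty every `interval` seconds."""
    with open(tty_path, "w") as tty:
        while True:
            tty.write("\033[2J\033[H")  # assumed extra: clear screen, cursor home
            tty.write(top_snapshot())   # no trailing newline, so 24 rows do not scroll
            tty.flush()
            time.sleep(interval)


# run()  # loops forever and needs write permission on the tty device
```

Truncating to 24 lines matches a 24-row console; on a taller or shorter console the same output would leave blank rows or scroll.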
ID: 32268
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,464,258
RAC: 5,837
Message 32269 - Posted: 5 Sep 2017, 14:54:53 UTC - in response to Message 32268.  

David Cameron wrote:
On my rdesktop (linux) it looks ok because the console is 24 lines high so I'd like to see if it looks bad on Windows.

On my Windows machines it has looked okay so far.


Supporting BOINC, a great concept !
ID: 32269
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,528,244
RAC: 15,546
Message 32282 - Posted: 5 Sep 2017, 18:08:38 UTC - in response to Message 32268.  

Harri Liljeroos wrote:
But it is not acting like top on LHCb tasks (which updates the screen in place about every second); instead it scrolls a new screenful of text every second or so. Anyway, it gives you the information about memory and CPU usage as it should.

David Cameron wrote:
Can you share a screenshot of your console? I could not manage to get a persistent top process running in a tty so I simply run it once every 5 seconds and send the first 24 lines of output to the tty. On my rdesktop (linux) it looks ok because the console is 24 lines high so I'd like to see if it looks bad on Windows.


Currently my system is struggling with over 50 LHCb tasks with deadline on the 9th. To push these through as fast as I can I have unselected all subprojects until I have cleared my cache of LHCb tasks. So until then I won't be running any Atlas tasks.
ID: 32282
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,401,087
RAC: 123,599
Message 32287 - Posted: 5 Sep 2017, 18:45:56 UTC - in response to Message 32282.  

Harri Liljeroos wrote:
Currently my system is struggling with over 50 LHCb tasks with deadline on the 9th.

Let me guess.
In the past few days you had a couple of LHCb WUs that finished very quickly due to the server problems.
Those short runtimes raised the flops value in your host's DB record, and that value was echoed back via the latest scheduler replies.
Now your hosts calculate a very short runtime estimate and download far too many WUs to fill the cache.
As a side effect, the credits for the next WUs will be near absolute zero.

If this is the reason, it has to be reported to the developers. The outliers should be handled as they are at SixTrack.
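
The effect can be illustrated with some rough arithmetic. This is a simplified sketch, assuming the runtime estimate is the task's estimated operation count divided by the host's flops value and that the client fetches enough tasks to cover its cache; the real BOINC work-fetch logic is considerably more involved, and every number below is hypothetical.

```python
import math


def estimated_runtime_s(fpops_est: float, flops: float) -> float:
    """Estimated runtime in seconds: estimated operations / speed."""
    return fpops_est / flops


def tasks_to_fill_cache(cache_days: float, est_runtime_s: float, ncpus: int = 1) -> int:
    """Number of tasks needed to keep ncpus cores busy for cache_days."""
    cache_seconds = cache_days * 86400 * ncpus
    return math.ceil(cache_seconds / est_runtime_s)


FPOPS_EST = 1.0e15       # hypothetical per-task operation estimate
REALISTIC_FLOPS = 50e9   # hypothetical realistic flops value
INFLATED_FLOPS = 400e9   # hypothetical value after a run of very short tasks

for label, flops in (("realistic", REALISTIC_FLOPS), ("inflated", INFLATED_FLOPS)):
    runtime = estimated_runtime_s(FPOPS_EST, flops)
    count = tasks_to_fill_cache(cache_days=1.0, est_runtime_s=runtime, ncpus=4)
    print(f"{label}: estimate {runtime / 3600:.1f} h per task, {count} tasks fetched")
```

With an 8x higher flops value the estimate per task shrinks by the same factor, so roughly 8x as many tasks are requested for the same cache setting.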
ID: 32287
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,528,244
RAC: 15,546
Message 32291 - Posted: 5 Sep 2017, 20:17:42 UTC - in response to Message 32287.  

Harri Liljeroos wrote:
Currently my system is struggling with over 50 LHCb tasks with deadline on the 9th.

computezrmle wrote:
Let me guess.
In the past few days you had a couple of LHCb WUs that finished very quickly due to the server problems.
Those short runtimes raised the flops value in your host's DB record, and that value was echoed back via the latest scheduler replies.
Now your hosts calculate a very short runtime estimate and download far too many WUs to fill the cache.
As a side effect, the credits for the next WUs will be near absolute zero.

If this is the reason, it has to be reported to the developers. The outliers should be handled as they are at SixTrack.

That just about sums up what happened. Originally BOINC downloaded a bunch of tasks that lasted about 20 minutes each, and they all validated. This happened when I had switched from ATLAS to LHCb because ATLAS had run out of tasks and jobs. The average processing rate went above 400 GFLOPS (now it shows about 140). When the next ATLAS problems came up, I switched to LHCb again and BOINC downloaded over 90 LHCb tasks, assuming that they would also take only 20 minutes each. Instead they are taking anything between 12 minutes and 12 hours. The credit given matches the run times (from 15 to about 500).
ID: 32291
maeax
Joined: 2 May 07
Posts: 2090
Credit: 158,777,751
RAC: 128,475
Message 32317 - Posted: 7 Sep 2017, 4:24:18 UTC

ATLAS tasks with two CPUs show only this with F2:

Event processing information will appear here.

I have changed the app_config memory size from 5 GB to 7 GB,
because top on F3 shows memory usage between 5.3 and 5.7 GB and no swap.
ID: 32317
Erich56
Joined: 18 Dec 15
Posts: 1687
Credit: 102,987,306
RAC: 125,849
Message 32318 - Posted: 7 Sep 2017, 5:00:04 UTC - in response to Message 32317.  
Last modified: 7 Sep 2017, 5:59:12 UTC

maeax wrote:
ATLAS tasks with two CPUs show only this with F2:
Event processing information will appear here.

The same thing happens with my tasks. No idea why this still (or again) does not work properly.

maeax wrote:
I have changed the app_config memory size from 5 GB to 7 GB,
because top on F3 shows memory usage between 5.3 and 5.7 GB and no swap.

Recently I noticed increased memory usage for the four 2-core ATLAS tasks that run concurrently on one of my systems. In my app_config, 7.3 GB is set, and most of it seems to be used.
ID: 32318
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,401,087
RAC: 123,599
Message 32319 - Posted: 7 Sep 2017, 5:10:53 UTC - in response to Message 32317.  

maeax wrote:
ATLAS tasks with two CPUs show only this with F2:

Event processing information will appear here.

Same here on a 2-core setup.


maeax wrote:
I have changed the app_config memory size from 5 GB to 7 GB,
because top on F3 shows memory usage between 5.3 and 5.7 GB and no swap.

You may want to compare the top values for the OS cache.
If WUs fail before all athena.py processes (one per configured core) are launched, add more RAM.
If the WU starts without an error, a higher RAM value for the VM will only add more OS cache.
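
To make the distinction between memory used by processes and memory used as OS cache concrete, here is a minimal sketch that splits `/proc/meminfo`-style numbers into those two parts. It assumes a Linux system where such text is available; inside the ATLAS VM you would instead read the equivalent values off the top header on console 3. The sample figures are hypothetical and the function names are made up for illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field name: value in kB} dict."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_breakdown(info: dict) -> dict:
    """Split total memory into process usage, OS cache, and free memory."""
    cache_kb = info.get("Buffers", 0) + info.get("Cached", 0)
    used_kb = info["MemTotal"] - info["MemFree"] - cache_kb
    return {
        "used_by_processes_kb": used_kb,
        "cache_kb": cache_kb,
        "free_kb": info["MemFree"],
    }


# Hypothetical snapshot of a VM configured with 7 GB of RAM.
SAMPLE = """\
MemTotal:        7340032 kB
MemFree:          204800 kB
Buffers:          102400 kB
Cached:          1536000 kB
"""

for key, kb in memory_breakdown(parse_meminfo(SAMPLE)).items():
    print(f"{key}: {kb / 1024 / 1024:.2f} GB")
```

If the process share stays well below the configured VM size and the remainder shows up as cache, that matches the case described above where extra RAM only enlarges the cache.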
ID: 32319
maeax
Joined: 2 May 07
Posts: 2090
Credit: 158,777,751
RAC: 128,475
Message 32320 - Posted: 7 Sep 2017, 5:35:49 UTC - in response to Message 32319.  
Last modified: 7 Sep 2017, 5:36:22 UTC

computezrmle wrote:
If the WU starts without an error, a higher RAM value for the VM will only add more OS cache.

Maybe a reduction to 6 GB is possible.
That way the efficiency of the whole system is better.
The goal is not to find the minimum memory usage, but the optimum!
ID: 32320
David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 32350 - Posted: 8 Sep 2017, 14:31:12 UTC - in response to Message 32319.  

maeax wrote:
ATLAS tasks with two CPUs show only this with F2:

Event processing information will appear here.

computezrmle wrote:
Same here on a 2-core setup.



Sorry, I think I broke the event information when I added the top output. It should be fixed now, but it may take a few hours to propagate to the WUs.
ID: 32350