Message boards : ATLAS application : New console monitoring

David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 40257 - Posted: 24 Oct 2019, 8:31:21 UTC

The rather shaky vbox console monitoring we had has been completely rewritten by computezrmle and now looks much better. This covers both the event progress monitoring and the "top" resource usage monitoring.

Console 2 (ALT-F2) now looks like this:



and Console 3 (ALT-F3) is now an interactive version of "top" without the scrolling and flickering that affected it before. For example, you can hit the space bar to get an immediate update or type "u" then "atlas" to see only processes owned by the atlas user.

The changes will be active in the next couple of hours. Thanks again to computezrmle for this great work!
ID: 40257
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40273 - Posted: 25 Oct 2019, 10:05:42 UTC - in response to Message 40257.  

Console 2 (ALT-F2) now looks like this:
Time left (rough estimate)                : 4d    5h    21m
ID: 40273
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,319
RAC: 138,016
Message 40275 - Posted: 25 Oct 2019, 10:43:32 UTC - in response to Message 40273.  

Console 2 (ALT-F2) now looks like this:
Time left (rough estimate)                : 4d    5h    21m

Anything wrong with it?

What are the corresponding values for:
"Total # of events to be processed"
"Total # of events already finished"

The algorithm currently used is experimental and does not deliver good estimates if
- only very few events are finished
- VMs are paused for longer periods


A new version that uses a better algorithm is under development and should be ready next week.
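
To illustrate why very few finished events make the estimate unreliable, here is a minimal sketch of a naive throughput-based estimate. It is not the algorithm used by the monitoring; the function names and all numbers are made up for illustration.

```python
def estimate_time_left(total_events, finished_events, elapsed_seconds):
    """Naive estimate: remaining events finish at the average rate seen so far."""
    if finished_events <= 0:
        return None  # no finished events yet, so there is no basis for an estimate
    seconds_per_event = elapsed_seconds / finished_events
    return seconds_per_event * (total_events - finished_events)


def format_duration(seconds):
    """Render seconds as days, hours and minutes, e.g. '4d 5h 21m'."""
    minutes = int(seconds) // 60
    days, minutes = divmod(minutes, 24 * 60)
    hours, minutes = divmod(minutes, 60)
    return f"{days}d {hours}h {minutes}m"


# Hypothetical task with 200 events in total.
# After 2 hours only 2 events are finished; events still in progress are not counted.
print(format_duration(estimate_time_left(200, 2, 2 * 3600)))  # 8d 6h 0m
# After 4 hours 8 events are finished and the estimate drops sharply.
print(format_duration(estimate_time_left(200, 8, 4 * 3600)))  # 4d 0h 0m
```

Events that are still in progress add elapsed time but no finished count, so early estimates come out too high. A paused VM can distort the result in a similar way if the elapsed time keeps counting while no events are processed.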
ID: 40275
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40277 - Posted: 25 Oct 2019, 11:08:34 UTC - in response to Message 40275.  
Last modified: 25 Oct 2019, 11:09:40 UTC

Console 2 (ALT-F2) now looks like this:
Time left (rough estimate)                : 4d    5h    21m
Anything wrong with it?
Nothing wrong. The other 4 threads are busy too, so ATLAS doesn't get the maximum possible power.

- only very few events are finished
That was the case ;) Only 2 events were done, so the 2 other events that were halfway done were not counted.
Now with 8 events done, the estimated time left is down to 2 days and 2.5 hours.
Keep up the excellent work.
Not that it's a real issue, but the averages/worker seem a bit strange in the beginning.
After 1 event and 2 events it's the same value. Don't spend too much time on it.
ID: 40277
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,319
RAC: 138,016
Message 40282 - Posted: 25 Oct 2019, 12:04:18 UTC - in response to Message 40277.  

Not that it's a real issue, but the averages/worker seem a bit strange in the beginning.
After 1 event and 2 events it's the same value.

This happens because the monitoring user is not allowed to read anything directly from the atlas directories.
Even if it could, ATLAS doesn't provide an interface for getting the estimated runtime of a currently running event.

Instead, ATLAS reports finished events to its logs, and a helper program dumps that information to a file that is readable by the monitoring user.

The main reason to do it this way is to leave the ATLAS app completely untouched.

So even with the announced second monitoring version, the time-left estimate will still jump whenever an event finishes.
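
A minimal sketch of such a helper is shown below. The log line format, the file names and the function name are assumptions made for illustration only; the real ATLAS logs and the real helper will differ.

```python
import os
import re
import tempfile

# Hypothetical log line format, e.g. "worker 2: event 17 finished in 412.5 s".
FINISHED = re.compile(r"worker (\d+): event (\d+) finished in ([\d.]+) s")


def dump_finished_events(log_path, out_path):
    """Copy only 'finished event' records to a file the monitoring user may read."""
    records = []
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = FINISHED.search(line)
            if match:
                worker, event, seconds = match.groups()
                records.append(f"{worker} {event} {seconds}\n")
    # Write to a temporary file first, then rename it into place, so the
    # reader never sees a half-written dump.
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as out:
        out.writelines(records)
    os.chmod(tmp_path, 0o644)  # world-readable
    os.replace(tmp_path, out_path)
    return len(records)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        log_file = os.path.join(workdir, "atlas.log")
        dump_file = os.path.join(workdir, "events.txt")
        with open(log_file, "w", encoding="utf-8") as f:
            f.write("startup\nworker 1: event 3 finished in 400.0 s\n")
        print(dump_finished_events(log_file, dump_file))
```

Because only finished events ever reach the dump file, the display has nothing to show for an event that is still running. That is why the estimate can only change in steps.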
ID: 40282
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,319
RAC: 138,016
Message 40353 - Posted: 4 Nov 2019, 8:33:53 UTC

Since last night ATLAS (vbox) uses v2.2.0 of ATLAS Event Monitoring.
It includes the following changes:

- Modified display layout

- Change requests made by Crystal Pellet are implemented:
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=497&postid=6795#6795

- A modified "time left" algorithm that avoids using the precalculated averages from the logfiles.
Wrong values from the logfiles are still printed "as is" in the worker lines.

- Foreground service (display output) and background service (logfile dumping) are tied together to allow a restart of the complete monitoring when CTRL-c is pressed at console 2.
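
The coupling of the two services could look roughly like the sketch below. This is only an illustration of the idea, not the actual implementation; the class and function names are hypothetical, and the background service is reduced to an idle thread.

```python
import threading


class BackgroundDumper:
    """Stand-in for the logfile dumping service (hypothetical)."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # A real service would dump log data here on every cycle.
        while not self._stop.wait(0.01):
            pass

    def stop(self):
        self._stop.set()
        self._thread.join()


def supervise(display, max_restarts=3):
    """Run display() in the foreground with a fresh background service.

    CTRL-C (KeyboardInterrupt) stops both parts and starts them again.
    Returns the number of restarts once display() ends normally.
    """
    restarts = 0
    while True:
        dumper = BackgroundDumper()
        dumper.start()
        try:
            display()
            return restarts
        except KeyboardInterrupt:
            restarts += 1
            if restarts > max_restarts:
                raise
        finally:
            dumper.stop()  # the background service never outlives the display


if __name__ == "__main__":
    outcomes = iter([True, True, False])  # simulate CTRL-C twice, then a normal exit

    def fake_display():
        if next(outcomes):
            raise KeyboardInterrupt

    print(supervise(fake_display))
```

Starting a new background service on every loop iteration means a restart always begins from a clean state on both sides.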
ID: 40353
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40360 - Posted: 5 Nov 2019, 20:54:29 UTC - in response to Message 40353.  

Since last night ATLAS (vbox) uses v2.2.0 of ATLAS Event Monitoring.
Example:
ID: 40360
Yeti
Volunteer moderator

Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 40362 - Posted: 6 Nov 2019, 10:56:43 UTC

computezrmle, I like your new monitoring; it is much, much better than what we had in the past!

Good work, thank you very much.

I should remember to update the checklist.


Supporting BOINC, a great concept !
ID: 40362
Yeti
Volunteer moderator

Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 40363 - Posted: 6 Nov 2019, 10:59:59 UTC - in response to Message 40360.  

Crystal,

Is it okay if I use your screenshot for the checklist?

Yeti

Since last night ATLAS (vbox) uses v2.2.0 of ATLAS Event Monitoring.
Example:



Supporting BOINC, a great concept !
ID: 40363
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40365 - Posted: 6 Nov 2019, 12:34:32 UTC - in response to Message 40363.  

Crystal,

Is it okay if I use your screenshot for the checklist?

Yeti
Of course. It's on imgbb.com without an account, so I'm not sure whether it will stay there forever.
ID: 40365
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40552 - Posted: 20 Nov 2019, 13:34:24 UTC

Version 3.2.0 of Console ALT-F2 (Event Monitoring) is active. Thank you, computezrmle !
Example of what the new monitoring looks like when running with 4 CPUs:

ID: 40552



©2024 CERN