Message boards : ATLAS application : ATLAS vbox and native 3.01

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,576,736
RAC: 6,763
Message 48659 - Posted: 25 Sep 2023, 14:37:16 UTC - in response to Message 48644.  

Great, I love it! Thank you for this helpful command.

As a Linux newbie: what would be necessary to show the BOINC slot number in each line? (I run three WUs simultaneously, each with 4 cores.)

Thanks in advance
Yeti

I modified the command line to monitor ATLAS native 3.01:
In this example, I used 2 CPUs per task, hence the tail -n2.
sudo watch -n10 "find /var/lib/boinc-client/slots/ \( -name \"log.EVNTtoHITS\" -o -name \"AthenaMP.log\" \) |sort |xargs -I {} -n1 sh -c \"grep -E 'INFO.*Run:Event ' {} |tail -n2\"|sort -k 7,7"

An example of output:
17:32:48 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       390     0    INFO          Run:Event 450000:20848791       (200th event for this worker) took 82.67 s. New average 93.67 +- 3.91
17:32:44 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       391     1    INFO          Run:Event 450000:20848792       (192th event for this worker) took 45.15 s. New average 98.49 +- 3.622
17:32:03 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       362     0    INFO          Run:Event 450000:22570763       (186th event for this worker) took 40.7 s. New average 96.78 +- 3.699
17:32:53 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       363     1    INFO          Run:Event 450000:22570764       (178th event for this worker) took 128.6 s. New average 102.1 +- 4.028
17:33:07 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       312     1    INFO          Run:Event 450000:22644313       (159th event for this worker) took 209.2 s. New average 95.61 +- 3.997
17:33:01 ISF_Kernel_FullG4MT_QS.ISF_LongLivedGeant4Tool       313     0    INFO          Run:Event 450000:22644314       (155th event for this worker) took 152.8 s. New average 99 +- 4.297



Supporting BOINC, a great concept !
ID: 48659
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2421
Credit: 227,193,917
RAC: 132,380
Message 48660 - Posted: 25 Sep 2023, 17:33:48 UTC - in response to Message 48659.  

Try this:
[sudo] watch -n10 "find /var/lib/boinc-client/slots -name \"log.EVNTtoHITS\" |sort |xargs -I {} -n1 sh -c \"echo "{}"; grep -Po 'INFO.*Run:Event.*\K\(.*' {} |tail -n4; echo\""


Example (2-core setup):
/var/lib/boinc-client/slots/0/PanDA_Pilot-5970886672/log.EVNTtoHITS
(22th event for this worker) took 440 s. New average 176.4 +- 22.97
(22th event for this worker) took 56.65 s. New average 169.1 +- 19.16

/var/lib/boinc-client/slots/1/PanDA_Pilot-5970704360/log.EVNTtoHITS
(66th event for this worker) took 60.29 s. New average 181.3 +- 11.98
(70th event for this worker) took 205.2 s. New average 174.8 +- 10.96

/var/lib/boinc-client/slots/2/PanDA_Pilot-5970572499/log.EVNTtoHITS
(119th event for this worker) took 51.69 s. New average 170.3 +- 7.973
(117th event for this worker) took 60.95 s. New average 171.7 +- 7.405
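
If only the slot number is wanted instead of the full path, it can be cut out of each path that find prints. A minimal sketch, assuming the default /var/lib/boinc-client/slots/<N>/... layout shown in the example above; the helper name slot_of is made up for this illustration and is not part of BOINC or ATLAS:

```shell
#!/bin/sh
# slot_of PATH: print the BOINC slot number contained in PATH.
# ASSUMPTION: the path looks like ".../slots/<number>/...", as in the
# example output above. Prints nothing if the path does not match.
slot_of() {
    printf '%s\n' "$1" | sed -n 's#.*/slots/\([0-9][0-9]*\)/.*#\1#p'
}

# Example: label a log file with its slot number (prints "slot 2").
echo "slot $(slot_of /var/lib/boinc-client/slots/2/PanDA_Pilot-5970572499/log.EVNTtoHITS)"
```

Inside the one-liner, the same sed expression could replace the plain echo of the path, at the cost of even more quoting.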


Hints:
- Removed the search for "AthenaMP.log" since ATLAS now reports everything to "log.EVNTtoHITS".
- Use "tail -n4" for a 4-core setup, "tail -n3" for a 3-core setup ...
- Like all commands suggested before, the one-liner prints (parts of) the last n lines matching the pattern rather than the last line per worker thread.
Should be good enough for a rough overview.
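
To get the last event line per worker thread rather than the last n matching lines, a log file can be filtered with awk. This is only a sketch: it assumes that the fourth whitespace-separated column of the full "Run:Event" lines (0 or 1 in the 2-core output quoted earlier in this thread) identifies the worker, which is not confirmed here. The helper name last_per_worker is made up for this illustration.

```shell
#!/bin/sh
# last_per_worker FILE: print the most recent "Run:Event" line of each worker.
# ASSUMPTION: column 4 of such a line is the worker/slot id.
last_per_worker() {
    # Remember the newest matching line per id, then print them sorted by id.
    awk '/INFO.*Run:Event / { last[$4] = $0 }
         END { for (w in last) print last[w] }' "$1" | sort -k4,4n
}

# Usage for one log file (the path is just an example):
# last_per_worker /var/lib/boinc-client/slots/0/PanDA_Pilot-5970886672/log.EVNTtoHITS
```

Unlike "tail -n4", this does not need to be adjusted to the number of cores, but it reads the whole log on every call.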
ID: 48660
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,576,736
RAC: 6,763
Message 48662 - Posted: 25 Sep 2023, 19:38:24 UTC - in response to Message 48660.  

Try this:
[sudo] watch -n10 "find /var/lib/boinc-client/slots -name \"log.EVNTtoHITS\" |sort |xargs -I {} -n1 sh -c \"echo "{}"; grep -Po 'INFO.*Run:Event.*\K\(.*' {} |tail -n4; echo\""
...

This rocks !

Thank you very much

Yeti


Supporting BOINC, a great concept !
ID: 48662
tazzduke
Joined: 24 Jun 10
Posts: 42
Credit: 5,467,485
RAC: 22,736
Message 48663 - Posted: 26 Sep 2023, 0:50:53 UTC - in response to Message 48662.  

Try this:
[sudo] watch -n10 "find /var/lib/boinc-client/slots -name \"log.EVNTtoHITS\" |sort |xargs -I {} -n1 sh -c \"echo "{}"; grep -Po 'INFO.*Run:Event.*\K\(.*' {} |tail -n4; echo\""
...

This rocks !

Thank you very much

Yeti


+1
ID: 48663
maeax
Joined: 2 May 07
Posts: 2114
Credit: 159,914,613
RAC: 83,929
Message 48705 - Posted: 30 Sep 2023, 7:34:10 UTC
Last modified: 30 Sep 2023, 7:43:06 UTC

ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas)
Name - 186NDmwdg63np2BDcpmwOghnABFKDmABFKDmtdFKDmDR9KDmriteGo
Is it possible to have some of this 1.45 GByte served from the Squid proxy server?
Or is it possible to reduce the size of this file in some other way?
I have reduced from 8 tasks to 2 tasks per Threadripper in the prefs!
ID: 48705
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 810
Credit: 654,068,742
RAC: 283,306
Message 48711 - Posted: 30 Sep 2023, 11:39:43 UTC

The maximum object size in the cache is 6 GB, so it should be there if needed.
ID: 48711
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2421
Credit: 227,193,917
RAC: 132,380
Message 48712 - Posted: 30 Sep 2023, 12:16:01 UTC - in response to Message 48711.  

If the squid.conf from the forum is used, ATLAS EVNT files are intentionally excluded from being cached.
Squid will just download and forward them to the BOINC client.

It is configured that way because:
- each task sends a unique URL for the file, hence from the HTTP point of view they are all different
- their content is different *)
- not writing them to disk keeps the cache quota from being used up very quickly
- it also avoids any disk writes for these files on the Squid box


A large "maximum object size" is mainly intended to leave enough headroom for vdi files.
Unlike the EVNT files those will be written to the disk cache.


*) In fact, David Cameron once mentioned that they have a limited number of different EVNT files.
But the chance of getting tasks that use the same input file is extremely small, hence it is not worth caching them.
ID: 48712
Harri Liljeroos
Joined: 28 Sep 04
Posts: 676
Credit: 43,739,534
RAC: 15,757
Message 48781 - Posted: 14 Oct 2023, 21:40:46 UTC

I had an unusual ATLAS task that shows an abnormal CPU time. Otherwise I don't see anything different about it. Here's the result: https://lhcathome.cern.ch/lhcathome/result.php?resultid=400390410 and here's the same job in PanDA: https://bigpanda.cern.ch/job/5984624678/

The task ran on a Windows 10 host inside a VM with 4 CPU cores. Normally these 400-event tasks run for about 3-4 hours of wall-clock time and 12-16 hours of CPU time. The task in question ran for 3:37 hours but reported a CPU time of 36 hours. That would correspond to about 10 CPU cores in use. But CPU usage was normal while it was running. So I wonder what the story behind this bizarre CPU time is?
ID: 48781
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1284
Credit: 8,513,326
RAC: 2,911
Message 48782 - Posted: 15 Oct 2023, 10:00:30 UTC - in response to Message 48781.  

So I wonder what is the story behind this bizarre CPU time?
Really strange. It seems to me that it's a BOINC issue.
The offset of about 1 day was already there during the run:

2023-10-14 19:18:15 (10428): Status Report: Elapsed Time: '6000.000000'
2023-10-14 19:18:15 (10428): Status Report: CPU Time: '107253.250000'
2023-10-14 20:58:18 (10428): Status Report: Elapsed Time: '12000.000000'
2023-10-14 20:58:18 (10428): Status Report: CPU Time: '129955.890625'
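
The status reports above can be turned into an average number of busy cores per interval. The following is only a sketch of the arithmetic: avg_cores is a made-up helper, the numbers are copied from the log lines above, and it assumes that both counters start at 0 when the task starts.

```shell
#!/bin/sh
# avg_cores ELAPSED1 CPU1 ELAPSED2 CPU2:
# average number of busy cores between two status reports (values in seconds).
avg_cores() {
    awk -v e1="$1" -v c1="$2" -v e2="$3" -v c2="$4" \
        'BEGIN { printf "%.2f\n", (c2 - c1) / (e2 - e1) }'
}

# From task start (assumed 0 s / 0 s) to the first report:
avg_cores 0 0 6000 107253.25                     # prints 17.88
# Between the first and the second report:
avg_cores 6000 107253.25 12000 129955.890625     # prints 3.78
```

Under these assumptions the second interval looks normal for a 4-core VM, so the excess (107253.25 s minus 4 x 6000 s, roughly 23 hours) would already have been booked before the first report.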
ID: 48782
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2421
Credit: 227,193,917
RAC: 132,380
Message 48783 - Posted: 15 Oct 2023, 13:08:41 UTC - in response to Message 48782.  

These lines are not from BOINC.
Instead they are from ATLAS.

Looks like that task had an internal problem which is not exposed in any log here.
ID: 48783
maeax
Joined: 2 May 07
Posts: 2114
Credit: 159,914,613
RAC: 83,929
Message 49643 - Posted: 25 Feb 2024, 10:28:18 UTC

https://lhcathome.cern.ch/lhcathome/result.php?resultid=406919512
[2024-02-25 11:16:18] 2024-02-25 10:16:04,707 | WARNING | format EVNTtoHITS has no such key: dbData
[2024-02-25 11:16:18] 2024-02-25 10:16:04,707 | WARNING | format EVNTtoHITS has no such key: dbTime
[2024-02-25 11:16:18] 2024-02-25 10:16:04,707 | WARNING | wrong length of table data, x=[1708855815.0, 1708855876.0], y=[1909.0, 253620.0] (must be same and length>=4)
[2024-02-25 11:16:18] 2024-02-25 10:16:04,708 | INFO | ..............................
ID: 49643


©2024 CERN