ATLAS vbox and native 3.01
Author | Message |
---|---|
Send message Joined: 2 Sep 04 Posts: 455 Credit: 201,268,029 RAC: 6,930 |
(Quoting: "I modified the command line to monitor ATLAS native 3.01:") Great, I love it! Thank you for this helpful command. As a Linux newbie, what would be necessary to show the BOINC slot number in the line? (I run three WUs simultaneously, each with 4 cores.) Thanks in advance, Yeti. Supporting BOINC, a great concept! |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
Try this:
[sudo] watch -n10 "find /var/lib/boinc-client/slots -name \"log.EVNTtoHITS\" |sort |xargs -I {} -n1 sh -c \"echo "{}"; grep -Po 'INFO.*Run:Event.*\K\(.*' {} |tail -n4; echo\""
Example (2-core setup):
/var/lib/boinc-client/slots/0/PanDA_Pilot-5970886672/log.EVNTtoHITS
(22th event for this worker) took 440 s. New average 176.4 +- 22.97
(22th event for this worker) took 56.65 s. New average 169.1 +- 19.16
/var/lib/boinc-client/slots/1/PanDA_Pilot-5970704360/log.EVNTtoHITS
(66th event for this worker) took 60.29 s. New average 181.3 +- 11.98
(70th event for this worker) took 205.2 s. New average 174.8 +- 10.96
/var/lib/boinc-client/slots/2/PanDA_Pilot-5970572499/log.EVNTtoHITS
(119th event for this worker) took 51.69 s. New average 170.3 +- 7.973
(117th event for this worker) took 60.95 s. New average 171.7 +- 7.405
Hints:
- Removed the search for "AthenaMP.log" since ATLAS now reports everything to "log.EVNTtoHITS".
- Use "tail -n4" for a 4-core setup, "tail -n3" for a 3-core setup ...
- Like all the commands suggested before, the one-liner prints (part of) the last n lines matching the pattern rather than the last line per worker thread. That should be good enough for a rough overview. |
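For readers who want the slot number printed explicitly, the one-liner above could be unpacked into a small script along the following lines. This is only a sketch: the function names `slot_of` and `show_slots` are made up for this illustration, the default path is the one used in the command above, and the `sed` expression is meant to mimic the `grep -Po ... \K` pattern (everything from the last opening parenthesis on lines matching `INFO.*Run:Event`).

```shell
#!/bin/sh
# Sketch only: slot_of and show_slots are hypothetical helper names.

# Extract the slot number from a path such as
# /var/lib/boinc-client/slots/2/PanDA_Pilot-.../log.EVNTtoHITS
slot_of() {
    printf '%s\n' "$1" | sed -n 's|.*/slots/\([0-9][0-9]*\)/.*|\1|p'
}

# For every log.EVNTtoHITS below the slots directory, print the slot
# number and path, then the last N matching event lines.
#   $1: slots directory (default: the path used in the thread)
#   $2: number of lines to show, e.g. the core count (default: 4)
show_slots() {
    slots_dir=${1:-/var/lib/boinc-client/slots}
    ncores=${2:-4}
    find "$slots_dir" -name 'log.EVNTtoHITS' 2>/dev/null | sort |
    while IFS= read -r log; do
        printf 'slot %s  %s\n' "$(slot_of "$log")" "$log"
        # Keep only the text starting at the last "(" of each matching line.
        sed -n '/INFO.*Run:Event.*(/s/.*\((.*\)$/\1/p' "$log" | tail -n "$ncores"
        echo
    done
}
```

Saved as a script with a final line such as `show_slots "$@"`, it could be run under `watch -n10` in the same way as the original command. Like the one-liner, it shows the last n matching lines, not the last line per worker.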
Send message Joined: 24 Jun 10 Posts: 43 Credit: 6,160,703 RAC: 1,317 |
Try this: +1 |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
ATLAS Simulation 3.01 (vbox64_mt_mcore_atlas) Name - 186NDmwdg63np2BDcpmwOghnABFKDmABFKDmtdFKDmDR9KDmriteGo Is it possible to have some of this 1.45 GByte cached by the Squid proxy server? Or is it possible to reduce the size of this file in some other way? I have reduced from 8 tasks to 2 tasks for each Threadripper in the prefs! |
Send message Joined: 27 Sep 08 Posts: 850 Credit: 692,823,409 RAC: 77,584 |
The maximum object size in the cache is 6 GB, so it should be there if needed. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
If the squid.conf from the forum is used, ATLAS EVNT files are intentionally excluded from being cached. Squid will just download them and forward them to the BOINC client. It is configured that way because:
- each task sends a unique URL for the file, hence from the HTTP point of view they are all different
- their content is different *)
- not writing them to disk avoids the cache quota being used up very quickly
- it also avoids the disk writes themselves on the Squid box
A large "maximum object size" is mainly intended to leave enough headroom for vdi files. Unlike the EVNT files, those will be written to the disk cache.
*) In fact, David Cameron once mentioned that there is a limited number of different EVNT files. But the chance of getting tasks that use the same input file is extremely small, so caching them is not worthwhile. |
Send message Joined: 28 Sep 04 Posts: 732 Credit: 49,363,408 RAC: 17,955 |
I had an unusual ATLAS task that shows an abnormal CPU time. Otherwise I don't see anything different about it. Here's the result: https://lhcathome.cern.ch/lhcathome/result.php?resultid=400390410 and here's the same job in PanDA: https://bigpanda.cern.ch/job/5984624678/ The task was run on a Windows 10 host inside a VM with 4 CPU cores. Normally these 400-event tasks run for about 3-4 hours of wall clock time and 12-16 hours of CPU time. The task in question ran for 3:37 hours but reported a CPU time of 36 hours. That would correspond to 10 CPU cores being used. But CPU usage was normal while it was running. So I wonder what is the story behind this bizarre CPU time? |
Send message Joined: 14 Jan 10 Posts: 1422 Credit: 9,484,585 RAC: 1,266 |
(Quoting: "So I wonder what is the story behind this bizarre CPU time?") Really strange. It seems to me that it's a BOINC issue. The offset of about one day was already present during the run:
2023-10-14 19:18:15 (10428): Status Report: Elapsed Time: '6000.000000'
2023-10-14 19:18:15 (10428): Status Report: CPU Time: '107253.250000'
2023-10-14 20:58:18 (10428): Status Report: Elapsed Time: '12000.000000'
2023-10-14 20:58:18 (10428): Status Report: CPU Time: '129955.890625' |
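As a rough cross-check of the figures in the status reports above, the average number of busy cores between two reports is the CPU time difference divided by the elapsed time difference. A minimal sketch using awk from the shell; the function name `cores_between` is made up for this illustration:

```shell
#!/bin/sh
# Average number of busy cores between two status reports:
#   (cpu2 - cpu1) / (elapsed2 - elapsed1)
# Arguments: cpu1 elapsed1 cpu2 elapsed2 (all in seconds)
cores_between() {
    awk -v c1="$1" -v e1="$2" -v c2="$3" -v e2="$4" \
        'BEGIN { printf "%.2f\n", (c2 - c1) / (e2 - e1) }'
}

# Rate between the two reports quoted in the post.
cores_between 107253.250000 6000 129955.890625 12000

# Rate implied by the first report alone, counted from the task start.
cores_between 0 0 107253.250000 6000
```

The first call prints 3.78, which fits a 4-core VM, while the second prints 17.88. That is consistent with the observation that the surplus CPU time had already accumulated before the first report and did not build up steadily during the run.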
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
These lines are not from BOINC; they are from ATLAS. It looks like that task had an internal problem which is not exposed in any log here. |
Send message Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=406919512
[2024-02-25 11:16:18] 2024-02-25 10:16:04,707 | WARNING | format EVNTtoHITS has no such key: dbData
[2024-02-25 11:16:18] 2024-02-25 10:16:04,707 | WARNING | format EVNTtoHITS has no such key: dbTime
[2024-02-25 11:16:18] 2024-02-25 10:16:04,707 | WARNING | wrong length of table data, x=[1708855815.0, 1708855876.0], y=[1909.0, 253620.0] (must be same and length>=4)
[2024-02-25 11:16:18] 2024-02-25 10:16:04,708 | INFO | .............................. |
©2024 CERN