Message boards : ATLAS application : ATLAS native v2.91

David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 47223 - Posted: 5 Sep 2022, 9:12:42 UTC

ATLAS native 2.91 was just released. It contains the improvements from v2.90, with the read-only tmp dir problem fixed.

Please let us know if you see any problems!
ID: 47223
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,990,641
RAC: 136,455
Message 47224 - Posted: 5 Sep 2022, 9:42:35 UTC - in response to Message 47223.  

The first tasks are running with a local apptainer (from the Linux vendor).
So far there are no unexpected issues.
ID: 47224
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,990,641
RAC: 136,455
Message 47230 - Posted: 6 Sep 2022, 7:49:14 UTC

Since this ATLAS version was released, the average load values have been increasing slowly but steadily.
The more cores ATLAS is given, the steeper the increase.
It looks like the cleanup at the end of a task doesn't work and some processes keep running.
This needs investigation.

The bad thing is that this will sooner or later lead to a crash.


After a reboot, fresh ATLAS tasks create the following subfolders in the global tmp folders:
in /tmp:
hsperfdata_<boinc_username>

in /var/tmp:
.cricinfo_<boinc_userid>
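
To check a host for these subfolders, something like the following Python sketch might help. It only matches the two name prefixes mentioned above; the function name and the `root` parameter (handy for trying it out on a scratch directory) are illustrative and not part of ATLAS or BOINC.

```python
from pathlib import Path

# Name prefixes reported above; the suffix (user name or user id) differs per host.
PREFIXES = {
    "/tmp": "hsperfdata_",
    "/var/tmp": ".cricinfo_",
}


def find_tmp_subfolders(prefixes=PREFIXES, root="/"):
    """Return the sorted paths of subfolders whose names start with the given prefixes."""
    found = []
    for tmp_dir, prefix in prefixes.items():
        base = Path(root) / tmp_dir.lstrip("/")
        if not base.is_dir():
            continue  # this tmp folder does not exist on the host
        for entry in base.iterdir():
            if entry.is_dir() and entry.name.startswith(prefix):
                found.append(str(entry))
    return sorted(found)


if __name__ == "__main__":
    for path in find_tmp_subfolders():
        print(path)
```

Running it before and after a task should show whether the folders are created at task start and whether they are still there afterwards.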
ID: 47230
David Cameron
Project administrator
Project developer
Project scientist

Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 47231 - Posted: 6 Sep 2022, 9:44:11 UTC - in response to Message 47230.  

Can you see what processes are left running once a task finishes?
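
One possible way to check this on Linux is to scan /proc for processes still owned by the account that runs BOINC. This is only a rough sketch: the function name is made up for illustration, it assumes the leftover processes run under the BOINC user's uid, and it relies on the /proc/<pid> directory being owned by the process owner, which is the usual case.

```python
import os


def processes_of_user(uid, proc_root="/proc"):
    """Return a sorted list of (pid, command name) for processes owned by uid (Linux only)."""
    result = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a process directory
        pid_dir = os.path.join(proc_root, name)
        try:
            if os.stat(pid_dir).st_uid != uid:
                continue
            with open(os.path.join(pid_dir, "comm")) as f:
                command = f.read().strip()
        except OSError:
            # The process exited (or became unreadable) while scanning.
            continue
        result.append((int(name), command))
    return sorted(result)
```

For example, `processes_of_user(pwd.getpwnam("boinc").pw_uid)` would list what is running under a user called "boinc" (the account name is an assumption and differs between installations). Comparing the output shortly after a task finishes with the output on an idle machine should show what was left behind.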

Unfortunately, some parts of ATLAS tasks are still hard-coded to use /var/tmp; we are working on fixing this. But I think /tmp/hsperfdata is something related to Java(?) and not coming from ATLAS tasks.
ID: 47231
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,990,641
RAC: 136,455
Message 47232 - Posted: 6 Sep 2022, 10:17:59 UTC - in response to Message 47231.  

Just started an ATLAS task on a BOINC test instance.


My proxy log shows this, requested by that ATLAS task:
[06/Sep/2022:12:02:08 +0200] "CONNECT atlas-cric.cern.ch:443 HTTP/1.0" 200 1797516 "-" "-" TCP_TUNNEL:HIER_DIRECT
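
For anyone who wants to filter their own proxy log for such requests, here is a small Python sketch that splits a line of this shape into fields. The field names are my own guesses based on the layout of the line above (in particular the two quoted "-" fields and the byte count), not an official log format definition, so adjust the pattern to the local logformat.

```python
import re

# Pattern for lines shaped like the one above; field names are assumptions.
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<result>\S+)'
)


def parse_proxy_line(line):
    """Return the fields of one log line as a dict, or None if the line does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["size"] = int(fields["size"])
    return fields
```

Filtering on `fields["target"]` (here `atlas-cric.cern.ch:443`) would then show when a task contacted that host.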


Quote from Message 47231: "But I think /tmp/hsperfdata is something related to Java(?) and not coming from ATLAS tasks."

Right, it's from Java, but created during the start of ATLAS.


Will let the task finish to see what remains.
ID: 47232
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,173,759
RAC: 105,244
Message 47233 - Posted: 6 Sep 2022, 11:16:10 UTC

I have one test running in a CentOS9 VM: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10813499
and I see NO entry in /tmp or /var/tmp.
ID: 47233
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,990,641
RAC: 136,455
Message 47234 - Posted: 6 Sep 2022, 13:30:28 UTC - in response to Message 47232.  

Quote from Message 47232: "Just started an ATLAS task on a BOINC test instance. [...] Will let the task finish to see what remains."

The test task finished without leaving anything weird.
Let's see what happens overnight when the machines are back under full load.
ID: 47234
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 47333 - Posted: 30 Sep 2022, 15:40:25 UTC

Since around 12:00 UTC today, I have seen many (not all) ATLAS native tasks fail after 600 seconds of runtime.

You can check here: https://lhcathome.cern.ch/lhcathome/results.php?userid=555&offset=0&show_names=0&state=6&appid=


Supporting BOINC, a great concept !
ID: 47333
