Message boards : ATLAS application : atlas error
Saturn911

Joined: 3 Nov 12
Posts: 54
Credit: 136,961,194
RAC: 102,079
Message 50166 - Posted: 10 May 2024, 18:55:33 UTC - in response to Message 50152.  
Last modified: 10 May 2024, 19:08:17 UTC

I got the same invalid results for a bunch of my tasks too. (Example)

I think the real problem is this line:

> CVMFS is not available at /cvmfs/atlas-nightlies.cern.ch/repo/sw/logs/lastUpdate

atlas-nightlies.cern.ch is not in the configured sites, so it's not mounted. Do we need an update to our CVMFS config, or is this perhaps a batch of jobs not meant to be sent out to volunteers?

If you use the suggested CVMFS configuration, supplementary repositories are mounted automatically when an app requires them.
The errors currently shown point to a misconfigured ATLAS batch rather than a local issue.

Permanently adding those supplementary repositories to the base configuration just wastes resources on the client, the server and along the whole network route.
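
One quick way to see whether a repository is reachable on a given host is to read the path the task complained about; with a working automounter, the access itself should trigger the mount. A minimal sketch, assuming a POSIX shell. The function name `cvmfs_path_available` and its optional root argument are made up for illustration and are not part of CVMFS:

```shell
#!/bin/sh
# Hypothetical helper (not part of CVMFS): report whether a path below a
# CVMFS-style root is readable. The optional second argument selects a
# different root directory; it defaults to /cvmfs.
cvmfs_path_available() {
    rel_path=$1
    root=${2:-/cvmfs}
    # Accessing the entry is what prompts an automounter to mount the repository.
    if [ -r "$root/$rel_path" ]; then
        echo "available: $root/$rel_path"
        return 0
    fi
    echo "CVMFS is not available at $root/$rel_path" >&2
    return 1
}

# Path taken from the error message quoted above.
cvmfs_path_available atlas-nightlies.cern.ch/repo/sw/logs/lastUpdate || true
```

If the check fails only for the supplementary repository while the main ATLAS repository is readable, that would hint at a local automount problem rather than a bad batch.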


These are coming in more and more often now.
Dozens of them. It's incredible ...
ID: 50166
Saturn911

Joined: 3 Nov 12
Posts: 54
Credit: 136,961,194
RAC: 102,079
Message 50177 - Posted: 13 May 2024, 14:17:08 UTC - in response to Message 50166.  
Last modified: 13 May 2024, 14:18:10 UTC

If you use the suggested CVMFS configuration, supplementary repositories are mounted automatically when an app requires them.


Looks like this is true for the "autofs" automounter, but not for the "systemd" automounter.
I switched to systemd since autofs is no longer available as a binary on Manjaro Linux.

Any hints for configuring systemd?
ID: 50177
wujj123456

Joined: 14 Sep 08
Posts: 46
Credit: 61,476,052
RAC: 108,254
Message 50181 - Posted: 14 May 2024, 5:10:35 UTC - in response to Message 50177.  
Last modified: 14 May 2024, 5:58:43 UTC

I just realized I wasted thousands of tasks and 1.5 TB of the project's bandwidth in the past 20 hours... Oops, very sorry for that; I have paused all work fetch now. My setup was a bit weird, carried over from Arch, where I ran the cvmfs container with configs copied over from an Ubuntu VM after installing the deb package there. I guess I can install the proper official packages now that I've switched back to Ubuntu. Hopefully that will make sure I always have the recommended configuration from now on.

Is this the latest recommended configuration? (Edit: Guess yes. I have seen the task successfully find the nightlies repo afterwards.)

Related note: I wonder if it's possible to have the task fail for basic setup issues, like the missing repo error here, instead of showing a validation error after the result has been uploaded? That way, the BOINC client would automatically back off instead of continuing to fetch tasks and upload invalid results. I have monitoring for failed jobs on the client side too. However, a successfully uploaded result marked as invalid requires me to check the website periodically. I only noticed this today because of my bandwidth monitoring...
ID: 50181
Erich56

Joined: 18 Dec 15
Posts: 1757
Credit: 115,865,700
RAC: 84,714
Message 50219 - Posted: 22 May 2024, 18:38:45 UTC

Since yesterday, I have had a few tasks on different PCs which successfully produced a HITS file and were uploaded okay, but ended up with "confirmation error" ("Bestätigungsfehler"). For an example, see here:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=411294133

Any idea what's behind this?
ID: 50219
Saturn911

Joined: 3 Nov 12
Posts: 54
Credit: 136,961,194
RAC: 102,079
Message 50220 - Posted: 22 May 2024, 18:51:26 UTC - in response to Message 50219.  

Since yesterday, I have had a few tasks on different PCs which successfully produced a HITS file and were uploaded okay, but ended up with "confirmation error" ("Bestätigungsfehler"). For an example, see here:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=411294133

Any idea what's behind this?

Same here with "Atlas native":
https://lhcathome.cern.ch/lhcathome/result.php?resultid=411222337
No errors but invalid :-(
ID: 50220
maeax

Joined: 2 May 07
Posts: 2198
Credit: 173,403,419
RAC: 44,056
Message 50221 - Posted: 23 May 2024, 5:20:15 UTC - in response to Message 50220.  

Seeing the same,
hoping CERN IT gives us the credit points, because
the ATLAS tasks are confirmed and validated.
These are days of running time for me.
ID: 50221
Erich56

Joined: 18 Dec 15
Posts: 1757
Credit: 115,865,700
RAC: 84,714
Message 50222 - Posted: 23 May 2024, 5:43:56 UTC
Last modified: 23 May 2024, 5:59:20 UTC

Last night, I had at least two tasks that failed after about 5 minutes:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=411314062

Edit: I now have found several other such tasks on various other computers within my network.
ID: 50222
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 247
Credit: 5,974,599
RAC: 0
Message 50223 - Posted: 23 May 2024, 5:59:07 UTC

Sorry for these validation errors. We had a problem with the ATLAS validator yesterday on our new server. We have now set the relevant results to re-validate in the database. Hopefully you should get credit soon, unless something else prevents it.
ID: 50223
Erich56

Joined: 18 Dec 15
Posts: 1757
Credit: 115,865,700
RAC: 84,714
Message 50224 - Posted: 23 May 2024, 6:12:12 UTC - in response to Message 50223.  

Sorry for these validation errors. We had a problem with the ATLAS validator yesterday on our new server. We have now set the relevant results to re-validate in the database. Hopefully you should get credit soon, unless something else prevents it.
I just noticed that the status of the tasks in question has changed from "invalid" to "unknown", and still no credit ...
Why "unknown"?
ID: 50224
maeax

Joined: 2 May 07
Posts: 2198
Credit: 173,403,419
RAC: 44,056
Message 50225 - Posted: 23 May 2024, 9:54:06 UTC - in response to Message 50223.  
Last modified: 23 May 2024, 10:15:38 UTC

OK, Nils,
the ATLAS tasks are gone from BOINC on my side.
They are now shown as running,
but the timestamp is from yesterday.
https://lhcathome.cern.ch/lhcathome/results.php?userid=75468
Edit: now they show as download failed.
ID: 50225
maeax

Joined: 2 May 07
Posts: 2198
Credit: 173,403,419
RAC: 44,056
Message 50226 - Posted: 23 May 2024, 11:01:12 UTC - in response to Message 50225.  

Now waiting for confirmation, finished successfully.
ID: 50226
maeax

Joined: 2 May 07
Posts: 2198
Credit: 173,403,419
RAC: 44,056
Message 50227 - Posted: 23 May 2024, 12:47:29 UTC - in response to Message 50226.  

Credit points granted, thank you.
ID: 50227
Saturn911

Joined: 3 Nov 12
Posts: 54
Credit: 136,961,194
RAC: 102,079
Message 50234 - Posted: 24 May 2024, 11:10:53 UTC - in response to Message 50227.  

ID: 50234
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2509
Credit: 249,213,972
RAC: 127,521
Message 50236 - Posted: 24 May 2024, 11:30:05 UTC - in response to Message 50234.  

Looks like something got messed up on the new server.

This result has been received before it has been sent:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=411215779
Sent 23 May 2024, 9:09:38 UTC
Received 22 May 2024, 7:13:56 UTC
ID: 50236
maeax

Joined: 2 May 07
Posts: 2198
Credit: 173,403,419
RAC: 44,056
Message 50237 - Posted: 24 May 2024, 11:58:49 UTC - in response to Message 50234.  
Last modified: 24 May 2024, 11:59:42 UTC

Got two newer WUs with invalid results.
They look OK to me. So what's wrong?

Now 1.950 instead of 400 Events.
ID: 50237
M0CZY

Joined: 27 Apr 24
Posts: 10
Credit: 535,442
RAC: 2,009
Message 50249 - Posted: 25 May 2024, 10:32:57 UTC

Since I updated to Ubuntu 24.04 LTS, my apptainer for Atlas isn't working. I have purged, and reinstalled my apptainer application, to no effect. I have no idea what is wrong, or how to fix it.
[2024-05-25 09:25:59] Using apptainer image /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7
[2024-05-25 09:25:59] Checking for apptainer binary...
[2024-05-25 09:25:59] Using apptainer found in PATH at /usr/bin/apptainer
[2024-05-25 09:25:59] Running /usr/bin/apptainer --version
[2024-05-25 09:25:59] apptainer version 1.3.1
[2024-05-25 09:25:59] Checking apptainer works with /usr/bin/apptainer exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 hostname
[2024-05-25 09:25:59] apptainer isnt working: ERROR  : Could not write info to setgroups: Permission denied
[2024-05-25 09:25:59] ERROR  : Error while waiting event for user namespace mappings: no event received
[2024-05-25 09:25:59] 
09:36:00 (11112): run_atlas exited; CPU time 0.127016
09:36:00 (11112): app exit status: 0x1
09:36:00 (11112): called boinc_finish(195)

ID: 50249
maeax

Joined: 2 May 07
Posts: 2198
Credit: 173,403,419
RAC: 44,056
Message 50250 - Posted: 25 May 2024, 10:45:56 UTC - in response to Message 50249.  
Last modified: 25 May 2024, 10:48:21 UTC

[2024-05-25 09:25:59] apptainer isnt working: ERROR : Could not write info to setgroups: Permission denied
You can search the Number crunching forum for this setting.
ID: 50250
M0CZY

Joined: 27 Apr 24
Posts: 10
Credit: 535,442
RAC: 2,009
Message 50251 - Posted: 25 May 2024, 12:42:01 UTC - in response to Message 50250.  
Last modified: 25 May 2024, 12:43:53 UTC

You can search the Number crunching forum for this setting.

I don't understand what your answer means.
I can see Permission denied, but I don't know how or where to fix this.
At the moment, I am limited to using VirtualBox for ATLAS or Theory work units.
ID: 50251
maeax

Joined: 2 May 07
Posts: 2198
Credit: 173,403,419
RAC: 44,056
Message 50252 - Posted: 25 May 2024, 12:46:26 UTC - in response to Message 50251.  

ID: 50252
wujj123456

Joined: 14 Sep 08
Posts: 46
Credit: 61,476,052
RAC: 108,254
Message 50253 - Posted: 25 May 2024, 19:38:22 UTC - in response to Message 50252.  
Last modified: 25 May 2024, 20:17:55 UTC

Well you didn't search yourself. :-) @M0CZY's message above is the only one on this forum that showed the exact error messages. I ran into the same issue after upgrading to Ubuntu 24.04.

Example failed task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=411375194

The fix is to follow the workaround in this report. Execute the two commands as root.
echo "kernel.apparmor_restrict_unprivileged_userns = 0" >/etc/sysctl.d/99-userns.conf
sysctl --system

After that, the apptainer command that failed in the error log should print the host name when run as a normal user.

Note that this effectively reverts the tightened user namespace setting in Ubuntu 24.04. For anyone who knows AppArmor configs better (not me), there is likely a more restrictive approach that grants the permission only to apptainer.
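
To confirm the setting took effect, the current value can be read back. A minimal sketch, assuming the sysctl key maps to the usual file under /proc/sys; the function name `userns_restriction_state` and its optional file argument are hypothetical, added only so the check can be pointed at another file:

```shell
#!/bin/sh
# Hypothetical helper: report whether AppArmor currently restricts
# unprivileged user namespaces. sysctl keys map to files under /proc/sys,
# so kernel.apparmor_restrict_unprivileged_userns is read from the path below.
userns_restriction_state() {
    file=${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}
    if [ ! -r "$file" ]; then
        # Kernels without this AppArmor feature do not expose the key.
        echo "unknown"
        return 0
    fi
    value=$(cat "$file")
    if [ "$value" = "0" ]; then
        echo "unrestricted"
    else
        echo "restricted"
    fi
}

userns_restriction_state
```

If this still prints `restricted` after `sysctl --system`, the new file under /etc/sysctl.d was probably not picked up, or a later file overrides it.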
ID: 50253


©2024 CERN