Message boards : ATLAS application : Most error that I have encountered

Beom Sik Shin

Joined: 28 Nov 09
Posts: 3
Credit: 211,168
RAC: 76
Message 47995 - Posted: 11 Apr 2023, 18:40:23 UTC
Last modified: 11 Apr 2023, 19:30:51 UTC

I went down the rabbit hole and found a few problems, plus a temporary workaround.

SSL error with CERN CA Certificate

2023-04-11 17:36:19,442 | WARNING  | failed to load data from url=https://atlas-cric.cern.ch/cache/ddmendpoints.json, error: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1129)> .. trying to use data from cache=/var/lib/boinc-client/slots/0/agis_ddmendpoints.agis.ALL.json


I'm not sure of the root cause, but the retries slow down the start of each task. If you visit the site in a browser, it shows a certificate warning, so the CERN CA apparently is not in the default trust store. Based on that, I added the CERN CA certificates (Grid, Root) from https://ca.cern.ch/cafiles/ with

update-ca-certificates
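
For anyone repeating this step on Debian or Ubuntu, here is a minimal sketch of what adding the CA involves. It assumes the certificate files have already been downloaded from the cafiles page. `stage_ca_cert` is a hypothetical helper name, and the file name in the usage comment is a placeholder. update-ca-certificates only picks up PEM files ending in .crt under /usr/local/share/ca-certificates, so the helper renames the file and, if needed, converts it from DER.

```shell
# Hypothetical helper: stage one CA certificate for update-ca-certificates.
# Usage: stage_ca_cert <cert-file> [dest-dir]
stage_ca_cert() (
    src=$1
    dest_dir=${2:-/usr/local/share/ca-certificates}
    # Replace spaces so the staged file name is easy to handle
    name=$(basename "$src" | tr ' ' '_')
    dest="$dest_dir/${name%.*}.crt"
    if grep -q -e '-----BEGIN CERTIFICATE-----' "$src"; then
        cp "$src" "$dest" || exit 1      # already PEM: copy under a .crt name
    else
        # Assumption: anything that is not PEM is DER-encoded; convert it
        openssl x509 -inform DER -in "$src" -out "$dest" || exit 1
    fi
    printf '%s\n' "$dest"
)

# Example (needs root, not run here; the file name is a placeholder):
#   stage_ca_cert ./cern-root-ca.crt && update-ca-certificates
```

This only changes the trust store of the host system; software that ships its own CA bundle will not see it.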

The error still happens. However, fetching the file manually works:
wget https://atlas-cric.cern.ch/cache/ddmendpoints.json -O agis_ddmendpoints.agis.ALL.json

This has to be done for every slot, so I modified the job file slightly to do it.
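
As a rough sketch of the "every slot" part without touching the job file: download once, then copy the file into each slot directory. `copy_to_slots` is a hypothetical helper name; the slots path is taken from the log line above, and writing there normally needs the permissions of the BOINC user.

```shell
# Hypothetical helper: copy one downloaded file into every slot directory.
# Usage: copy_to_slots <file> <slots-dir> [cache-name]
copy_to_slots() (
    src=$1
    slots_dir=$2
    cache_name=${3:-agis_ddmendpoints.agis.ALL.json}
    for slot in "$slots_dir"/*/; do
        [ -d "$slot" ] || continue       # the glob matched nothing
        cp "$src" "$slot$cache_name" || exit 1
    done
)

# Example (not run here): download once, then fan out to all slots.
#   wget https://atlas-cric.cern.ch/cache/ddmendpoints.json -O /tmp/ddmendpoints.json
#   copy_to_slots /tmp/ddmendpoints.json /var/lib/boinc-client/slots
```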

Does anyone know how to solve this more cleanly? (Or is this not necessary at all?)

CoolDB connection with IPv6, and WSL

Initially I tried to run it in WSL2. It almost worked, but the lack of IPv6 support made it fail. log.EVNTtoHITS showed a failure every time, on an attempt to connect to something like COOLDB/TF200 (I'm not sure of the exact name). The detailed failure log showed that the attempt used an IPv6 address.

I tried disabling IPv6 on the Linux side, assuming there would be an IPv4 address to fall back to. There wasn't, and on top of that, I don't have an IPv6 address to connect from.
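
A quick way to confirm whether a machine has a global IPv6 address at all, as a sketch that assumes the iproute2 `ip` command is available. `global_ipv6_addrs` is a hypothetical helper name; it only parses text, so it can be fed saved output as well.

```shell
# Hypothetical helper: print the IPv6 addresses found on stdin, which is
# expected to be the output of `ip -o -6 addr show scope global`.
global_ipv6_addrs() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet6") print $(i + 1) }'
}

# Example (not run here):
#   ip -o -6 addr show scope global | global_ipv6_addrs
```

No output means the machine has no global IPv6 address, which matches the situation described here.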

Hyper-V with IPv6

I need to keep WSL available, and even the VirtualBox (Vbox) version would not solve the problem, because I have no IPv6 connectivity. Fortunately, IPv6 can be tunneled over IPv4 (6in4) through a tunnel broker: https://handwiki.org/wiki/List_of_IPv6_tunnel_brokers Hyper-V supports IPv6, and the performance loss is fairly tolerable, in my opinion.
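
For reference, a 6in4 tunnel on a Linux guest is usually set up with four iproute2 commands. The sketch below only prints them so they can be reviewed first. `print_6in4_setup` and the default interface name are hypothetical, and the addresses in the example are documentation placeholders; the real values come from the tunnel broker you sign up with.

```shell
# Hypothetical helper: print (not run) the iproute2 commands for a 6in4 tunnel.
# Usage: print_6in4_setup <broker-ipv4> <local-ipv4> <client-ipv6/prefix> [ifname]
print_6in4_setup() (
    remote4=$1
    local4=$2
    addr6=$3
    ifname=${4:-tun6in4}
    cat <<EOF
ip tunnel add $ifname mode sit remote $remote4 local $local4 ttl 255
ip link set $ifname up
ip addr add $addr6 dev $ifname
ip route add ::/0 dev $ifname
EOF
)

# Example with placeholder addresses:
#   print_6in4_setup 192.0.2.1 198.51.100.2 2001:db8:1::2/64
```

The printed commands need root to run, and the settings do not survive a reboot unless they are added to the network configuration.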

When setting these up, watch out for library names. For example, the libwxgtk... package has been named libwxgtk3.0-gtk3-dev since Ubuntu 20. To speed up the build stages, add -j (number of cores) to every make call.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4840
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5788
https://cvmfs.readthedocs.io/en/stable/cpt-squid.html
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4758
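
On the -j tip above, a small sketch, assuming GNU make and the coreutils `nproc` command. `jobs_flag` is a hypothetical helper name.

```shell
# Print a make jobs flag; defaults to the number of available cores (nproc).
jobs_flag() {
    printf '%s\n' "-j${1:-$(nproc)}"
}

# Examples (not run here):
#   make "$(jobs_flag)"                # one call
#   export MAKEFLAGS="$(jobs_flag)"    # every later make call in this shell
```

Setting MAKEFLAGS once avoids editing each make command in the guides.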

I'm not sure whether I have cleared all possible errors, but tasks seem to be running stably for now. I hope this helps someone who really wants to join LHC@Home.

Toggleton

Joined: 4 Mar 17
Posts: 20
Credit: 8,105,053
RAC: 7,747
Message 47998 - Posted: 12 Apr 2023, 12:08:23 UTC - in response to Message 47995.  

The certificate error already showed up on the test server: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=614&postid=7968 It is new with the Run 3 tasks that have been running since yesterday: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5978&postid=47994

I have the same. https://lhcathome.cern.ch/lhcathome/result.php?resultid=391664886

We will see when they fix it. So far it does not hurt much; it only adds a bit of idle time.
"I was told that will likely run out of Run 2 simulation tasks to run on the prod project very soon, so I have gone ahead and released version 3 there so we can start running Run 3 tasks. Unfortunately I don't think we'll be able to resolve some of the remaining issues like the console monitoring before going live on prod but I think it's better to have something not quite perfect than no tasks at all."
Source: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=614&postid=8048
Beom Sik Shin

Joined: 28 Nov 09
Posts: 3
Credit: 211,168
RAC: 76
Message 48003 - Posted: 14 Apr 2023, 2:32:08 UTC - in response to Message 47998.  
Last modified: 14 Apr 2023, 3:03:44 UTC

I see. Forcing participants to install CERN CA certificates would be an awkward fix, so the current situation is understandable to me. As I said above, this can be worked around by adding the CERN CA to the machine and modifying the job XML, something like this:

<!-- ../projects/lhcathome.cern.ch_lhcathome/ATLAS_job_2.54_x86_64-pc-linux-gnu.xml -->
<job_desc>
  <task>
    <application>/usr/bin/wget</application>
    <command_line>https://atlas-cric.cern.ch/cache/ddmendpoints.json -O agis_ddmendpoints.agis.ALL.json</command_line>
  </task>
  <task>
    <application>run_atlas</application>
    <command_line>--nthreads $NTHREADS</command_line>
    <checkpoint_filename>boinc_mmap_file</checkpoint_filename>
  </task>
</job_desc>

It's a hacky fix, and it will be reset if the machine restarts. But if you want to cut the idle time, this will work.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,559,707
RAC: 121,159
Message 48045 - Posted: 27 Apr 2023, 14:53:18 UTC - in response to Message 48003.  

This kind of modification is clearly not recommended.
Especially since it "saves" just a few seconds at the beginning of a task until the inner ATLAS script switches to a fallback source. At the end of each task the same situation happens again, although "agis_ddmendpoints.agis.ALL.json" is now present.

Besides that, the real issue is not the self-signed CRIC certificate.
Instead, it's the missing CERN CA certificate that the CRIC certificate is checked against.
That CA certificate needs to be present inside the ATLAS container, as the container doesn't use the CA certificates from the host.


The necessary modifications have to be done by the ATLAS developers at CERN.
There are a few possible solutions:

1. Use and distribute an ATLAS container that includes an up to date CERN CA certificate
2. Run a script inside the container that checks for required certificates and gets/updates them from a reliable source, e.g. CVMFS.
3. Don't use CRIC at all. Instead configure the tasks to get the objects from CVMFS.
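
Option 2 could look roughly like the following sketch. It is not what the ATLAS developers actually run: `sync_ca_certs` is a hypothetical name, and both directory arguments are assumptions. Any trusted location that ships current CA files in PEM format, such as a directory on CVMFS, could serve as the source.

```shell
# Hypothetical helper: copy each *.pem file from a trusted source directory
# into the target directory when it is missing there or differs.
# Usage: sync_ca_certs <source-dir> <target-dir>
sync_ca_certs() (
    source_dir=$1
    target_dir=$2
    for src in "$source_dir"/*.pem; do
        [ -f "$src" ] || continue        # the glob matched nothing
        dest="$target_dir/$(basename "$src")"
        if [ ! -f "$dest" ] || ! cmp -s "$src" "$dest"; then
            cp "$src" "$dest" || exit 1
            printf 'updated %s\n' "$dest"
        fi
    done
)
```

Run at container start, a check like this would print one line per refreshed certificate and nothing when everything is current.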
Beom Sik Shin

Joined: 28 Nov 09
Posts: 3
Credit: 211,168
RAC: 76
Message 48328 - Posted: 1 Aug 2023, 7:30:08 UTC - in response to Message 48045.  

Yep, you are right, and I agree with your argument. Maybe I should have added a precaution or warning. It is a minor change, but an important step for security reasons.

The second option seems like a good one, in my opinion: it takes advantage of running in a VM, makes DNS poisoning harder, and keeps the security check intact. But it's up to the CERN developers.


