Questions and Answers : Windows : Windows vbox64 CMS Simulation tasks failing - VM unable to validate X509 credential from LHC@home
skydivingnerd

Joined: 8 Apr 21
Posts: 23
Credit: 31,874,306
RAC: 56,291
Message 44940 - Posted: 12 May 2021, 17:02:57 UTC
Last modified: 12 May 2021, 17:11:07 UTC

I have a Win10 machine with BOINC client 7.16.11 and Vbox 6.1.22 installed. All the CMS Simulation tasks on my host are failing when the VM attempts to validate the x509 certificate with LHC@home. I installed the CERN Root and Grid CA certificates (https://cafiles.cern.ch/cafiles/) on my local host to see whether that corrected the validation issue. It did not.

Failed jobs examples:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316190883
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316187982
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316180651

I've verified that the local Windows FW as well as my pfSense FW, including Snort, are passing traffic as they should.

I ran a packet capture while the VM was attempting to reach out for the validation and see that the VM is communicating with LHC servers (vccs.cern.ch @ 137.138.120.99). The VM does not recognize the CA that signed the CERN server's certificate. The stream exits with a TLSv1.2 fatal alert: Unknown CA.

The relevant packet is #10

No.     Time           Source                Destination           Protocol Length Info
      1 0.000000       192.168.150.30        137.138.120.99        TCP      66     55514 → 443 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1

Frame 1: 66 bytes on wire (528 bits), 66 bytes captured (528 bits)
Ethernet II, Src: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09), Dst: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10)
Internet Protocol Version 4, Src: 192.168.150.30, Dst: 137.138.120.99
Transmission Control Protocol, Src Port: 55514, Dst Port: 443, Seq: 0, Len: 0

No.     Time           Source                Destination           Protocol Length Info
      2 0.108285       137.138.120.99        192.168.150.30        TCP      66     443 → 55514 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1 WS=128

Frame 2: 66 bytes on wire (528 bits), 66 bytes captured (528 bits)
Ethernet II, Src: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10), Dst: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09)
Internet Protocol Version 4, Src: 137.138.120.99, Dst: 192.168.150.30
Transmission Control Protocol, Src Port: 443, Dst Port: 55514, Seq: 0, Ack: 1, Len: 0

No.     Time           Source                Destination           Protocol Length Info
      3 0.108513       192.168.150.30        137.138.120.99        TCP      60     55514 → 443 [ACK] Seq=1 Ack=1 Win=262656 Len=0

Frame 3: 60 bytes on wire (480 bits), 60 bytes captured (480 bits)
Ethernet II, Src: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09), Dst: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10)
Internet Protocol Version 4, Src: 192.168.150.30, Dst: 137.138.120.99
Transmission Control Protocol, Src Port: 55514, Dst Port: 443, Seq: 1, Ack: 1, Len: 0

No.     Time           Source                Destination           Protocol Length Info
      4 0.186955       192.168.150.30        137.138.120.99        TLSv1.2  224    Client Hello

Frame 4: 224 bytes on wire (1792 bits), 224 bytes captured (1792 bits)
Ethernet II, Src: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09), Dst: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10)
Internet Protocol Version 4, Src: 192.168.150.30, Dst: 137.138.120.99
Transmission Control Protocol, Src Port: 55514, Dst Port: 443, Seq: 1, Ack: 1, Len: 170
Secure Sockets Layer

No.     Time           Source                Destination           Protocol Length Info
      5 0.297779       137.138.120.99        192.168.150.30        TCP      54     443 → 55514 [ACK] Seq=1 Ack=171 Win=30336 Len=0

Frame 5: 54 bytes on wire (432 bits), 54 bytes captured (432 bits)
Ethernet II, Src: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10), Dst: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09)
Internet Protocol Version 4, Src: 137.138.120.99, Dst: 192.168.150.30
Transmission Control Protocol, Src Port: 443, Dst Port: 55514, Seq: 1, Ack: 171, Len: 0

No.     Time           Source                Destination           Protocol Length Info
      6 0.306888       137.138.120.99        192.168.150.30        TLSv1.2  1514   Server Hello

Frame 6: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits)
Ethernet II, Src: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10), Dst: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09)
Internet Protocol Version 4, Src: 137.138.120.99, Dst: 192.168.150.30
Transmission Control Protocol, Src Port: 443, Dst Port: 55514, Seq: 1, Ack: 171, Len: 1460
Secure Sockets Layer

No.     Time           Source                Destination           Protocol Length Info
      7 0.306897       137.138.120.99        192.168.150.30        TLSv1.2  1514   Certificate [TCP segment of a reassembled PDU]

Frame 7: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits)
Ethernet II, Src: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10), Dst: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09)
Internet Protocol Version 4, Src: 137.138.120.99, Dst: 192.168.150.30
Transmission Control Protocol, Src Port: 443, Dst Port: 55514, Seq: 1461, Ack: 171, Len: 1460
[2 Reassembled TCP Segments (2315 bytes): #6(1366), #7(949)]
Secure Sockets Layer

No.     Time           Source                Destination           Protocol Length Info
      8 0.306905       137.138.120.99        192.168.150.30        TLSv1.2  146    Server Key Exchange, Server Hello Done

Frame 8: 146 bytes on wire (1168 bits), 146 bytes captured (1168 bits)
Ethernet II, Src: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10), Dst: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09)
Internet Protocol Version 4, Src: 137.138.120.99, Dst: 192.168.150.30
Transmission Control Protocol, Src Port: 443, Dst Port: 55514, Seq: 2921, Ack: 171, Len: 92
[2 Reassembled TCP Segments (594 bytes): #7(511), #8(83)]
Secure Sockets Layer
Secure Sockets Layer

No.     Time           Source                Destination           Protocol Length Info
      9 0.307078       192.168.150.30        137.138.120.99        TCP      60     55514 → 443 [ACK] Seq=171 Ack=3013 Win=262656 Len=0

Frame 9: 60 bytes on wire (480 bits), 60 bytes captured (480 bits)
Ethernet II, Src: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09), Dst: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10)
Internet Protocol Version 4, Src: 192.168.150.30, Dst: 137.138.120.99
Transmission Control Protocol, Src Port: 55514, Dst Port: 443, Seq: 171, Ack: 3013, Len: 0

No.     Time           Source                Destination           Protocol Length Info
     10 0.308588       192.168.150.30        137.138.120.99        TLSv1.2  61     Alert (Level: Fatal, Description: Unknown CA)

Frame 10: 61 bytes on wire (488 bits), 61 bytes captured (488 bits)
Ethernet II, Src: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09), Dst: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10)
Internet Protocol Version 4, Src: 192.168.150.30, Dst: 137.138.120.99
Transmission Control Protocol, Src Port: 55514, Dst Port: 443, Seq: 171, Ack: 3013, Len: 7
Secure Sockets Layer

No.     Time           Source                Destination           Protocol Length Info
     11 0.308688       192.168.150.30        137.138.120.99        TCP      60     55514 → 443 [FIN, ACK] Seq=178 Ack=3013 Win=262656 Len=0

Frame 11: 60 bytes on wire (480 bits), 60 bytes captured (480 bits)
Ethernet II, Src: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09), Dst: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10)
Internet Protocol Version 4, Src: 192.168.150.30, Dst: 137.138.120.99
Transmission Control Protocol, Src Port: 55514, Dst Port: 443, Seq: 178, Ack: 3013, Len: 0

No.     Time           Source                Destination           Protocol Length Info
     12 0.418915       137.138.120.99        192.168.150.30        TCP      54     443 → 55514 [FIN, ACK] Seq=3013 Ack=179 Win=30336 Len=0

Frame 12: 54 bytes on wire (432 bits), 54 bytes captured (432 bits)
Ethernet II, Src: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10), Dst: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09)
Internet Protocol Version 4, Src: 137.138.120.99, Dst: 192.168.150.30
Transmission Control Protocol, Src Port: 443, Dst Port: 55514, Seq: 3013, Ack: 179, Len: 0

No.     Time           Source                Destination           Protocol Length Info
     13 0.419178       192.168.150.30        137.138.120.99        TCP      60     55514 → 443 [ACK] Seq=179 Ack=3014 Win=262656 Len=0

Frame 13: 60 bytes on wire (480 bits), 60 bytes captured (480 bits)
Ethernet II, Src: AsustekC_ee:47:09 (3c:7c:3f:ee:47:09), Dst: IntelCor_6b:d4:10 (00:1b:21:6b:d4:10)
Internet Protocol Version 4, Src: 192.168.150.30, Dst: 137.138.120.99
Transmission Control Protocol, Src Port: 55514, Dst Port: 443, Seq: 179, Ack: 3014, Len: 0
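
For readers following the capture: the 7-byte payload in frame 10 matches the size of a plaintext TLS alert record (5-byte record header plus a 2-byte alert). The sketch below decodes such a record in Python. The example bytes are constructed by hand to match what the capture summary reports (TLS 1.2, fatal, unknown_ca); they are not copied from the capture itself, and the description table is only a small subset of the codes in the TLS specification.

```python
import struct

# Small subset of alert codes from the TLS 1.2 specification (RFC 5246, section 7.2).
ALERT_LEVELS = {1: "warning", 2: "fatal"}
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    40: "handshake_failure",
    42: "bad_certificate",
    45: "certificate_expired",
    46: "certificate_unknown",
    48: "unknown_ca",
}


def parse_tls_alert(record: bytes) -> dict:
    """Decode a 7-byte plaintext TLS alert record into readable fields."""
    if len(record) != 7:
        raise ValueError("expected 5-byte record header plus 2-byte alert")
    # Header: content type (1), protocol version (2), length (2); then level and description.
    content_type, version, length, level, description = struct.unpack("!BHHBB", record)
    if content_type != 21:
        raise ValueError("content type 21 (alert) expected")
    if length != 2:
        raise ValueError("a plaintext alert body is exactly 2 bytes")
    return {
        "version": hex(version),
        "level": ALERT_LEVELS.get(level, f"unknown ({level})"),
        "description": ALERT_DESCRIPTIONS.get(description, f"unknown ({description})"),
    }


# Hand-built example: alert record, version 0x0303 (TLS 1.2), fatal, unknown_ca.
example = bytes.fromhex("15030300020230")
print(parse_tls_alert(example))
```

Since the alert is sent by 192.168.150.30, it is the client side that fails to build a trust chain for the server certificate.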



I believe this is an issue with the VM itself not having the correct host certificate. Can an admin check into this?

R/S
Scott
ID: 44940
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,998,564
RAC: 136,215
Message 44941 - Posted: 12 May 2021, 17:28:14 UTC - in response to Message 44940.  
Last modified: 12 May 2021, 17:51:22 UTC

I installed the CERN Root ... on my local host, seeing if that corrected the issue of validation. It did not.

Sure.
This will not work as the certs need to be installed inside the VM.
You may test Theory vbox on that computer to see whether it behaves differently.


<edit>
This is a snippet from one of your CMS logs:
2021-05-07 19:49:02 (5344): Guest Log: [DEBUG] Probing CVMFS ...

2021-05-07 19:49:02 (5344): Guest Log: Probing /cvmfs/grid.cern.ch... OK

2021-05-07 19:49:07 (5344): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2021-05-07 19:49:07 (5344): Guest Log: 2.4.4.0 3755 5 25524 11625 2 1 1234377 4096000 2 65024 0 3 100 0 0 http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/grid.cern.ch http://131.225.188.246:3126 0

2021-05-07 19:53:47 (5344): Guest Log: [INFO] Reading volunteer information

2021-05-07 19:53:47 (5344): Guest Log: [INFO] Volunteer: scotth (787857)
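
The CVMFS status output in the snippet above is a header line followed by a value line, which is hard to read by eye. A minimal Python sketch that pairs the two lines up, assuming both are whitespace-separated with the same number of columns (true for the lines quoted here); the function name is made up for illustration:

```python
from urllib.parse import urlsplit

# Header and value lines copied from the guest log above.
HEADER = ("VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) "
          "CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) "
          "HOST PROXY ONLINE")
VALUES = ("2.4.4.0 3755 5 25524 11625 2 1 1234377 4096000 2 65024 0 3 100 0 0 "
          "http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/grid.cern.ch "
          "http://131.225.188.246:3126 0")


def pair_stat_lines(header: str, values: str) -> dict:
    """Map each column name to its value; refuse lines whose column counts differ."""
    names = header.split()
    fields = values.split()
    if len(names) != len(fields):
        raise ValueError("header and value line have different column counts")
    return dict(zip(names, fields))


stats = pair_stat_lines(HEADER, VALUES)
proxy = urlsplit(stats["PROXY"])
print("host  :", stats["HOST"])
print("proxy :", proxy.hostname, "port", proxy.port)
print("online:", stats["ONLINE"])
```

Read this way, the PROXY column shows the repository being reached through 131.225.188.246 on port 3126.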

It looks as if you either use a local proxy that is not correctly configured.
=> CVMFS configures a fallback proxy.

Or the CVMFS inside the VM can only partly access CERN's CVMFS.
Since some CA certs are taken from there, your cert issues are follow-up issues.

The latter mostly points to an incomplete local firewall setup.
I guess it's on the affected computer, since native CVMFS on your other computers is working fine.
Looks like you are familiar with network diagnostic tools.
If so, you may check for filtered TCP packets to ports 80, 8000, 8080, 443, 4080, 9618.
</edit>
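
A rough way to run the port check suggested above is a plain TCP connect test. The sketch below is Python using only the standard library; the function name and result labels are made up for illustration. It runs from the host, not from inside the VM, so it only approximates what the VM sees, and a timeout is merely a hint that packets may be dropped somewhere on the path.

```python
import socket


def probe_tcp(host: str, ports, timeout: float = 3.0) -> dict:
    """Try a TCP connect to each port and record how the attempt ended."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = "open"
        except ConnectionRefusedError:
            # An active refusal means the packet got through and was answered.
            results[port] = "refused"
        except socket.timeout:
            # No answer at all: possibly dropped by a firewall on the way.
            results[port] = "timeout"
        except OSError as exc:
            results[port] = f"error: {exc}"
    return results
```

Calling `probe_tcp("vccs.cern.ch", [443])` would test the server seen in the capture. For the other ports the target has to be a server that actually listens on that port, otherwise "refused" or "timeout" says nothing about the local firewall.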
ID: 44941
skydivingnerd

Joined: 8 Apr 21
Posts: 23
Credit: 31,874,306
RAC: 56,291
Message 44943 - Posted: 13 May 2021, 11:32:35 UTC - in response to Message 44941.  
Last modified: 13 May 2021, 11:36:01 UTC


This will not work as the certs need to be installed inside the VM.
You may test Theory vbox on that computer to see whether it behaves differently.

I didn't think it really would... but gave it a shot anyway just to be sure for myself.


It looks as if you either use a local proxy that is not correctly configured.
=> CVMFS configures a fallback proxy.

I don't have a local Squid proxy configured on, or for, my hosts. All my other hosts (except one, which will be getting an OS
rebuild soon) running native work units are reaching out for their images. My ISP connection handles the traffic easily. I'm
just working on getting all my hosts running correctly, then will be configuring Squid proxy on my firewall and then making config
changes on each host. Then working out any issues on that...


Or the CVMFS inside the VM can only partly access CERN's CVMFS.
Since some CA certs are taken from there your cert issues are follow up issues.

The latter mostly points to an incomplete local firewall setup.

Below is one of the many links I found when I was initially setting up LHC@Home and getting native work units to run correctly.
I've configured a port alias in pfSense to handle it all, with the exception of my existing rules for port 80 and 443. The FW rule
allowing all the traffic is configured for TCP only vice TCP/UDP.

https://lhcathome.web.cern.ch/test4theory/my-firewall-complaining-which-ports-does-project-use

Here is my port list:
3125		Common - CVMFS
8000		ATLAS - HTTP
8080		ATLAS - HTTP
23128		ATLAS - HTTP
3127:3128	ATLAS - HTTP Proxy
5222		ATLAS - XMPP
9094		ATLAS - TCP
9618		Theory, CMS, LHCb - Condor
4080		CMS - WMAgent
8080		CMS - Frontier
8443		LHCb - DIRAC
9133:9149	LHCb - DIRAC
9166		LHCb - DIRAC
9196:9199	LHCb - DIRAC


I've also been chewing through my Snort logs the past several weeks, identifying and suppressing signature alerts for
LHC@Home traffic. I've got a nice list of IP addresses LHC@home communicates with. A few of what I believe are the
more critical CVMFS IP addresses I've added to an "External Server" alias list and configured that on the Snort Pass list
to prevent any alerting on those.
Here are the CVMFS entries I have in the alias:
104.21.88.130	  LHC@Home - s1f'nal/bnl/unl/cern/ral'-cvmfs.openhtc.io
172.67.179.99	  LHC@Home - s1f'nal/bnl/unl/cern/ral'-cvmfs.openhtc.io
158.39.48.38	  LHC@Home - atlas-db-squid1.grid.uiocloud.no

I'm still stuck on the response I saw in the packet capture from the LHC@Home CMS Simulation VM. It actively rejected the
server side Certificate Authority as invalid. I still believe this is a LHC server side issue unless someone can validate that I'm
the only one with this issue.

R/S
Scott
ID: 44943
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,998,564
RAC: 136,215
Message 44944 - Posted: 13 May 2021, 12:54:57 UTC - in response to Message 44943.  

The project's firewall list might need to be updated.
It shows ports/projects that are not in use any more and others are missing:


Not in use:
Port 3127
Port 3125 (replaced by port 3126 and used by fallback proxies)
Port 5222 (XMPP)
Port 9094
Port 1094

LHCb (all DIRAC ports)



When a fresh VM starts, its internal CVMFS is not yet completely configured.
Instead it checks some hard-wired servers to get updated basic setup scripts from.
As a result, the packets to the required destination ports should not be restricted to a few IPs; they should be allowed for all destination IPs.

Same for openhtc.io.
Those servers are run by Cloudflare and usually don't change their IPs very often, but sometimes they do so a couple of times within a single day.
In addition, the list does not include cms-frontier.openhtc.io, which is required for CMS tasks (this wouldn't be necessary if all destinations were allowed).



... My ISP connection handles the traffic easily ...

An argument that is used very often by many volunteers.
The point is "My ISP...".
The CVMFS manual clearly asks for a local proxy to keep the load on the project servers as low as possible.



... then will be configuring Squid proxy ...

Good idea.
The sooner the better.
ID: 44944
skydivingnerd

Joined: 8 Apr 21
Posts: 23
Credit: 31,874,306
RAC: 56,291
Message 44945 - Posted: 13 May 2021, 19:34:23 UTC - in response to Message 44944.  

Great news on identifying that the firewall port page needs updating.

I now have three completed CMS Simulation tasks for my Win10 host!
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316425963
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316423089
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316428800

The only modification I've made since your previous post was in adding port 3126 to my
rule allowing it out. I saw that in the error log of one of my failed work units when you
quoted it back in my post.

I have not made any additional changes on that. Was the CMS Simulation VM updated?


Additionally, from your info on the port usage

Not in use:
Port 3127
Port 3125 (replaced by port 3126 and used by fallback proxies)
Port 5222 (XMPP)
Port 9094
Port 1094

LHCb (all DIRAC ports)

I'll remove those from my allowed outbound ports for the LHC@Home traffic.
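
As a cross-check of that cleanup, here is a small Python sketch that takes the port list posted earlier in this thread, drops the ports listed as not in use, and adds 3126. Two assumptions are baked in and marked in the comments: only 3127 is dropped from the 3127:3128 range (3128 was not named as retired), and the "all DIRAC ports" remark is taken to mean the four LHCb entries from the posted list. Ports 80 and 443 are left out because they are covered by separate rules.

```python
def expand(spec: str) -> set:
    """Expand a pfSense-style port spec ('9618' or '3127:3128') into a set of ints."""
    if ":" in spec:
        low, high = spec.split(":")
        return set(range(int(low), int(high) + 1))
    return {int(spec)}


def expand_all(specs) -> set:
    ports = set()
    for spec in specs:
        ports |= expand(spec)
    return ports


# Port alias as posted earlier in the thread.
POSTED = ["3125", "8000", "8080", "23128", "3127:3128", "5222", "9094",
          "9618", "4080", "8080", "8443", "9133:9149", "9166", "9196:9199"]

# "Not in use" list from the moderator's reply.
# Assumption: "LHCb (all DIRAC ports)" means the four LHCb entries above.
RETIRED = ["3127", "3125", "5222", "9094", "1094",
           "8443", "9133:9149", "9166", "9196:9199"]

# Port 3126 replaces 3125 according to the same reply.
ADDED = ["3126"]

allowed = sorted((expand_all(POSTED) - expand_all(RETIRED)) | expand_all(ADDED))
print(allowed)
```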


Speaking to your comment here:

As a result the packets to the required destination ports should not be restricted
to a few IPs, they should be allowed for all destination IPs.

I've always had my FW rule configured to allow the identified ports out to any
IP. I specifically added the CVMFS IPs I found to my Snort PASS list to ensure
any of them did not get blocked by a signature hit.

Now that the host is working, yes, I will be looking up the Squid configuration
and setting it up in pfSense to keep my clients from reaching all the way out.

Thank you!

R/S
Scott
ID: 44945
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,998,564
RAC: 136,215
Message 44948 - Posted: 16 May 2021, 11:03:20 UTC

I usually do not check up on other volunteers' hosts.
In this case I stumbled over an error that needs to be corrected manually.
If it is not corrected, it will keep troubling you every now and then.

Here are 2 failed CMS tasks that show the same error message:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316484150
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316539335
VBoxManage.exe: error: Medium 'C:\ProgramData\BOINC\slots\18\vm_image.vdi' is not accessible. UUID {9f5af9d2-a067-43af-9905-e40303214595} of the medium 'C:\ProgramData\BOINC\slots\18\vm_image.vdi' does not match the value {82750195-a5d4-4cc4-8519-84a53d0783a4} stored in the media registry ('C:\Users\Scott\.VirtualBox\VirtualBox.xml')
.
.
.
2021-05-14 02:20:05 (2292): 
   NOTE: VM session lock error encountered.
 		    BOINC will be notified that it needs to clean up the environment.
 		    This might be a temporary problem and so this job will be rescheduled for another time.

Both point to an error in slots\18.
This slot needs to be cleaned.
You may
- shut down the BOINC client and wait until everything has calmed down
- remove everything below slots\18
- restart the BOINC client
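
The middle step can also be scripted. A minimal Python sketch, assuming the BOINC client is already fully shut down and that the slot path is the one from the error message above (C:\ProgramData\BOINC\slots\18); the function name is made up for illustration, and the sketch does not touch the VirtualBox media registry mentioned in the error.

```python
import shutil
from pathlib import Path


def clear_slot(slot_dir) -> list:
    """Delete everything below slot_dir but keep the directory itself.

    Only run this while the BOINC client is stopped.
    Returns the sorted names of the removed entries.
    """
    slot = Path(slot_dir)
    if not slot.is_dir():
        raise NotADirectoryError(f"not a directory: {slot}")
    removed = []
    for entry in slot.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real subdirectory: remove recursively
        else:
            entry.unlink()         # plain file or symlink
        removed.append(entry.name)
    return sorted(removed)
```

For the case above the call would be `clear_slot(r"C:\ProgramData\BOINC\slots\18")`, followed by restarting the BOINC client.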
ID: 44948


