computing errors
Joined: 26 Nov 11 Posts: 3 Credit: 104,527 RAC: 0
I intended to start crunching LHC@home, but I get computation errors on every task (the last ones I tested were CMS). I'm running:
- Ubuntu 22.04, kernel 5.15.0-72-lowlatency
- BOINC 7.23.0, built from the git master branch (works fine everywhere else)
- VirtualBox 7.0.8 r156879 (Qt5.15.3), Guest Additions updated (I may have missed a reboot after updating the Guest Additions, but that shouldn't be an issue, should it?)

I also tested with the firewall off, behind the default NAT. The machine is an AMD Ryzen Threadripper 1920X on an Asus ROG Strix X399; lscpu prints "Virtualization: AMD-V". RAM doesn't seem to be an issue (I added 100 GB of swap), and neither does disk space (400 GB+ free). In the last test, the CMS tasks failed at ~1% with a computation error. What could be the issue here?
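The CMS tasks discussed here run inside a VirtualBox VM, so the "Virtualization: AMD-V" line from lscpu is worth confirming directly. A minimal sketch, assuming a POSIX shell and the usual Linux /proc/cpuinfo layout; has_hw_virt is a hypothetical name. It only reports whether the CPU advertises the svm (AMD-V) or vmx (Intel VT-x) flag.

```sh
#!/bin/sh
# Hypothetical helper: exit status 0 if the CPU advertises hardware
# virtualization (svm = AMD-V, vmx = Intel VT-x).
# Reads /proc/cpuinfo unless another file is passed as $1.
has_hw_virt() {
    # Keep only the "flags" lines, then match svm or vmx as whole words.
    grep -E '^flags[[:space:]]*:' "${1:-/proc/cpuinfo}" | grep -qwE 'svm|vmx'
}
```

Usage: `has_hw_virt && echo "svm/vmx flag present"`.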
Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 456
2023-05-28 02:14:49 (3326477): Guest Log: NCAT DEBUG: Unable to load trusted CA certificates from /usr/share/ncat/ca-bundle.crt: error:02001002:system library:fopen:No such file or directory

BOINC has a conflict with the CA certificate; see the BOINC webpage. You can upgrade BOINC to 7.20.2.
Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609
Looks like most of your network packets get lost. Do some basic tests, for example:
nc -zvvw10 cern.ch 80
The result must be:
Connection to cern.ch 80 port [tcp/http] succeeded!

Other tests that must succeed:
nc -zvvw10 vccs.cern.ch 443
nc -zvvw10 vocms0840.cern.ch 9618
nc -zvvw10 vocms0267.cern.ch 4080
nc -zvvw10 eoscms-ns-ip563.cern.ch 1094
nc -zvvw10 vocms0205.cern.ch 80
nc -zvvw10 cmsfrontier.cern.ch 8000
nc -zvvw10 cms-frontier.openhtc.io 8080

If even one of them fails, you need to investigate whether it's caused by the firewall on the computer or the one on the router.
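The checks above can be run in one go. A minimal sketch, assuming a POSIX shell and an nc that accepts -z and -w and exits with status 0 when the connection succeeds (as OpenBSD netcat, Ubuntu's default nc, does); check_endpoints and LHC_ENDPOINTS are hypothetical names, and the host:port list is copied from the post. It relies on nc's exit status rather than on its message text.

```sh
#!/bin/sh
# Hypothetical list of host:port pairs, copied from the nc tests in the post.
LHC_ENDPOINTS="cern.ch:80 vccs.cern.ch:443 vocms0840.cern.ch:9618
vocms0267.cern.ch:4080 eoscms-ns-ip563.cern.ch:1094 vocms0205.cern.ch:80
cmsfrontier.cern.ch:8000 cms-frontier.openhtc.io:8080"

# Hypothetical helper: try each host:port argument with nc, print OK/FAIL,
# and return non-zero if any check failed.
check_endpoints() {
    failures=0
    for ep in "$@"; do
        host="${ep%:*}"   # everything before the last colon
        port="${ep##*:}"  # everything after the last colon
        # -z: only test whether the port accepts a connection; -w 10: 10 s timeout
        if nc -z -w 10 "$host" "$port" >/dev/null 2>&1; then
            echo "OK    $host $port"
        else
            echo "FAIL  $host $port"
            failures=$((failures + 1))
        fi
    done
    echo "$failures of $# checks failed"
    [ "$failures" -eq 0 ]
}
```

Usage: `check_endpoints $LHC_ENDPOINTS` (unquoted on purpose, so the list splits into separate arguments). A FAIL line names the host and port to look for in the local or router firewall rules.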
Joined: 26 Nov 11 Posts: 3 Credit: 104,527 RAC: 0
Thanks for the replies! I had missed opening port 8000. The CMS test task is at 7% now, way further than the previous ones, which hit the computation error at ~1%. Curious that it didn't work when I tested with ufw disabled; maybe it wasn't reloaded properly?

Here are the updated ufw rules I've added now, besides the default 80, 443 and 53:
sudo ufw allow out 9618/tcp # boinc lhc Theory and CMS and LHCb
sudo ufw allow out 9094/tcp # boinc lhc ATLAS
sudo ufw allow out 5222/tcp # boinc lhc xmpp ATLAS
sudo ufw allow out 3125/tcp # boinc lhc CVMFS
sudo ufw allow out 4080/tcp # boinc lhc WMAgent
sudo ufw allow out 8000/tcp # boinc lhc HTTP
sudo ufw allow out 8080/tcp # boinc lhc HTTP
sudo ufw allow out 8443/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9133/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9135/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9148/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9149/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9166/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9196/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9199/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 1094/tcp # boinc lhc CMS EOS
sudo ufw reload

$ nc -zvvw10 cern.ch 80
$ nc -zvvw10 vccs.cern.ch 443
$ nc -zvvw10 vocms0840.cern.ch 9618
$ nc -zvvw10 vocms0267.cern.ch 4080
$ nc -zvvw10 eoscms-ns-ip563.cern.ch 1094
$ nc -zvvw10 vocms0205.cern.ch 80
$ nc -zvvw10 cmsfrontier.cern.ch 8000
$ nc -zvvw10 cms-frontier.openhtc.io 8080
Connection to cern.ch (188.184.37.219) 80 port [tcp/http] succeeded!
Connection to vccs.cern.ch (137.138.120.99) 443 port [tcp/https] succeeded!
Connection to vocms0840.cern.ch (137.138.156.85) 9618 port [tcp/*] succeeded!
Connection to vocms0267.cern.ch (137.138.52.94) 4080 port [tcp/*] succeeded!
Connection to eoscms-ns-ip563.cern.ch (128.142.160.140) 1094 port [tcp/rootd] succeeded!
Connection to vocms0205.cern.ch (137.138.55.253) 80 port [tcp/http] succeeded!
Connection to cmsfrontier.cern.ch (188.184.100.32) 8000 port [tcp/*] succeeded!
Connection to cms-frontier.openhtc.io (188.114.97.3) 8080 port [tcp/http-alt] succeeded!

Fingers crossed. I'll post an update once the CMS task has completed successfully. I upgraded BOINC too but haven't restarted it yet; I wasn't aware of the certificate bug, since PrimeGrid had no obvious issues.
Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609
You never had a BOINC certificate bug. The issue was inside your VM, which could not contact CERN to update its CA certs.

BOINC's progress bar/percentage doesn't tell you what you need to know, since it can't look into the VM. To get an early impression of whether the task is processing fine, you may check the messages in .../slots/n/stderr.txt, with n being the slot number.

You may also use the "show" function of the VirtualBox GUI to get to the VM's consoles. Leave that view using "Machine -> disconnect from GUI". Other methods interrupt the VM and cause unnecessary load.
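Checking stderr.txt for every slot can be scripted. A minimal sketch, assuming a POSIX shell; show_slot_logs is a hypothetical name, and because the post elides the BOINC data directory (".../slots"), the script takes that directory as its first argument instead of guessing it. Depending on how BOINC was installed, reading the slot files may require running as the BOINC user or with sudo.

```sh
#!/bin/sh
# Hypothetical helper: print the last lines of every slot's stderr.txt.
#   $1 = BOINC data directory (the part shown as "..." in the post)
#   $2 = number of lines to show per slot (default 20)
# Returns non-zero if no slot log was found.
show_slot_logs() {
    boinc_dir="$1"
    lines="${2:-20}"
    found=0
    for f in "$boinc_dir"/slots/*/stderr.txt; do
        [ -f "$f" ] || continue   # the glob matched nothing
        found=1
        echo "==> $f <=="
        tail -n "$lines" "$f"
    done
    [ "$found" -eq 1 ]
}
```

Usage: `show_slot_logs /path/to/boinc-data 40`, where the path is a placeholder for the actual BOINC data directory.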
Joined: 26 Nov 11 Posts: 3 Credit: 104,527 RAC: 0
> You never had a BOINC certificate bug.

Yeah, I guessed that about the BOINC "upgrade", which would have been a downgrade, as I'm on 7.23.0 and the version suggested in this thread was 7.20.2. I compile the master branch from GitHub and have never run into issues doing so; I'm up to date with master as of now.

I've had 11 valid CMS tasks in the meantime. When I tried Theory Simulation I got some computation errors; I have to check that later. For now I'm running CMS.

My ufw config right now, running PrimeGrid and LHC:
sudo ufw allow out to any port 80
sudo ufw allow out to any port 443
sudo ufw allow out to any port 53
sudo ufw allow out 31416/tcp # boinc
sudo ufw allow out 38406/tcp # boinc
sudo ufw allow out 46082/tcp # boinc
sudo ufw allow out 9618/tcp # boinc lhc Theory and CMS and LHCb
sudo ufw allow out 9094/tcp # boinc lhc ATLAS
sudo ufw allow out 5222/tcp # boinc lhc xmpp ATLAS
sudo ufw allow out 3125/tcp # boinc lhc CVMFS
sudo ufw allow out 4080/tcp # boinc lhc WMAgent
sudo ufw allow out 8000/tcp # boinc lhc HTTP
sudo ufw allow out 8080/tcp # boinc lhc HTTP
sudo ufw allow out 8443/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9133/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9135/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9148/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9149/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9166/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9196/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 9199/tcp # boinc lhc LHCb DIRAC
sudo ufw allow out 1094/tcp # boinc lhc CMS EOS

> To get an early impression whether the task processes fine you may check the messages in .../slots/n/stderr.txt with n being the slot number.

Maybe useful later on, thanks.

> You may also use the "show" function of the VirtualBox GUI to get to the VM's consoles.

Going to try this out, thanks again.
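A rule list like this can also be kept as data and applied in a loop. A minimal sketch, assuming a POSIX shell and a ufw version that supports the `comment` keyword on rules; apply_lhc_ufw_rules is a hypothetical name, and the ports and comments are copied from the LHC lines of the config (the 80/443/53 and BOINC-only rules are left out). Like the earlier post, it finishes with `ufw reload`.

```sh
#!/bin/sh
# Hypothetical helper: add the outgoing TCP rules from the post, each with a
# ufw comment, then reload. Each data line is "port:comment".
# Adjust the list to the projects you actually run.
apply_lhc_ufw_rules() {
    while IFS=: read -r port comment; do
        [ -n "$port" ] || continue   # skip blank lines
        # </dev/null keeps sudo/ufw from consuming the rule list on stdin
        sudo ufw allow out "$port/tcp" comment "$comment" </dev/null
    done <<'EOF'
9618:boinc lhc Theory and CMS and LHCb
9094:boinc lhc ATLAS
5222:boinc lhc xmpp ATLAS
3125:boinc lhc CVMFS
4080:boinc lhc WMAgent
8000:boinc lhc HTTP
8080:boinc lhc HTTP
8443:boinc lhc LHCb DIRAC
9133:boinc lhc LHCb DIRAC
9135:boinc lhc LHCb DIRAC
9148:boinc lhc LHCb DIRAC
9149:boinc lhc LHCb DIRAC
9166:boinc lhc LHCb DIRAC
9196:boinc lhc LHCb DIRAC
9199:boinc lhc LHCb DIRAC
1094:boinc lhc CMS EOS
EOF
    sudo ufw reload
}
```

Afterwards, `sudo ufw status verbose` lists the active rules together with their comments.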
Joined: 14 Jul 05 Posts: 1 Credit: 528,610 RAC: 0
I am new to using Linux. LHC fails after 10 minutes. The latest was:

Task | Work unit | Computer | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application
---|---|---|---|---|---|---|---|---|---
394537242 | 211814107 | 10830515 | 4 Jun 2023, 12:29:35 UTC | 4 Jun 2023, 13:58:59 UTC | Error while computing | 602.86 | 0.00 | --- | ATLAS Simulation v3.01 (native_mt) x86_64-pc-linux-gnu

I'm using Debian 11 on a Ryzen 5 5600G with 16 GB of RAM. The tests you suggest fail:
charles@Guardian:~$ nc -zvvw10 cern.ch 80
DNS fwd/rev mismatch: cern.ch != drupal8lb01.cern.ch
cern.ch [188.184.37.219] 80 (http) open
sent 0, rcvd 0
charles@Guardian:~$ nc -zvvw10 vccs.cern.ch 443
vccs.cern.ch [137.138.120.99] 443 (https) open
sent 0, rcvd 0
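The nc transcripts in this thread come from two different netcat implementations. OpenBSD netcat (Ubuntu's default nc) prints "Connection to ... succeeded!", while the traditional netcat, which the Debian output above appears to come from, prints "HOST [IP] PORT (service) open" for a reachable port and adds a "DNS fwd/rev mismatch" note when the reverse-DNS name differs from the name used. A minimal sketch, assuming a POSIX shell; nc_line_ok and nc_output_ok are hypothetical names, and the patterns only cover the two success formats quoted in this thread.

```sh
#!/bin/sh
# Hypothetical helper: classify one line of `nc -zv` output.
# Returns 0 for a successful connection in either netcat dialect:
#   OpenBSD netcat:     "Connection to HOST (IP) PORT port [tcp/...] succeeded!"
#   traditional netcat: "HOST [IP] PORT (service) open"
# Anything else (e.g. "DNS fwd/rev mismatch" or "sent 0, rcvd 0") returns 1.
nc_line_ok() {
    case "$1" in
        *" succeeded!") return 0 ;;
        *") open")      return 0 ;;
        *)              return 1 ;;
    esac
}

# Succeeds if any line of nc output read from stdin reports success.
nc_output_ok() {
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        nc_line_ok "$line" && return 0
    done
    return 1
}
```

Usage: `nc -zvvw10 cern.ch 80 2>&1 | nc_output_ok && echo reachable` (nc writes its verbose messages to stderr, hence `2>&1`).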
©2024 CERN