Message boards :
ATLAS application :
Bad WUs?
Author | Message |
---|---|
Send message Joined: 14 Jan 10 Posts: 1418 Credit: 9,460,948 RAC: 2,322 |
For the last few hours, all my ATLAS tasks on all of my computers have been failing. They keep running, but with no CPU usage, and VM console_2 shows N/A for each core. The guest log says "Guest Log: Checking CVMFS..." and then there is no response. maeax had the same problem some time ago, and his problem was proxy settings, if I remember correctly. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
"Guest Log: Checking CVMFS... and no response" Hm, I am not using a proxy. Furthermore, I am just realizing that on my other computers, on which I run Theory tasks, the tasks downloaded within the past few hours don't run either. There, too: "Guest Log: Probing /cvmfs/sft.cern.ch... Failed!" https://lhcathome.cern.ch/lhcathome/result.php?resultid=364413464 I have not made any changes on any of my computers, so I guess the problem must be at CERN? |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
Here is the stderr from another machine: https://lhcathome.cern.ch/lhcathome/result.php?resultid=364441059 Basically the same error, the same problem. And it looks exactly the same on all other computers :-( As said before, I have not made any changes at all. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,840,462 RAC: 37,516 |
The task logs don't show a specific error condition, except this: 06:36:48.307434 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={85632c68-b5bb-4316-a900-5eb28d3413df} aComponent={MachineWrap} aText={Runtime error opening 'Z:\BOINC\slots\2\boinc_5f3d7e10453d75c7\boinc_5f3d7e10453d75c7.vbox' for reading: -103 (Path not found.). This tells you that vbox expects to read the VM definition file, but either the file or a path component doesn't exist. It shouldn't be responsible for errors on other computers. I just bypassed my local proxy and tested grid.cern.ch and sft.cern.ch via s1cern-cvmfs.openhtc.io. Both immediately return "HTTP/1.1 200 OK". If you see network connection problems to CVMFS, you may check your router/LAN. Did your ISP recently reset the connection? BTW: Your host reports vbox 6.1.18. You may update it to the most recent version, then reboot and check/clean the vbox environment (including the disk entries) before you start BOINC. |
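For anyone who wants to repeat the connectivity test described above from their own host, here is a minimal Python sketch. It rests on assumptions that are not confirmed in this thread: that the server publishes each repository's manifest at `/cvmfs/<repo>/.cvmfspublished` and that it answers HEAD requests. The function names `manifest_url`, `probe` and `check_repositories` are made up for illustration.

```python
import urllib.error
import urllib.request


def manifest_url(host: str, repo: str) -> str:
    """Build the URL of a repository manifest (assumed path layout)."""
    return f"http://{host}/cvmfs/{repo}/.cvmfspublished"


def probe(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> str:
    """Send one HEAD request and return a short status string."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return f"{response.status} {response.reason}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404).
        return f"{exc.code} {exc.reason}"
    except OSError as exc:
        # DNS failure, refused connection, timeout and similar problems.
        return f"FAILED: {exc}"


def check_repositories(host: str, repos, opener=urllib.request.urlopen) -> dict:
    """Probe several repositories on one host and collect the results."""
    return {repo: probe(manifest_url(host, repo), opener=opener) for repo in repos}
```

Calling `check_repositories("s1cern-cvmfs.openhtc.io", ["grid.cern.ch", "sft.cern.ch"])` on the BOINC host itself (not inside the VM) should show whether plain HTTP access to those repositories works. An answer of "200 OK" would match the result reported above; a "FAILED: ..." entry would point towards the local network or the route to the server.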
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
"If you see network connection problems to CVMFS you may check your router/LAN." No idea whether my ISP recently reset the connection; I did not notice anything like that. The fact is that until this morning all LHC VM tasks (ATLAS and Theory) were running okay. I have now rebooted the ISP modem/router, which unfortunately did NOT solve the problem. I looked at the VM console during the start of a Theory task, and the process came to a halt at: "(1 of 4) A start job is running for /cvmfs/cernvm-prod.cern.ch". This ran in a loop for a few minutes, then it jumped to where it says "Probing /cvmfs/sft.cern.ch... Failed!" Since all of my computers connect to everything else without any problem, and the QuChemPad VM tasks, for example, also run properly, I wonder what is so special about CVMFS. What is the modem/router NOT doing well with a CVMFS connection, whereas everything else works? I will update VirtualBox to the latest version on one of my machines and then try again, but my instinct tells me that this will not solve the problem :-( If so, what else could I do in order to crunch LHC? |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
"for the last few hours, all my ATLAS tasks on all of my computers are failing. They keep running, but no CPU usage, and VM console_2 shows N/A for each core." Yes, the squid in my CentOS8 VM is not running well with the Threadripper 3995. Today I made a new test with squid: two ATLAS tasks ran for 5 hours and one ran for 11 hours, all for nothing. So I stopped squid. I have no idea why. https://lhcathome.cern.ch/lhcathome/results.php?userid=75468&offset=0&show_names=0&state=6&appid=14 |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,840,462 RAC: 37,516 |
The problems from a while ago had nothing to do with squid. They were caused by an undefined service order during ATLAS startup. |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
Hypervisor System Log: 468:39:42.807656 WARNING [COM]: aRC=E_FAIL (0x80004005) aIID={16ced992-5fdc-4aba-aff5-6a39bbd7c38b} aComponent={HostWrap} aText={Could not load the Host USB Proxy Service (VERR_FILE_NOT_FOUND). The service might not be installed on the host computer}, preserve=true aResultDetail=0 468:39:43.017821 Saving settings file "C:\ProgramData\BOINC\slots\3\boinc_10feba5151116813\boinc_10feba5151116813.vbox" with version "1.16-windows" ?? |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
I have now installed the latest VB version, 6.1.36, and, as expected, my problem of not connecting to CVMFS still exists. "If you see network connection problems to CVMFS you may check your router/LAN." "No idea whether my ISP recently reset the connection. I did not notice something like this." Honestly, I now have no idea what else I can do. Does this now mean that I will no longer be able to crunch LHC VB tasks? :-) |
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
Are there any events in the Windows event log for your PCs? |
Send message Joined: 27 Sep 04 Posts: 104 Credit: 8,104,901 RAC: 1,605 |
May or may not apply to you, but I had similar problems a while ago. At the suggestion of someone (I don't remember if it was here or my ISP help desk), I connected my cruncher machine via copper rather than Wi-Fi, even though I was using 802.11ac. All of my LHC/ATLAS connection problems went away. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
"May or may not apply to you, but I had similar problems a while ago" Some of my machines are connected via copper, some others via WLAN. My problem exists with all of these machines :-( |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
"Are there any events in the Windows event log for your PCs?" Nothing special. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
good news: all my computers are successfully crunching LHC CM tasks again. I just gave it a try this late morning, and everything worked fine. Access to cvmfs was not a problem. No idea what had happened last weekend. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
"good news: all my computers are successfully crunching LHC CM tasks again." Sorry for the typo. It should have read VM tasks, of course (I noticed the typo only now and could not edit my posting any more). |
Send message Joined: 21 Jun 10 Posts: 40 Credit: 11,317,020 RAC: 6,327 |
For the past two days, I haven't been able to get any ATLAS tasks to run on Windows boxes. I've tried two different Windows 11 computers and one Windows 10 computer. All had previously successfully completed multiple ATLAS tasks. On prior successful tasks, there is a message in the stderr.txt that says: 2022-08-27 17:09:53 (16204): Guest Log: 00:00:00.000603 main 5.2.32 r132073 started. Verbose level = 0 2022-08-27 17:09:59 (16204): Guest Log: CVMFS is ok Now I get a message that says: 2022-08-30 21:52:02 (16276): Guest Log: 00:00:00.002213 main 5.2.32 r132073 started. Verbose level = 0 2022-08-30 21:52:12 (16276): Guest Log: 00:00:10.017953 timesync vgsvcTimeSyncWorker: Radical guest time change: 18 010 622 070 000ns (GuestNow=1 661 914 332 064 570 000 ns GuestLast=1 661 896 321 442 500 000 ns fSetTimeLastLoop=true ) And then it just sits there and does nothing. Any suggestions? |
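The two stderr.txt excerpts above differ in whether the "CVMFS is ok" line ever appears. The small Python sketch below checks a log for that line and converts any "Radical guest time change" value from nanoseconds to seconds. The function name `scan_stderr` and the regular expression are illustrative assumptions based only on the lines quoted in this post, not on any documented log format.

```python
import re

# Matches e.g. "Radical guest time change: 18 010 622 070 000ns"
TIME_CHANGE = re.compile(r"Radical guest time change:\s*([\d ]+?)\s*ns")


def scan_stderr(text: str) -> dict:
    """Report whether CVMFS was confirmed and list guest clock jumps in seconds."""
    jumps_s = []
    for match in TIME_CHANGE.finditer(text):
        nanoseconds = int(match.group(1).replace(" ", ""))
        jumps_s.append(nanoseconds / 1e9)
    return {
        "cvmfs_ok": "Guest Log: CVMFS is ok" in text,
        "time_jumps_s": jumps_s,
    }
```

For the failing log quoted above, the reported jump of 18 010 622 070 000 ns is about 18,010.6 s, roughly five hours, and it equals GuestNow minus GuestLast. Whether that clock jump has anything to do with the stall is not established in this thread; the missing "CVMFS is ok" line is the clearer symptom.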
Send message Joined: 2 May 07 Posts: 2243 Credit: 173,902,375 RAC: 2,013 |
Last night I also had three ATLAS tasks running for 6 hours under Win11 Pro and doing nothing. This happens when CVMFS is not connected correctly, whatever the reason is. It is not on our side. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
"This is, when CVMFS is not connected correct, whatever the reason is." This seems to happen once in a while; see my postings from a few days ago :-( CVMFS seems to be extremely touchy. |
Send message Joined: 21 Jun 10 Posts: 40 Credit: 11,317,020 RAC: 6,327 |
After reading through several other posts about this kind of problem, I thought I might have network problems. I shut down all computers, the switch and finally the router. Then restarted the router, then the switch, then computers. Atlas tasks work again. What had me confused was that all other internet connections worked just fine so I didn't think it was a network issue. But a reboot got ATLAS tasks going again. Hope this helps someone else having similar issues. |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,481,494 RAC: 30,699 |
"After reading through several other posts about this kind of problem, I thought I might have network problems. I shut down all computers, the switch and finally the router. Then restarted the router, then the switch, then computers. Atlas tasks work again. What had me confused was that all other internet connections worked just fine so I didn't think it was a network issue. But a reboot got ATLAS tasks going again. Hope this helps someone else having similar issues." Well, the situation was similar here about a week ago. Everything else worked fine; only the LHC VM tasks, which need access to CVMFS, did not work. However, a reboot of the modem/router did NOT help. Only 2 days later everything worked fine again, without any further intervention from my side. |