Probing /cvmfs/grid.cern.ch... Failed!
Author | Message |
---|---|
Joined: 28 Sep 04 Posts: 707 Credit: 47,261,250 RAC: 28,770 |
I have started to get CVMFS errors like the one shown in the title. The monitor then shows 'ERROR Could not source logging functions from /cvmfs/grid.cern.ch/vc/bin/logging_functions.' CPU time stays at 0 but the run time keeps counting. These tasks have to be aborted manually. Here's one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=334519241 |
Joined: 28 Sep 04 Posts: 707 Credit: 47,261,250 RAC: 28,770 |
Three errors today but now they are all for: 2021-12-06 22:09:42 (21084): Guest Log: Probing /cvmfs/cernvm-prod.cern.ch... Failed! 2021-12-06 22:09:42 (21084): Guest Log: 22:09:41 EET +02:00 2021-12-06: cranky: [ERROR] 'cvmfs_config probe cernvm-prod.cern.ch' failed. |
Joined: 18 Dec 15 Posts: 1738 Credit: 114,887,926 RAC: 93,187 |
I have had numerous such cases in the last 10 days or so. 2021-12-13 23:10:07 (15164): Guest Log: Probing /cvmfs/grid.cern.ch... Failed! 2021-12-13 23:10:07 (15164): Guest Log: 23:10:09 CET +01:00 2021-12-13: cranky: [ERROR] 'cvmfs_config probe grid.cern.ch' failed. The bad thing is that once this happens, the WU does not stop automatically and wastes a slot until I happen to discover the problem and stop the WU manually. Since I run several Theory tasks concurrently across a total of 8 computers, it is hard to spot quickly when this problem occurs. |
Joined: 18 Dec 15 Posts: 1738 Credit: 114,887,926 RAC: 93,187 |
In the past few days, I have had several such failures. As I wrote before, the bad thing about them is that they don't stop automatically, but keep running (without CPU utilization) until the problem is detected by chance. Why is it not possible to build in some kind of mechanism that makes the task stop and abort automatically in the case of such a failure? |
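As a rough illustration of the kind of check asked for here, the failure can at least be detected from the task log. This is only a sketch: the pattern is taken from the log lines quoted in this thread, the function names are made up for the example, and what to do once a failure is found (for instance aborting the task) is left out.

```python
import re
from typing import List

# Pattern based on the failure lines quoted in this thread, e.g.
# "... Guest Log: Probing /cvmfs/grid.cern.ch... Failed!"
FAILED_PROBE = re.compile(r"Probing /cvmfs/(\S+?)\.\.\. Failed!")


def failed_repositories(log_text: str) -> List[str]:
    """Return the CVMFS repositories whose probe failed, in order of appearance."""
    return FAILED_PROBE.findall(log_text)


def has_failed_probe(log_text: str) -> bool:
    """True if the log contains at least one failed CVMFS probe."""
    return bool(failed_repositories(log_text))
```

A watcher script could run this over the stderr log of each running Theory task and flag the ones that return True, instead of waiting for someone to notice a task with zero CPU time.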
Joined: 14 Jan 10 Posts: 1362 Credit: 9,108,524 RAC: 3,013 |
2022-09-30 18:02:23 (3380): Guest Log: 18:02:24 CEST +02:00 2022-09-30: cranky: [INFO] Checking CVMFS. 2022-09-30 18:02:25 (3380): Guest Log: Probing /cvmfs/sft.cern.ch... OK 2022-09-30 18:02:26 (3380): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2022-09-30 18:02:26 (3380): Guest Log: 2.5.2.0 4103 1 28244 23913 2 1 257119 4096000 0 65024 0 0 n/a 5 15 http://s1cern-cvmfs.openhtc.io/cvmfs/sft.cern.ch DIRECT 1 2022-09-30 18:02:28 (3380): Guest Log: Probing /cvmfs/grid.cern.ch... OK 2022-09-30 18:02:29 (3380): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2022-09-30 18:02:29 (3380): Guest Log: 2.5.2.0 4078 1 27548 19399 2 2 257119 4096000 0 65024 0 2 -100 7783 2376 http://s1bnl-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1 2022-09-30 18:02:31 (3380): Guest Log: Probing /cvmfs/cernvm-prod.cern.ch... OK 2022-09-30 18:02:32 (3380): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2022-09-30 18:02:32 (3380): Guest Log: 2.5.2.0 4051 1 27232 268 2 1 257119 4096000 0 65024 0 0 n/a 4 10 http://s1ral-cvmfs.openhtc.io/cvmfs/cernvm-prod.cern.ch DIRECT 1 2022-09-30 18:02:33 (3380): Guest Log: Probing /cvmfs/alice.cern.ch... Failed! 2022-09-30 18:02:33 (3380): Guest Log: 18:02:34 CEST +02:00 2022-09-30: cranky: [ERROR] 'cvmfs_config probe alice.cern.ch' failed. 2022-09-30 18:02:34 (3380): Guest Log: [ERROR] Job Failed 2022-09-30 18:02:34 (3380): Guest Log: [INFO] Shutting Down. Is Probing /cvmfs/alice.cern.ch... useful? |
Joined: 15 Jun 08 Posts: 2490 Credit: 247,674,779 RAC: 123,454 |
"Is Probing /cvmfs/alice.cern.ch... useful?" So far, alice is sometimes required for Theory tasks. I just tested connecting to its main catalog via different Cloudflare servers (wget, no proxy). All tests succeeded. "cvmfs_config probe alice" also succeeded. |
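A connectivity test like the one described can be approximated without wget. The sketch below is not the exact test that was run: it fetches the repository manifest (`.cvmfspublished`, which CVMFS repositories publish at their root URL and which points to the root catalog) directly, without a proxy. The server name is the one that appears in the logs in this thread for sft.cern.ch; that it also serves alice.cern.ch is an assumption, and the function names are made up for the example.

```python
import urllib.request


def manifest_url(server: str, repository: str) -> str:
    """Build the URL of a repository's manifest on a given server."""
    return f"http://{server}/cvmfs/{repository}/.cvmfspublished"


def manifest_reachable(server: str, repository: str, timeout: float = 10.0) -> bool:
    """Return True if the manifest can be downloaded (HTTP 200), else False."""
    try:
        with urllib.request.urlopen(manifest_url(server, repository), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError, HTTPError, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Server taken from the log output above; serving alice from it is assumed.
    print(manifest_reachable("s1cern-cvmfs.openhtc.io", "alice.cern.ch"))
```

Note that urllib picks up proxy settings from the environment, so the proxy variables need to be unset for a true "no proxy" test.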
Joined: 14 Jan 10 Posts: 1362 Credit: 9,108,524 RAC: 3,013 |
"All tests succeeded." Must have been a network glitch. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0 |
I've had a few of these recently (the latest one was from -dev). There is only one attempt at probing, and if that fails the task idles until I intervene and abort it; otherwise it will run doing nothing until the timeout. Might it be helpful to add further attempts to probe, after a delay, so that any transient network problem may have resolved itself? Then, if the problem persists after "n" attempts, the task could be ended rather than waiting for the timeout or manual intervention. It doesn't happen often, but it is annoying to come home from work and find one that has been idle for a day or more. |
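The retry idea could look roughly like this. It is a sketch of the suggestion, not how cranky or the VM's startup script actually works: the function names, the default of three attempts and the 60 second delay are made up, and it assumes that `cvmfs_config probe <repository>` exits with a non-zero status when the probe fails.

```python
import subprocess
import time
from typing import Callable


def run_probe(repository: str) -> bool:
    """Run 'cvmfs_config probe <repository>'; assumes exit status 0 means OK."""
    result = subprocess.run(
        ["cvmfs_config", "probe", repository],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def probe_with_retries(
    repository: str,
    attempts: int = 3,
    delay_seconds: float = 60.0,
    probe: Callable[[str], bool] = run_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe up to 'attempts' times, waiting between tries.

    Returns True as soon as one probe succeeds, False if all of them fail,
    so the caller can end the task instead of letting it idle.
    """
    for attempt in range(1, attempts + 1):
        if probe(repository):
            return True
        if attempt < attempts:
            sleep(delay_seconds)  # give a transient network problem time to clear
    return False
```

With something like `if not probe_with_retries("grid.cern.ch"): sys.exit(1)` the task would fail quickly after the last attempt rather than sitting idle until the timeout.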
Joined: 2 May 07 Posts: 2184 Credit: 172,752,337 RAC: 39,114 |
374605387 200459726 23 Dec 2022, 18:44:16 UTC 24 Dec 2022, 9:02:08 UTC Aborted 48,207.93 108.98 --- Theory Simulation v300.07 (vbox64_theory) windows_x86_64 374605404 200465087 23 Dec 2022, 18:44:09 UTC 24 Dec 2022, 9:02:08 UTC Aborted 48,833.27 85.61 --- Theory Simulation v300.07 (vbox64_theory) windows_x86_64 374605406 200458911 23 Dec 2022, 18:44:09 UTC 24 Dec 2022, 9:02:08 UTC Aborted 48,736.27 89.00 --- Theory Simulation v300.07 (vbox64_theory) windows_x86_64 |
Joined: 29 Sep 04 Posts: 281 Credit: 11,866,264 RAC: 0 |
I've found a way to get these running again. Close BOINC. Open VBox. In the Manager, find the offending VM. Remove it with the "Delete all files" option. Start BOINC. The task will start from the beginning again and, if the network connection has been restored, will proceed as normal. The task keeps the elapsed time it wasted idling, and this doesn't solve the single-probe-attempt issue, but it is gentler than an abort. |
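The manual steps above can probably also be done from the command line with VBoxManage instead of the Manager window. This is an untested sketch under several assumptions: VBoxManage is on the PATH, BOINC has been closed first, the VM is visible to the account running the command, and the helper names are made up. `unregistervm ... --delete` removes the VM together with its files, so it should only be pointed at the stuck VM.

```python
import re
import subprocess
from typing import List, Tuple

# 'VBoxManage list vms' prints one VM per line as: "name" {uuid}
VM_LINE = re.compile(r'^"(.*)" \{([0-9a-fA-F-]+)\}$')


def parse_vm_list(output: str) -> List[Tuple[str, str]]:
    """Turn the output of 'VBoxManage list vms' into (name, uuid) pairs."""
    vms = []
    for line in output.splitlines():
        match = VM_LINE.match(line.strip())
        if match:
            vms.append((match.group(1), match.group(2)))
    return vms


def remove_vm_command(vm_name: str) -> List[str]:
    """Command that unregisters a VM and deletes its files."""
    return ["VBoxManage", "unregistervm", vm_name, "--delete"]


def list_vms() -> List[Tuple[str, str]]:
    """Ask VirtualBox for the registered VMs."""
    result = subprocess.run(["VBoxManage", "list", "vms"],
                            capture_output=True, text=True, check=True)
    return parse_vm_list(result.stdout)


def remove_vm(vm_name: str) -> None:
    """Remove one VM; raises CalledProcessError if VBoxManage reports an error."""
    subprocess.run(remove_vm_command(vm_name), check=True)
```

Listing first with `list_vms()` and removing by exact name keeps the destructive step explicit; BOINC is started again afterwards, as in the manual procedure.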
Joined: 2 May 07 Posts: 2184 Credit: 172,752,337 RAC: 39,114 |
Thank you, Ray, for finding this workaround. This problem occurs only on Windows, not in native Theory. So we always have to keep watch to find the Theory tasks whose CVMFS connection failed. |
Joined: 2 May 07 Posts: 2184 Credit: 172,752,337 RAC: 39,114 |
2023-03-15 07:11:39 (15924): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2023-03-15 07:11:39 (15924): Guest Log: 2.5.2.0 4228 4 26084 268 3 1 257182 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/cernvm-prod.cern.ch http://xxx.yyy.zzz:3128 1 2023-03-15 07:11:40 (15924): Guest Log: Probing /cvmfs/alice.cern.ch... Failed! 2023-03-15 07:11:40 (15924): Guest Log: 07:11:39 CET +01:00 2023-03-15: cranky: [ERROR] 'cvmfs_config probe alice.cern.ch' failed. 2023-03-15 07:12:06 (15924): Guest Log: [ERROR] Job Failed Lots of these failures, from many servers (sft, ..., alice, ...), about 200 yesterday. |
Joined: 2 May 07 Posts: 2184 Credit: 172,752,337 RAC: 39,114 |
2023-03-15 07:08:53 (14268): Guest Log: 07:08:52 CET +01:00 2023-03-15: cranky: [INFO] Detected Theory App 2023-03-15 07:08:53 (14268): Guest Log: 07:08:52 CET +01:00 2023-03-15: cranky: [INFO] Checking CVMFS. 2023-03-15 07:08:55 (14268): Guest Log: Probing /cvmfs/sft.cern.ch... Failed! 2023-03-15 07:08:55 (14268): Guest Log: 07:08:54 CET +01:00 2023-03-15: cranky: [ERROR] 'cvmfs_config probe sft.cern.ch' failed. 2023-03-15 08:39:55 (14268): Status Report: Job Duration: '864000.000000' |
Joined: 2 May 07 Posts: 2184 Credit: 172,752,337 RAC: 39,114 |
Twenty tasks at once: Exit status 203 (0x000000CB) EXIT_ABORTED_VIA_GUI; Computer ID 10797673; Run time 58 min 18 sec; CPU time 1 min 23 sec; Validate state Invalid |
Joined: 2 May 07 Posts: 2184 Credit: 172,752,337 RAC: 39,114 |
Today the same happened with fifty Theory tasks! |
Joined: 3 Aug 17 Posts: 7 Credit: 139,044 RAC: 14 |
Getting the following on all native Theory tasks: 10:35:06 BST +01:00 2023-06-08: cranky-0.0.32: [INFO] Checking CVMFS. 10:35:06 BST +01:00 2023-06-08: cranky-0.0.32: [ERROR] 'cvmfs_config probe sft.cern.ch' failed. Running sudo cvmfs_config probe comes out all OK. I went through the troubleshooting guide to get this far; while checking, I found that manual probing was failing, but sorting that out hasn't helped. Ubuntu 23.04, BOINC 7.23.0 compiled from source. dave@Swarm:~$ sudo cvmfs_config probe Probing /cvmfs/atlas.cern.ch... OK Probing /cvmfs/atlas-condb.cern.ch... OK Probing /cvmfs/grid.cern.ch... OK Probing /cvmfs/cernvm-prod.cern.ch... OK Probing /cvmfs/sft.cern.ch... OK Probing /cvmfs/alice.cern.ch... OK |
Joined: 2 May 07 Posts: 2184 Credit: 172,752,337 RAC: 39,114 |
12:46:39 BST +01:00 2023-06-08: cranky-0.0.32: [INFO] Checking CVMFS. /usr/bin/cvmfs_config: line 941: cd: /cvmfs/cvmfs-config.cern.ch: Transport endpoint is not connected I have no idea. |
Joined: 3 Aug 17 Posts: 7 Credit: 139,044 RAC: 14 |
I have now stopped Theory tasks. Waiting to see if other native tasks work. |
Joined: 27 Sep 08 Posts: 817 Credit: 683,156,491 RAC: 142,895 |
I just abort them; you can reboot the VM if you want to try restarting the process. |
Joined: 3 Aug 17 Posts: 7 Credit: 139,044 RAC: 14 |
Still trying to get native Theory tasks to run. container_linux.go:336: starting container process caused "process_linux.go:293: applying cgroup configuration for process caused \"mountpoint for cgroup not found\"" 15:57:35 BST +01:00 2023-06-09: cranky-0.0.32: [INFO] Container 'runc' finished with status code 1. 15:57:35 BST +01:00 2023-06-09: cranky-0.0.32: [INFO] Preparing output. 15:57:35 BST +01:00 2023-06-09: cranky-0.0.32: [ERROR] No output found. 15:57:36 (20463): cranky exited; CPU time 0.382304 15:57:36 (20463): app exit status: 0xce 15:57:36 (20463): called boinc_finish(195) Having rebooted since the last attempt, it seems to get a little further, but all native Theory tasks still crash. |
©2024 CERN