Message boards : Theory Application : Probing /cvmfs/grid.cern.ch... Failed!

Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,542,927
RAC: 15,505
Message 45758 - Posted: 26 Nov 2021, 19:06:56 UTC

I have started to get CVMFS errors like the one shown in the title. The monitor shows the following: 'ERROR Could not source logging functions from /cvmfs/grid.cern.ch/vc/bin/logging_functions.' CPU time stays at 0 but the runtime keeps counting. These tasks have to be aborted manually.

Here's one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=334519241
ID: 45758
Harri Liljeroos
Joined: 28 Sep 04
Posts: 675
Credit: 43,542,927
RAC: 15,505
Message 45800 - Posted: 6 Dec 2021, 22:16:44 UTC

Three errors today but now they are all for:
2021-12-06 22:09:42 (21084): Guest Log: Probing /cvmfs/cernvm-prod.cern.ch... Failed!
2021-12-06 22:09:42 (21084): Guest Log: 22:09:41 EET +02:00 2021-12-06: cranky: [ERROR] 'cvmfs_config probe cernvm-prod.cern.ch' failed. 

ID: 45800
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,119,521
RAC: 126,893
Message 45863 - Posted: 14 Dec 2021, 14:03:43 UTC - in response to Message 45800.  

I have had numerous such cases in the last 10 days or so.

2021-12-13 23:10:07 (15164): Guest Log: Probing /cvmfs/grid.cern.ch... Failed!
2021-12-13 23:10:07 (15164): Guest Log: 23:10:09 CET +01:00 2021-12-13: cranky: [ERROR] 'cvmfs_config probe grid.cern.ch' failed.


The bad thing is that once this happens, the WU does not stop automatically, so it wastes a slot until I happen to discover the problem and stop the WU manually.
Since I run several Theory tasks concurrently on a total of 8 computers, it is troublesome to notice quickly when this problem occurs.
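
One way to spot such tasks faster is to scan the task log for failed probe lines. The following is only a minimal sketch: it assumes the log lines look like the "Guest Log" excerpts quoted in this thread, and the function name `failed_probes` is made up for illustration.

```python
import re

# Matches lines such as
# "... Guest Log: Probing /cvmfs/grid.cern.ch... Failed!"
# The pattern is an assumption based on the log excerpts in this thread.
PROBE_RE = re.compile(r"Probing /cvmfs/(?P<repo>\S+?)\.\.\. (?P<result>OK|Failed!)")


def failed_probes(log_lines):
    """Return the repositories whose probe line ends in 'Failed!'."""
    failed = []
    for line in log_lines:
        match = PROBE_RE.search(line)
        if match and match.group("result") == "Failed!":
            failed.append(match.group("repo"))
    return failed


sample = [
    "2021-12-13 23:10:07 (15164): Guest Log: Probing /cvmfs/grid.cern.ch... Failed!",
    "2022-09-30 18:02:23 (3380): Guest Log: Probing /cvmfs/sft.cern.ch... OK",
]
print(failed_probes(sample))
```

Run against the log of each running task, a non-empty result would mark the task as a candidate for manual inspection.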
ID: 45863
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,119,521
RAC: 126,893
Message 47055 - Posted: 31 Jul 2022, 15:57:52 UTC

In the past few days, I have had several such failures.

As I wrote before, the bad thing about them is that they don't stop automatically, but keep running (without CPU utilization) until the problem is detected by chance.

Why is it not possible to build in some kind of mechanism which makes the task stop and abort automatically in the case of such a failure?
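
As an illustration of what such a mechanism could check, here is a minimal sketch of a stall test based on the symptom described above (runtime advancing while CPU time does not). The function name and both thresholds are hypothetical, and how the samples are collected is left open; this is not how the project's wrapper works.

```python
def is_stalled(samples, min_elapsed=600.0, min_cpu_gain=1.0):
    """Decide whether a task looks idle.

    samples: list of (elapsed_seconds, cpu_seconds) tuples, oldest first.
    Returns True when the samples span at least min_elapsed seconds of
    runtime while CPU time grew by less than min_cpu_gain seconds.
    Both thresholds are illustrative values, not project settings.
    """
    if len(samples) < 2:
        return False
    first_elapsed, first_cpu = samples[0]
    last_elapsed, last_cpu = samples[-1]
    window = last_elapsed - first_elapsed
    return window >= min_elapsed and (last_cpu - first_cpu) < min_cpu_gain


# Runtime advanced by 15 minutes, CPU time barely moved.
print(is_stalled([(0.0, 0.0), (900.0, 0.2)]))
```

A watchdog built on this idea would still need a policy for what to do with a stalled task (abort, restart, or just report).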
ID: 47055
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1274
Credit: 8,480,242
RAC: 2,028
Message 47334 - Posted: 30 Sep 2022, 16:21:04 UTC

2022-09-30 18:02:23 (3380): Guest Log: 18:02:24 CEST +02:00 2022-09-30: cranky: [INFO] Checking CVMFS.

2022-09-30 18:02:25 (3380): Guest Log: Probing /cvmfs/sft.cern.ch... OK

2022-09-30 18:02:26 (3380): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2022-09-30 18:02:26 (3380): Guest Log: 2.5.2.0 4103 1 28244 23913 2 1 257119 4096000 0 65024 0 0 n/a 5 15 http://s1cern-cvmfs.openhtc.io/cvmfs/sft.cern.ch DIRECT 1

2022-09-30 18:02:28 (3380): Guest Log: Probing /cvmfs/grid.cern.ch... OK

2022-09-30 18:02:29 (3380): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2022-09-30 18:02:29 (3380): Guest Log: 2.5.2.0 4078 1 27548 19399 2 2 257119 4096000 0 65024 0 2 -100 7783 2376 http://s1bnl-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 1

2022-09-30 18:02:31 (3380): Guest Log: Probing /cvmfs/cernvm-prod.cern.ch... OK

2022-09-30 18:02:32 (3380): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2022-09-30 18:02:32 (3380): Guest Log: 2.5.2.0 4051 1 27232 268 2 1 257119 4096000 0 65024 0 0 n/a 4 10 http://s1ral-cvmfs.openhtc.io/cvmfs/cernvm-prod.cern.ch DIRECT 1

2022-09-30 18:02:33 (3380): Guest Log: Probing /cvmfs/alice.cern.ch... Failed!

2022-09-30 18:02:33 (3380): Guest Log: 18:02:34 CEST +02:00 2022-09-30: cranky: [ERROR] 'cvmfs_config probe alice.cern.ch' failed.

2022-09-30 18:02:34 (3380): Guest Log: [ERROR] Job Failed

2022-09-30 18:02:34 (3380): Guest Log: [INFO] Shutting Down.


Is Probing /cvmfs/alice.cern.ch... useful?
ID: 47334
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2401
Credit: 225,518,043
RAC: 123,638
Message 47335 - Posted: 30 Sep 2022, 16:42:19 UTC - in response to Message 47334.  

Is Probing /cvmfs/alice.cern.ch... useful?

So far alice is sometimes required for Theory tasks.

I just tested connecting to its main catalog via different Cloudflare servers (wget, no proxy).
All tests succeeded.
"cvmfs_config probe alice" also succeeded.
ID: 47335
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1274
Credit: 8,480,242
RAC: 2,028
Message 47336 - Posted: 1 Oct 2022, 9:24:18 UTC - in response to Message 47335.  

All tests succeeded.
"cvmfs_config probe alice" also succeeded.
Must have been a network glitch.
ID: 47336
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 0
Message 47582 - Posted: 7 Dec 2022, 20:12:14 UTC
Last modified: 7 Dec 2022, 20:16:30 UTC

I've had a few of these recently (the latest one was from -dev). There is only one attempt at probing, and if that fails the task idles until I intervene and abort it; otherwise it will run doing nothing until the timeout.
Might it be helpful to add further probe attempts, after a delay, so that any transient network problem has a chance to resolve itself? Then, if the problem persists after "n" attempts, the task could end itself rather than waiting for the timeout or manual intervention.
It doesn't happen often, but it is annoying to come home from work and find one that has been idle for a day or more.
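
The retry idea could look roughly like the sketch below. It is not the project's code: the function names, the default of 3 attempts and the 30 second delay are made up, and it assumes that `cvmfs_config probe <repo>` exits with a non-zero status when the probe fails.

```python
import subprocess
import time


def cvmfs_probe(repo):
    """Run 'cvmfs_config probe <repo>' and return True on exit status 0.

    Assumption: a failed probe yields a non-zero exit status.
    Raises FileNotFoundError if cvmfs_config is not installed.
    """
    result = subprocess.run(
        ["cvmfs_config", "probe", repo],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def probe_with_retries(repo, attempts=3, delay=30.0,
                       probe=cvmfs_probe, sleep=time.sleep):
    """Probe up to 'attempts' times, waiting 'delay' seconds between tries.

    Returns True as soon as one probe succeeds, False if all of them fail.
    'probe' and 'sleep' are injectable so the logic can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        if probe(repo):
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

A caller would end the task with an error when `probe_with_retries("grid.cern.ch")` returns False, instead of leaving it idle until the timeout.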
ID: 47582
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,879,357
RAC: 125,879
Message 47613 - Posted: 24 Dec 2022, 9:51:33 UTC - in response to Message 47582.  

374605387 200459726 23 Dec 2022, 18:44:16 UTC 24 Dec 2022, 9:02:08 UTC Aborted 48,207.93 108.98 --- Theory Simulation v300.07 (vbox64_theory)
windows_x86_64
374605404 200465087 23 Dec 2022, 18:44:09 UTC 24 Dec 2022, 9:02:08 UTC Aborted 48,833.27 85.61 --- Theory Simulation v300.07 (vbox64_theory)
windows_x86_64
374605406 200458911 23 Dec 2022, 18:44:09 UTC 24 Dec 2022, 9:02:08 UTC Aborted 48,736.27 89.00 --- Theory Simulation v300.07 (vbox64_theory)
windows_x86_64
ID: 47613
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 0
Message 47736 - Posted: 27 Jan 2023, 23:12:13 UTC - in response to Message 47582.  

I've found a way to get these tasks running again.
Close BOINC.
Open VirtualBox.
In the VirtualBox Manager, find the offending VM.
Remove it, with the "Delete all files" option.
Start BOINC.
The task will start from the beginning again and, if the network connection has been restored, will proceed as normal. It keeps the elapsed time wasted while idling and doesn't solve the single-probe-attempt issue, but it is gentler than an abort.
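
For those who prefer the command line, the removal step can probably be done with `VBoxManage unregistervm <name> --delete`. The sketch below rests on assumptions: that this command is equivalent to the "Delete all files" option in the Manager, that BOINC has been closed first as in the steps above, and that you already know the VM's name. The name used here is a placeholder, not a real task.

```python
import subprocess


def unregister_command(vm_name):
    """Build the VBoxManage call that unregisters a VM and deletes its files."""
    return ["VBoxManage", "unregistervm", vm_name, "--delete"]


def remove_vm(vm_name, run=subprocess.run):
    """Execute the command; 'run' is injectable so it can be tested offline.

    check=True makes subprocess.run raise CalledProcessError if
    VBoxManage reports a failure.
    """
    return run(unregister_command(vm_name), check=True)


# "boinc_example" is a hypothetical VM name used only for illustration.
print(" ".join(unregister_command("boinc_example")))
```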
ID: 47736
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,879,357
RAC: 125,879
Message 47737 - Posted: 28 Jan 2023, 2:50:21 UTC - in response to Message 47736.  

Thank you, Ray, for finding this way out.
This problem occurs only on Windows, not in native Theory.
So we always have to keep watch to find the Theory tasks whose CVMFS connection failed.
ID: 47737
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,879,357
RAC: 125,879
Message 47858 - Posted: 15 Mar 2023, 6:43:12 UTC

2023-03-15 07:11:39 (15924): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2023-03-15 07:11:39 (15924): Guest Log: 2.5.2.0 4228 4 26084 268 3 1 257182 4096000 0 65024 0 0 n/a 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/cernvm-prod.cern.ch http://xxx.yyy.zzz:3128 1
2023-03-15 07:11:40 (15924): Guest Log: Probing /cvmfs/alice.cern.ch... Failed!
2023-03-15 07:11:40 (15924): Guest Log: 07:11:39 CET +01:00 2023-03-15: cranky: [ERROR] 'cvmfs_config probe alice.cern.ch' failed.
2023-03-15 07:12:06 (15924): Guest Log: [ERROR] Job Failed

Lots of these failures, across many servers (sft, ..., alice, ...):
about 200 yesterday.
ID: 47858
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,879,357
RAC: 125,879
Message 47862 - Posted: 15 Mar 2023, 11:53:49 UTC - in response to Message 47858.  

2023-03-15 07:08:53 (14268): Guest Log: 07:08:52 CET +01:00 2023-03-15: cranky: [INFO] Detected Theory App
2023-03-15 07:08:53 (14268): Guest Log: 07:08:52 CET +01:00 2023-03-15: cranky: [INFO] Checking CVMFS.
2023-03-15 07:08:55 (14268): Guest Log: Probing /cvmfs/sft.cern.ch... Failed!
2023-03-15 07:08:55 (14268): Guest Log: 07:08:54 CET +01:00 2023-03-15: cranky: [ERROR] 'cvmfs_config probe sft.cern.ch' failed.
2023-03-15 08:39:55 (14268): Status Report: Job Duration: '864000.000000'
ID: 47862
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,879,357
RAC: 125,879
Message 47865 - Posted: 17 Mar 2023, 7:39:37 UTC - in response to Message 47862.  

Twenty tasks at once:
Exit status 203 (0x000000CB) EXIT_ABORTED_VIA_GUI
Computer ID 10797673
Run time 58 min 18 sec
CPU time 1 min 23 sec
Validate state Invalid
ID: 47865
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,879,357
RAC: 125,879
Message 47907 - Posted: 26 Mar 2023, 17:49:07 UTC - in response to Message 47865.  

Today the same with fifty Theory-tasks!
ID: 47907
Dave

Joined: 3 Aug 17
Posts: 7
Credit: 138,754
RAC: 0
Message 48205 - Posted: 8 Jun 2023, 11:43:50 UTC
Last modified: 8 Jun 2023, 11:47:02 UTC

Getting the following on all native theory tasks
10:35:06 BST +01:00 2023-06-08: cranky-0.0.32: [INFO] Checking CVMFS.
10:35:06 BST +01:00 2023-06-08: cranky-0.0.32: [ERROR] 'cvmfs_config probe sft.cern.ch' failed.

Running sudo cvmfs_config probe manually now reports OK for every repository. I got this far by following the troubleshooting guide: the manual probe was failing at first, but sorting that out hasn't helped the tasks.
Ubuntu 23.04, BOINC 7.23.0 compiled from source.

Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
Probing /cvmfs/cernvm-prod.cern.ch... OK
Probing /cvmfs/sft.cern.ch... OK
Probing /cvmfs/alice.cern.ch... OK
dave@Swarm:~$ sudo cvmfs_config probe
Probing /cvmfs/atlas.cern.ch... OK
Probing /cvmfs/atlas-condb.cern.ch... OK
Probing /cvmfs/grid.cern.ch... OK
Probing /cvmfs/cernvm-prod.cern.ch... OK
Probing /cvmfs/sft.cern.ch... OK
Probing /cvmfs/alice.cern.ch... OK
ID: 48205
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,879,357
RAC: 125,879
Message 48206 - Posted: 8 Jun 2023, 12:26:31 UTC - in response to Message 48205.  

12:46:39 BST +01:00 2023-06-08: cranky-0.0.32: [INFO] Checking CVMFS.
/usr/bin/cvmfs_config: line 941: cd: /cvmfs/cvmfs-config.cern.ch: Transport endpoint is not connected
Have no idea.
ID: 48206
Dave

Joined: 3 Aug 17
Posts: 7
Credit: 138,754
RAC: 0
Message 48207 - Posted: 8 Jun 2023, 14:32:56 UTC

Now stopped theory tasks. Waiting to see if other native tasks work.
ID: 48207
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 804
Credit: 650,265,819
RAC: 251,826
Message 48208 - Posted: 8 Jun 2023, 19:25:25 UTC

I just abort them. You can reboot the VM if you want to try to restart the process.
ID: 48208
Dave

Joined: 3 Aug 17
Posts: 7
Credit: 138,754
RAC: 0
Message 48209 - Posted: 9 Jun 2023, 15:10:57 UTC

Still trying to get native theory tasks to run.

container_linux.go:336: starting container process caused "process_linux.go:293: applying cgroup configuration for process caused \"mountpoint for cgroup not found\""
15:57:35 BST +01:00 2023-06-09: cranky-0.0.32: [INFO] Container 'runc' finished with status code 1.
15:57:35 BST +01:00 2023-06-09: cranky-0.0.32: [INFO] Preparing output.
15:57:35 BST +01:00 2023-06-09: cranky-0.0.32: [ERROR] No output found.
15:57:36 (20463): cranky exited; CPU time 0.382304
15:57:36 (20463): app exit status: 0xce
15:57:36 (20463): called boinc_finish(195)


Since rebooting after the last attempt, it seems to get a little further, but all native Theory tasks still crash.
ID: 48209


©2024 CERN