1) Message boards : Theory Application : Issues Native Theory application (Message 40841)
Posted 7 Dec 2019 by m
Post:
From this post you may find that you need a more recent version of libseccomp than the one that comes with Ubuntu 14.04. If you can, move to 16.04. That one works.
2) Message boards : Theory Application : New version 300.00 (Message 40477)
Posted 16 Nov 2019 by m
Post:
I have the same problem. First reported with the dev tasks here.
It occurs on both Windows and Linux hosts: all of them are affected and every task fails. The Linux hosts have CVMFS installed. Run from the command line, the local CVMFS works, so the system here isn't blocking access, but access from inside the VM seems to fail. Tasks fail both via the local proxy and direct. VBox is v5.2.x. Haven't (yet) tried different VBox versions.
3) Message boards : Cafe LHC : McIntosh (Eric) is back (Message 39697)
Posted 23 Aug 2019 by m
Post:
I am back in business.

That's very good news. Welcome back.
jp.
4) Message boards : Theory Application : New Native Theory Version 1.1 (Message 39491)
Posted 3 Aug 2019 by m
Post:
Not sure if I'm making progress or not...

found and installed libseccomp 2.4.1 (the regular way from a lxc PPA)

that produced this

18:17:01 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:17:02 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
container_linux.go:336: starting container process caused "process_linux.go:293:
applying cgroup configuration for process caused \"mountpoint for devices not found\""
18:17:02 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:17:02 (3846): cranky exited; CPU time 0.005645
18:17:02 (3846): app exit status: 0xce
18:17:02 (3846): called boinc_finish(195)
18:17:03 BST +01:00 2019-08-03: cranky-0.0.29: [INFO]

found this
https://stackoverflow.com/questions/22555264/docker-hello-world-not-working/22555932#22555932

and this https://github.com/opencontainers/runc/issues/798

so installed cgroups-lite 1.11

with this result

18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Checking runc.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Creating the filesystem.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
18:34:48 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] ===> [runRivet] Sat Aug 3
17:34:48 UTC 2019 [boinc pp mb-inelastic 7000 - - phojet 1.12a default 100000 84]
18:34:53 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:34:54 (4451): cranky exited; CPU time 0.044983
18:34:54 (4451): app exit status: 0xce
18:34:54 (4451): called boinc_finish(195)

all subsequent tasks end like this

18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Checking runc.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Creating the filesystem.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
standard_init_linux.go:203: exec user process caused "too many levels of symbolic links"
18:45:44 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:45:44 (4733): cranky exited; CPU time 0.014341
18:45:44 (4733): app exit status: 0xce
18:45:44 (4733): called boinc_finish(195)
18:45:44 BST +01:00 2019-08-03: cranky-0.0.29: [INFO]

I haven't explicitly added any symlinks.

Now I'm stuck.
5) Message boards : Theory Application : New Native Theory Version 1.1 (Message 39490)
Posted 3 Aug 2019 by m
Post:
IIRC runc requires at least libseccomp.so.2.3.1 which might not be present on older linux systems.

OK that's helpful, thanks, and for the instructions, too. In the meantime I had found and installed v2.2.3-2 (from trusty backports) which didn't help.

You may try to copy libseccomp.so.2.3.1 or a more recent libseccomp from a newer linux

I've got the .deb archive for 2.4.1 (which appears to be available as a security update for 16.04) but can't get a usable library file out of it. It's 41 bytes and shows up as a "broken link". I've clearly done something wrong somewhere.
It looks like moving to OS version 16.04 might be needed, but that means a lot of downloading, which I don't want at the moment, so the hunt goes on.
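A 41-byte file reported as a "broken link" may simply be a symlink whose target (the real versioned library) was not extracted alongside it. That is only a guess, but it can be checked. A minimal sketch, assuming Python is available on the host; describe_library is a hypothetical helper name, not part of any project tool:

```python
import os

def describe_library(path):
    """Say whether path is a regular file, a working symlink or a dangling one."""
    if os.path.islink(path):
        target = os.readlink(path)
        # A relative target is resolved against the directory holding the link;
        # os.path.join returns the target unchanged if it is already absolute.
        resolved = os.path.join(os.path.dirname(path), target)
        state = "ok" if os.path.exists(resolved) else "dangling"
        return ("symlink", target, state)
    if os.path.isfile(path):
        return ("file", None, "ok")
    return ("missing", None, "missing")
```

If the result is a dangling symlink, the target name it reports is the file that still needs to be pulled out of the archive.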
6) Message boards : Theory Application : New Native Theory Version 1.1 (Message 39485)
Posted 2 Aug 2019 by m
Post:
I would like to run LinuxN tasks on Ubuntu 14.04.6 (kernel 3.13)

No errors setting things up, but
tasks fail like this:-

19:13:41 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.29 ()
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Detected TheoryN App
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking CVMFS.
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking runc.
/cvmfs/grid.cern.ch/vc/containers/runc: symbol lookup error: /cvmfs/grid.cern.ch/vc/containers/runc: undefined symbol: seccomp_version
19:13:43 BST +01:00 2019-08-02: cranky-0.0.29: [ERROR] 'runc -v' failed.
19:13:44 (3159): cranky exited; CPU time 0.008713
19:13:44 (3159): app exit status: 0xce
19:13:44 (3159): called boinc_finish(195)

The installed libseccomp version is:-

libseccomp2 2.1.1-1ubuntu1-trusty5

I have tried temporarily installing the "seccomp" package with no effect.

Temporarily installing the "lxc" and "lxcfs" packages also had no effect.

Any ideas welcome
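
For anyone comparing package versions: a reply in this thread suggests runc needs at least libseccomp 2.3.1. A rough sketch of that comparison follows, assuming the minimum is right and that the package version string starts with the upstream version. The helper names are made up for illustration.

```python
import re

# Assumption: minimum taken from a reply in this thread, not from runc documentation.
MIN_LIBSECCOMP = (2, 3, 1)

def upstream_version(pkg_version):
    """Return the leading dotted numeric part of a Debian-style version as a tuple."""
    match = re.match(r"(?:\d+:)?(\d+(?:\.\d+)*)", pkg_version)
    if match is None:
        raise ValueError("no numeric version found in %r" % pkg_version)
    return tuple(int(part) for part in match.group(1).split("."))

def new_enough(pkg_version, minimum=MIN_LIBSECCOMP):
    """True if the upstream part of pkg_version is at least the minimum."""
    return upstream_version(pkg_version) >= minimum
```

With the version installed here, new_enough("2.1.1-1ubuntu1-trusty5") is False, which would be consistent with the undefined symbol error above.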
7) Message boards : News : new exes for SixTrack 5.02.05 (Message 39042)
Posted 4 Jun 2019 by m
Post:

It should be noted that all Win exes are distributed without targeting specific kernel versions - hence, XP hosts may receive tasks with regular Windows exes immediately failing, but the BOINC server should quickly learn that the XP-compatible exe is the appropriate one.

It doesn't seem to be learning very quickly. At the time of writing one host has received 17 "regular" tasks, which failed, and 7 "XP" ones, which are OK. The details are here
If it doesn't improve I'll try an app_info file to make sure it uses the correct exe.
8) Message boards : Sixtrack Application : SixTrack v502.05 on Win XP (Message 38735)
Posted 7 May 2019 by m
Post:

Microsoft has stopped support to XP 5 years ago.

Not quite, at least for 32 bit systems; this is (part of) the last XP update:

2019-04 Update for POSReady 2009 for x86-based Systems (KB4487990), Windows XP ... Update for WES09 and POSReady 2009 (KB4468323), Windows XP ...as far as I'm aware, this really is the end of free MS support.

and there's an (unofficial) SP4. However there is clearly a limit on the effort that can be justified for this.

[edit] Updating from XP often involved scrapping a lot of still serviceable hardware, which must have quite an environmental cost. If this can be reduced without too much trouble, it must be good. [/edit]
9) Message boards : Sixtrack Application : SixTrack v502.05 on Win XP (Message 38710)
Posted 6 May 2019 by m
Post:
OK. Thanks for taking the trouble. The breath is bated, the fingers quivering...
10) Message boards : Sixtrack Application : SixTrack v502.05 on Win XP (Message 38693)
Posted 4 May 2019 by m
Post:
Overnight one of the XP/POSReady SP3 hosts here got 20 of these. They fail like this. Looks as though it would be fairly simple to fix. These hosts have been kept partly to help out with SixTrack for which they have done sterling service, so here is a request for this...Please.
11) Message boards : Theory Application : How extend Theory VBox tasks? (Message 38653)
Posted 27 Apr 2019 by m
Post:
Where do you get the "events done 758000 attemps 49 success 30 failure 1 lost 18" data?

You can get info like this...

From one of the MC pages, such as the status page or the one that shows the results for your user id, you may already have a shortcut to this,
http://mcplots-dev.cern.ch/production.php?view=user&system=3&userid=xxxxxx

click "Control".
Then, on the highlighted line, click the entry in the "coverage" column.
The "Runs summary" lets you pick the results category you want. I don't know why results are "Masked"; maybe it's not as simple as it looks, but the rest seem self-explanatory. Try "Unsuccessful". I think that all these are run on BOINC, maybe not all by us volunteers. I'm sure someone from the project will explain further.
12) Message boards : News : Server upgrade (Message 37602)
Posted 13 Dec 2018 by m
Post:
No "stats export" checkbox on mine, just dashes. So presumably no export either.
If I hover the mouse over the dashes, the "pop up" message appears OK.

The dev project preferences page is OK.
13) Message boards : LHCb Application : New version v1.05 (Message 36998)
Posted 10 Oct 2018 by m
Post:
Many thanks for the comments.

... CPU usage of lhc VBox WUs should be planned using a calculation factor of 1.3.
Thus a 2 core CPU like the ones you use to run LHCb would be able to run a 1-core setup.

They will, it's just that they used (still do sometimes) to run 2 core jobs without obvious problems.
To test if your hosts are able to deal with a 2-core setup you may monitor your host's "load average" using top or htop. The values should not be much higher than the number of CPU cores. Otherwise it indicates a too busy system.

I'm sure you're right.
There are four hosts here that normally run LHCb (along with others). I stayed up late last night to check:
1. Running LHCb (2 core) and TNGrid jobs, load average 1.2 - 1.5. The LHCb was running OK, although, as others have commented, the CPU time looks a bit low.
2. Running LHCb (single core) and TNGrid, load average 1.2 - 1.3
3. Running two TNGrid, load average 2.01 - 2.1
4. Running TNGrid and Theory (single core), load average 2.01
If your system is too busy you may consider to run your VBox tasks as a 1-core setup.

I don't know if these loads are too high, but they're running single core now so I'll see what happens.
I expect that the loads depend on the exact nature of the job being processed at the time and the hosts are just too close to their limit. It will be a few days before I know.
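
The load check suggested above can be scripted. This is only a sketch: the advice says the load should not be "much higher" than the core count, so the 25% tolerance used here is my own arbitrary assumption, and load_status is a made-up name.

```python
import os

def load_status(load_1min, cores, tolerance=0.25):
    """Compare a load average with the core count.

    tolerance is a hypothetical allowance for "not much higher than the
    number of CPU cores"; the original advice gives no exact figure.
    """
    limit = cores * (1.0 + tolerance)
    return "too busy" if load_1min > limit else "ok"

if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    one_min, _five_min, _fifteen_min = os.getloadavg()
    print(load_status(one_min, os.cpu_count() or 1))
```

By this measure a 2-core host at load 2.1 still counts as "ok", though clearly close to its limit.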
14) Message boards : LHCb Application : New version v1.05 (Message 36984)
Posted 9 Oct 2018 by m
Post:
For the last few days, most of my LHCb tasks have failed with no heartbeat file, some with a missing heartbeat.
Various hosts, Theory tasks are OK. Has the "faster startup" change been lost somehow, or is something else awry? Something has changed.
15) Message boards : ATLAS application : Panda status (Message 36727)
Posted 16 Sep 2018 by m
Post:
Recent tasks here have appeared to complete successfully at this end but still show "running" at the CERN end (Panda) after several hours:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=206716615

happened to -dev tasks, too:-

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2398589

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=2398588

Not sure about this, though:-
"This is trying to run the run_atlas wrapper for the 2nd time,..."
I'm pretty certain all these tasks ran without interruption.
16) Message boards : News : CMS production pause (Message 36479)
Posted 17 Aug 2018 by m
Post:
Another one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=205073544

In principle this doesn't seem to be a new problem; all the current VM projects suffer. It's just worse.
Tasks seem to fail on restart if the wrapper doesn't "see" that a job has been completed.
Previously this information appeared to be "saved" over the shutdown,
so a failure only occurred if no task had been completed before the shutdown. (I've got lots of these...)
This "saving" no longer happens, or it's hidden inside the container, so tasks fail.

It's probably more complicated than this, but this is how it seems to behave here.
17) Message boards : Theory Application : New Version 263.70 (Message 35978)
Posted 20 Jul 2018 by m
Post:
the VM still sometimes fails to use the local squid.

Do you still have some VMs with errors?

Yes, in total I have details of three tasks:-
In all cases the proxy was reported as detected, but the VM was not reported as set up to use it.
Entries in the access log are taken to show that the proxy was (or wasn't) actually used.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200124140 ( Proxy used).
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200145344 ( Proxy not used)
https://lhcathome.cern.ch/lhcathome/result.php?resultid=200221912 ( Proxy used)

The last two are the same host.

The inconsistency is puzzling. I'll have to wait for some more. If anyone else sees these failures it would be interesting to see their results. Hopefully I haven't misread something somewhere...
18) Message boards : Theory Application : New Version 263.70 (Message 35935)
Posted 16 Jul 2018 by m
Post:
VMs are using the local squid again
Working OK so far.
Thanks Laurence.

Maybe I wrote too soon, the VM still sometimes fails to use the local squid.
2018-07-14 03:46:43 (2620): Guest Log: [DEBUG] Detected squid proxy http://192.168.100.137:3128

2018-07-14 03:47:59 (2620): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2018-07-14 03:47:59 (2620): Guest Log: 2.4.4.0 3533 1 25768 6661 3 1 183731 10240000 2 65024 0 15 93.3333 13 21 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 0
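
To pick the PROXY column out of these two Guest Log lines without counting fields by eye, something like the following sketch works. It assumes the header and value lines always have the same number of whitespace-separated fields, as they do in the logs posted here; the function names are made up.

```python
def guest_log_payload(line):
    """Drop the 'timestamp (pid): Guest Log:' prefix if present."""
    return line.split("Guest Log:", 1)[-1].strip()

def parse_stat(header_line, value_line):
    """Pair each column name in the header with the value underneath it."""
    names = guest_log_payload(header_line).split()
    values = guest_log_payload(value_line).split()
    if len(names) != len(values):
        raise ValueError("header has %d fields, values have %d" % (len(names), len(values)))
    return dict(zip(names, values))

def uses_proxy(stat):
    """True if the PROXY column names a proxy rather than DIRECT."""
    return stat["PROXY"] != "DIRECT"
```

Fed the two lines above, uses_proxy returns False, matching the DIRECT entry in the log.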
19) Message boards : Theory Application : New Version 263.70 (Message 35875)
Posted 12 Jul 2018 by m
Post:
VMs are using the local squid again

(3909): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2018-07-11 02:53:10 (3909): Guest Log: 2.4.4.0 3540 1 25728 6631 3 1 183741 10240000 2 65024 0 15 100 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch http://192.168.100.137:3128 1
2018-07-11 02:53:11 (3909): Guest Log: [INFO] Reading volunteer information

Working OK so far.
Thanks Laurence.
20) Message boards : Theory Application : New Version 263.70 (Message 35807)
Posted 7 Jul 2018 by m
Post:
The heartbeat interval is 20mins and it should beat every minute. So the VM is killed if it takes longer than 20mins to boot or has frozen for 20 minutes.


Are the times in the tasks below right? It looks like the timeout is still 10 mins while the heartbeat interval is 20 mins. Surely I'm misreading this?
The actual failure is probably OK - it tried to use 2 CPU when it shouldn't.

Theory Simulation v263.70 (vbox64_mt_mcore)
x86_64-pc-linux-gnu

2018-07-07 01:00:20 (7559): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
.....
2018-07-07 01:00:28 (7559): Successfully started VM. (PID = '8126')
.....
2018-07-07 01:10:23 (7559): VM Heartbeat file specified, but missing.
2018-07-07 01:10:23 (7559): VM Heartbeat file specified, but missing file system status. (errno = '2')

Another host...


2018-07-07 06:26:32 (2567): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
.....
2018-07-07 06:26:38 (2567): Successfully started VM. (PID = '3049')
.....
2018-07-07 06:36:33 (2567): VM Heartbeat file specified, but missing.
2018-07-07 06:36:33 (2567): VM Heartbeat file specified, but missing file system status. (errno = '2')
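
To put a number on it, the gap between "Successfully started VM" and the first "missing" message can be worked out from the timestamps. A small sketch, assuming every log line starts with a "YYYY-MM-DD HH:MM:SS" stamp as in the excerpts above (helper names are made up):

```python
from datetime import datetime

def log_time(line):
    """Parse the 19-character timestamp at the start of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")

def seconds_between(earlier_line, later_line):
    """Elapsed seconds between two log lines."""
    return (log_time(later_line) - log_time(earlier_line)).total_seconds()

started = "2018-07-07 01:00:28 (7559): Successfully started VM. (PID = '8126')"
missing = "2018-07-07 01:10:23 (7559): VM Heartbeat file specified, but missing."
print(seconds_between(started, missing))  # 595.0
```

That is just under 10 minutes on both hosts, against a stated heartbeat check of 1200 seconds.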



