1) Message boards : CMS Application : CMS@Home difficulties in attempts to prepare for multi-core jobs (Message 49831)
Posted 24 Mar 2024 by m
Post:
This host has 4 cores.
Preferences set to run 1 job and 4 cores.
Three other jobs (non LHC) are "Waiting to run".
No app_config.
top (f3) shows cmsRun @ 200%.
CMS job running about twice as fast as usual.
2) Message boards : ATLAS application : hits file upload fails immediately (Message 49738)
Posted 8 Mar 2024 by m
Post:
I ran out of editing time...

I have changed the "client_request_buffer_max_size" setting in squid.conf to 1500 MB (it was previously set at 10240 KB);
see here, which applies to much later squid versions than the one in use here but may be relevant.

A log extract is:-

Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Sent header to server: Accept-Language: en_GB
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Sent header to server: Content-Length: 1483171612
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Sent header to server: Expect: 100-continue
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Sent header to server:
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Received header from server: HTTP/1.1 100 Continue
Fri 08 Mar 2024 13:58:16 GMT | LHC@home | [http] [ID#17] Received header from server: Connection: keep-alive
Fri 08 Mar 2024 13:58:28 GMT | LHC@home | [http] [ID#17] Info: Recv failure: Connection reset by peer
Fri 08 Mar 2024 13:58:28 GMT | LHC@home | [http] [ID#17] Info: Closing connection 7
Fri 08 Mar 2024 13:58:28 GMT | LHC@home | [http] HTTP error: Failure when receiving data from the peer
Fri 08 Mar 2024 13:58:29 GMT | | Project communication failed: attempting access to reference site
Fri 08 Mar 2024 13:58:29 GMT | | [http] HTTP_OP::init_get(): http://www.google.com/
Fri 08 Mar 2024 13:58:29 GMT | LHC@home | [file_xfer] http op done; retval -184 (transient HTTP error)
Fri 08 Mar 2024 13:58:29 GMT | LHC@home | [file_xfer] file transfer status -184 (transient HTTP error)
Fri 08 Mar 2024 13:58:29 GMT | LHC@home | Temporarily failed upload of ID3NDmgTuz4np2BDcpmwOghnABFKDmABFKDm73LSDmv4hKDmP85t6n_1_r717556734_ATLAS_hits: transient HTTP error
Fri 08 Mar 2024 13:58:29 GMT | LHC@home | Backing off 04:35:21 on upload of ID3NDmgTuz4np2BDcpmwOghnABFKDmABFKDm73LSDmv4hKDmP85t6n_1_r717556734_ATLAS_hits

Which looks completely different... although the upload still fails.
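
For reference, the sizes involved can be compared directly. This is only a rough sketch: the binary unit multipliers (1 KB = 1024 bytes) are an assumption, and nothing in the log confirms that this particular setting is what governs the upload.

```python
# Sizes quoted in the post. The unit multipliers are an assumption
# (binary units, 1 KB = 1024 bytes), not taken from the squid documentation.
KB = 1024
MB = 1024 * KB

content_length = 1_483_171_612   # bytes, from the Content-Length header in the log
old_limit = 10_240 * KB          # previous client_request_buffer_max_size
new_limit = 1_500 * MB           # new client_request_buffer_max_size


def fits(payload: int, limit: int) -> bool:
    """Return True if a payload of `payload` bytes is within `limit` bytes."""
    return payload <= limit


print(f"upload size: {content_length / MB:8.1f} MB")
print(f"old limit  : {old_limit / MB:8.1f} MB  fits={fits(content_length, old_limit)}")
print(f"new limit  : {new_limit / MB:8.1f} MB  fits={fits(content_length, new_limit)}")
```

On these figures the hits file is far above the old value and just under the new one, which would be consistent with the changed log output, although the upload still fails for some other reason.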
3) Message boards : ATLAS application : hits file upload fails immediately (Message 49737)
Posted 8 Mar 2024 by m
Post:
Failing here, too. Using a local proxy. This is an extract from the BOINC log. From the "Payload Too Large" error,
does it look as if a definite limit is being exceeded, rather than something running out of memory?

[http] [ID#12] Sent header to server: Content-Type: application/x-www-form-urlencoded
[http] [ID#12] Sent header to server: Accept-Language: en_GB
[http] [ID#12] Sent header to server: Content-Length: 1486489603
[http] [ID#12] Sent header to server: Expect: 100-continue
[http] [ID#12] Sent header to server:
[http] [ID#12] Received header from server: HTTP/1.1 413 Payload Too Large
[http] [ID#12] Received header from server: Date: Fri, 08 Mar 2024 00:21:32 GMT
[http] [ID#12] Received header from server: Server: Apache
[http] [ID#12] Received header from server: Content-Type: text/html; charset=iso-8859-1
[http] [ID#12] Received header from server: X-Cache: MISS from Teec00
[http] [ID#12] Received header from server: X-Cache-Lookup: MISS from Teec00:3128
[http] [ID#12] Received header from server: Transfer-Encoding: chunked
[http] [ID#12] Received header from server: Via: 1.1 Teec00 (squid/3.5.12)
[http] [ID#12] Received header from server: Connection: keep-alive
[http] [ID#12] Info: HTTP error before end of send, stop sending
[http] [ID#12] Received header from server:
[http_xfer] [ID#12] HTTP: wrote 316 bytes
[http] [ID#12] Info: Closing connection 3
[file_xfer] http op done; retval -224 (permanent HTTP error)
[file_xfer] file transfer status -224 (permanent HTTP error)
Backing off 05:58:26 on upload of ID3NDmgTuz4np2BDcpmwOghnABFKDmABFKDm73LSDmv4hKDmP85t6n_1_r717556734_ATLAS_hits
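
To compare failures like this across hosts, the two interesting values (the Content-Length sent and the HTTP status received) can be pulled out of the log with a short script. This is a minimal sketch that assumes the line format shown in the extract above; the function name is made up for illustration.

```python
import re

# Patterns for the two header lines of interest in a BOINC http debug log.
CONTENT_LENGTH = re.compile(r"Sent header to server: Content-Length: (\d+)")
STATUS_LINE = re.compile(r"Received header from server: HTTP/\d(?:\.\d)? (\d{3})")


def summarize(log_text: str) -> dict:
    """Return the last Content-Length sent and all HTTP status codes received."""
    lengths = [int(m) for m in CONTENT_LENGTH.findall(log_text)]
    statuses = [int(m) for m in STATUS_LINE.findall(log_text)]
    return {
        "content_length": lengths[-1] if lengths else None,
        "statuses": statuses,
    }


sample = """\
[http] [ID#12] Sent header to server: Content-Length: 1486489603
[http] [ID#12] Sent header to server: Expect: 100-continue
[http] [ID#12] Received header from server: HTTP/1.1 413 Payload Too Large
"""
print(summarize(sample))
```

A 413 status is the server (or something in front of it) refusing the request because of its size, so it points to a configured limit rather than memory exhaustion.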
4) Message boards : CMS Application : no new WUs available (Message 48414)
Posted 10 Aug 2023 by m
Post:
Best wishes.
Health is more important than LHC@home.
Take care and take the time it needs to recover.


I second that.
With very best wishes.
jp
5) Message boards : CMS Application : EXIT_NO_SUB_TASKS (Message 43750)
Posted 29 Nov 2020 by m
Post:
Very best wishes for complete and speedy recovery, Ivan. Meanwhile - take things easy for a bit.
jp
6) Message boards : ATLAS application : Switch to Cloudflare Frontier server (Message 43656)
Posted 20 Nov 2020 by m
Post:
OK here, too.
7) Message boards : Theory Application : Issues Native Theory application (Message 40841)
Posted 7 Dec 2019 by m
Post:
From this post you may find that you need a more recent version of libseccomp than the one that comes with Ubuntu 14.04. If you can, move to 16.04. That one works.
8) Message boards : Theory Application : New version 300.00 (Message 40477)
Posted 16 Nov 2019 by m
Post:
I have the same problem. First reported with the dev tasks here.
It occurs on both Windows and Linux hosts: all of them, and every task fails. The Linux hosts have CVMFS installed. Run from the command line, the local CVMFS works, so the system here isn't blocking access, but it seems not to work from inside the VM. It fails both via the local proxy and direct. VBox is v5.2.x. I haven't (yet) tried different VBox versions.
9) Message boards : Cafe LHC : McIntosh (Eric) is back (Message 39697)
Posted 23 Aug 2019 by m
Post:
I am back in business.

That's very good news. Welcome back.
jp.
10) Message boards : Theory Application : New Native Theory Version 1.1 (Message 39491)
Posted 3 Aug 2019 by m
Post:
Not sure if I'm making progress or not...

found and installed libseccomp 2.4.1 (the regular way from a lxc PPA)

that produced this

18:17:01 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:17:02 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
container_linux.go:336: starting container process caused "process_linux.go:293:
applying cgroup configuration for process caused \"mountpoint for devices not found\""
18:17:02 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:17:02 (3846): cranky exited; CPU time 0.005645
18:17:02 (3846): app exit status: 0xce
18:17:02 (3846): called boinc_finish(195)
18:17:03 BST +01:00 2019-08-03: cranky-0.0.29: [INFO]

found this
https://stackoverflow.com/questions/22555264/docker-hello-world-not-working/22555932#22555932

and this https://github.com/opencontainers/runc/issues/798

so installed cgroups-lite 1.11

with this result

18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Checking runc.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Creating the filesystem.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:34:16 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
18:34:48 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] ===> [runRivet] Sat Aug 3
17:34:48 UTC 2019 [boinc pp mb-inelastic 7000 - - phojet 1.12a default 100000 84]
18:34:53 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:34:54 (4451): cranky exited; CPU time 0.044983
18:34:54 (4451): app exit status: 0xce
18:34:54 (4451): called boinc_finish(195)

all subsequent tasks end like this

18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Checking runc.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Creating the filesystem.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Using /cvmfs/cernvm-prod.cern.ch/cvm3
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Updating config.json.
18:45:42 BST +01:00 2019-08-03: cranky-0.0.29: [INFO] Running Container 'runc'.
standard_init_linux.go:203: exec user process caused "too many levels of symbolic links"
18:45:44 BST +01:00 2019-08-03: cranky-0.0.29: [ERROR] Container 'runc' terminated with status code 1.
18:45:44 (4733): cranky exited; CPU time 0.014341
18:45:44 (4733): app exit status: 0xce
18:45:44 (4733): called boinc_finish(195)
18:45:44 BST +01:00 2019-08-03: cranky-0.0.29: [INFO]

I haven't explicitly added any symlinks.

Now I'm stuck.
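
For the first error above ("mountpoint for devices not found"), one way to see whether a devices cgroup is mounted is to look through /proc/mounts. This is a rough sketch assuming the usual cgroup v1 layout, where the controller name appears among the mount options; the function name is hypothetical.

```python
def devices_cgroup_mountpoints(mounts_text: str) -> list:
    """Return mount points of cgroup (v1) filesystems carrying the `devices` controller."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _, mountpoint, fstype, options = fields[:4]
        if fstype == "cgroup" and "devices" in options.split(","):
            found.append(mountpoint)
    return found


# Illustrative /proc/mounts content (not taken from the host in the post).
sample = (
    "cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0\n"
    "proc /proc proc rw 0 0\n"
)
print(devices_cgroup_mountpoints(sample))

# The same check against the running system, where /proc/mounts exists.
try:
    with open("/proc/mounts") as fh:
        print(devices_cgroup_mountpoints(fh.read()) or "no devices cgroup mounted")
except OSError:
    print("/proc/mounts not readable on this system")
```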
11) Message boards : Theory Application : New Native Theory Version 1.1 (Message 39490)
Posted 3 Aug 2019 by m
Post:
IIRC runc requires at least libseccomp.so.2.3.1 which might not be present on older linux systems.

OK that's helpful, thanks, and for the instructions, too. In the meantime I had found and installed v2.2.3-2 (from trusty backports) which didn't help.

You may try to copy libseccomp.so.2.3.1 or a more recent libseccomp from a newer linux

I've got the .deb archive for 2.4.1 (which appears to be available as a security update for 16.04) but can't get a usable library file out of it. It's 41 bytes and shows up as a "broken link", so I've clearly done something wrong somewhere.
It looks like moving to OS version 16.04 might be needed, but that means much downloading, which I don't want at the moment, so the hunt goes on.
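
A tiny file that shows up as a "broken link" is often a symlink whose target was not extracted alongside it. That is only a guess about this case, but it is easy to check. The sketch below uses hypothetical file names in a scratch directory.

```python
import os
import tempfile


def describe(path: str) -> str:
    """Say whether `path` is a regular file, a working symlink, or a dangling one."""
    if os.path.islink(path):
        target = os.readlink(path)
        state = "ok" if os.path.exists(path) else "dangling"
        return f"symlink -> {target} ({state})"
    if os.path.isfile(path):
        return f"regular file, {os.path.getsize(path)} bytes"
    return "missing"


# Demonstration in a scratch directory; the file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, "libseccomp.so.2")
    os.symlink("libseccomp.so.2.4.1", link)   # target not extracted yet
    print(describe(link))                     # reports a dangling symlink
    open(os.path.join(tmp, "libseccomp.so.2.4.1"), "wb").close()
    print(describe(link))                     # the link now resolves
```

If the extracted file turns out to be a dangling symlink, the real library is whatever file the link names, and that file needs to be copied too.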
12) Message boards : Theory Application : New Native Theory Version 1.1 (Message 39485)
Posted 2 Aug 2019 by m
Post:
I would like to run LinuxN tasks on Ubuntu 14.04.6 (kernel 3.13)

No errors setting things up, but
tasks fail like this:-

19:13:41 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper (7.15.26016): starting
19:13:42 (3159): wrapper: running ../../projects/lhcathome.cern.ch_lhcathome/cranky-0.0.29 ()
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Detected TheoryN App
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking CVMFS.
19:13:42 BST +01:00 2019-08-02: cranky-0.0.29: [INFO] Checking runc.
/cvmfs/grid.cern.ch/vc/containers/runc: symbol lookup error: /cvmfs/grid.cern.ch/vc/containers/runc: undefined symbol: seccomp_version
19:13:43 BST +01:00 2019-08-02: cranky-0.0.29: [ERROR] 'runc -v' failed.
19:13:44 (3159): cranky exited; CPU time 0.008713
19:13:44 (3159): app exit status: 0xce
19:13:44 (3159): called boinc_finish(195)

The installed libseccomp version is:-

libseccomp2 2.1.1-1ubuntu1-trusty5

I have tried temporarily installing the "seccomp" package with no effect.

Temporarily installing the "lxc" and "lxcfs" packages also had no effect.

Any ideas welcome
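
The "undefined symbol: seccomp_version" line means the installed libseccomp does not export a function that runc expects. A quick way to test for it without running a task is sketched below. The library name "libseccomp.so.2" is an assumption about how it is installed, and the helper name is made up.

```python
import ctypes
from typing import Optional


def has_symbol(library: str, symbol: str) -> Optional[bool]:
    """Return True/False if `library` loads and does/doesn't export `symbol`.

    Returns None if the library itself cannot be loaded.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    return hasattr(lib, symbol)


result = has_symbol("libseccomp.so.2", "seccomp_version")
messages = {
    None: "libseccomp.so.2 could not be loaded",
    True: "seccomp_version is exported",
    False: "seccomp_version is missing (library probably too old for this runc)",
}
print(messages[result])
```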
13) Message boards : News : new exes for SixTrack 5.02.05 (Message 39042)
Posted 4 Jun 2019 by m
Post:

It should be noted that all Win exes are distributed without targeting specific kernel versions - hence, XP hosts may receive tasks with regular Windows exes immediately failing, but the BOINC server should quickly learn that the XP-compatible exe is the appropriate one.

It doesn't seem to be learning very quickly. At the time of writing one host has received 17 "regular" tasks, which failed, and 7 "XP" ones, which are OK. The details are here
If it doesn't improve I'll try an app_info file to make sure it uses the correct exe.
14) Message boards : Sixtrack Application : SixTrack v502.05 on Win XP (Message 38735)
Posted 7 May 2019 by m
Post:

Microsoft has stopped support to XP 5 years ago.

Not quite, at least for 32-bit systems; this is (part of) the last XP update:

2019-04 Update for POSReady 2009 for x86-based Systems (KB4487990), Windows XP ... Update for WES09 and POSReady 2009 (KB4468323), Windows XP ...as far as I'm aware, this really is the end of free MS support.

and there's an (unofficial) SP4. However there is clearly a limit on the effort that can be justified for this.

[edit] Updating from XP often involved scrapping a lot of still serviceable hardware, which must have quite an environmental cost. If this can be reduced without too much trouble, it must be good. [/edit]
15) Message boards : Sixtrack Application : SixTrack v502.05 on Win XP (Message 38710)
Posted 6 May 2019 by m
Post:
OK. Thanks for taking the trouble. The breath is bated, the fingers quivering...
16) Message boards : Sixtrack Application : SixTrack v502.05 on Win XP (Message 38693)
Posted 4 May 2019 by m
Post:
Overnight one of the XP/POSReady SP3 hosts here got 20 of these. They fail like this. Looks as though it would be fairly simple to fix. These hosts have been kept partly to help out with SixTrack for which they have done sterling service, so here is a request for this...Please.
17) Message boards : Theory Application : How extend Theory VBox tasks? (Message 38653)
Posted 27 Apr 2019 by m
Post:
Where do you get the "events done 758000 attemps 49 success 30 failure 1 lost 18" data?

You can get info like this...

From one of the MC pages, such as the status page or the one that shows the results for your user id (you may already have a shortcut to this):
http://mcplots-dev.cern.ch/production.php?view=user&system=3&userid=xxxxxx

click "Control".
Then, on the highlighted line, click the entry in the "coverage" column.
The "Runs summary" lets you pick the results category you want. I don't know why results are "Masked"; maybe it's not as simple as it looks, but the rest seem self-explanatory. Try "Unsuccessful". I think that all these are run on BOINC, maybe not all by us volunteers. I'm sure someone from the project will explain further.
18) Message boards : News : Server upgrade (Message 37602)
Posted 13 Dec 2018 by m
Post:
No "stats export" checkbox on mine, just dashes. So presumably no export either.
If I hover the mouse over the dashes, the "pop up" message appears OK.

The dev project preferences page is OK.
19) Message boards : LHCb Application : New version v1.05 (Message 36998)
Posted 10 Oct 2018 by m
Post:
Many thanks for the comments.

... CPU usage of lhc VBox WUs should be planned using a calculation factor of 1.3.
Thus a 2 core CPU like the ones you use to run LHCb would be able to run a 1-core setup.

They will, it's just that they used (still do sometimes) to run 2 core jobs without obvious problems.
To test if your hosts are able to deal with a 2-core setup you may monitor your host's "load average" using top or htop. The values should not be much higher than the number of CPU cores. Otherwise it indicates a too busy system.

I'm sure you're right.
There are four hosts here that normally run LHCb (along with others) - stayed up late last night to check...
1. Running LHCb (2 core) and TNGrid jobs. Load average 1.2 - 1.5. The LHCb was running OK, although, as others have commented, the CPU time looks a bit low.
2. Running LHCb (single core) and TNGrid. Load average 1.2 - 1.3
3. Running two TNGrid. Load average 2.01 - 2.1
4. Running TNGrid and Theory (single core). Load average 2.01
If your system is too busy you may consider to run your VBox tasks as a 1-core setup.

I don't know if these loads are too high, but they're running single core now so I'll see what happens.
I expect that the loads depend on the exact nature of the job being processed at the time and the hosts are just too close to their limit. It will be a few days before I know.
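
The rule of thumb quoted above (load average not much higher than the number of cores) can be put into a small check. This is a sketch only: the `tolerance` margin is a made-up stand-in for "not much higher", and the hosts are assumed to be 2-core machines as the quoted advice suggests.

```python
import os


def load_status(load_avg: float, cores: int, tolerance: float = 0.25) -> str:
    """Classify a load average against the core count.

    `tolerance` is a hypothetical margin (fraction of the core count);
    no exact threshold is given in the thread.
    """
    if load_avg <= cores:
        return "ok"
    if load_avg <= cores * (1 + tolerance):
        return "near limit"
    return "too busy"


# Upper load figures reported above, assuming 2-core hosts.
for host, load in [(1, 1.5), (2, 1.3), (3, 2.1), (4, 2.01)]:
    print(f"host {host}: load {load} -> {load_status(load, cores=2)}")

# The machine this runs on (os.getloadavg exists only on Unix-like systems).
if hasattr(os, "getloadavg"):
    try:
        one_min = os.getloadavg()[0]
        print("this host:", load_status(one_min, os.cpu_count() or 1))
    except OSError:
        print("load average not available")
```

With that margin, hosts 3 and 4 come out as "near limit", which fits the impression that they are close to what they can handle.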
20) Message boards : LHCb Application : New version v1.05 (Message 36984)
Posted 9 Oct 2018 by m
Post:
For the last few days, most of my LHCb tasks have failed with "no heartbeat file", some with "missing heartbeat".
Various hosts, Theory tasks are OK. Has the "faster startup" change been lost somehow, or is something else awry? Something has changed.

