21) Message boards : ATLAS application : ATLAS native version 2.73 (Message 40843)
Posted 8 Dec 2019 by Luigi R.
Post:
On Xubuntu 14.04.6 I tried to run 4 single-thread tasks and it didn't work.

cvmfs_config probe returned 6 OKs.
First task started smoothly: https://lhcathome.cern.ch/lhcathome/result.php?resultid=254625003

Then the other tasks got this error:
check cvmfs return values are 0, 256
CVMFS not found, aborting the job
and failed: https://lhcathome.cern.ch/lhcathome/result.php?resultid=254625437
And by then cvmfs_config probe was returning only 1 or 2 OKs.


This system has a problem with libseccomp too. I will try to fix it with the suggested solutions.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4972&postid=40840#40840


On Xubuntu 18.04.3 I have now started 4 tasks simultaneously and they are all running fine. There are indeed 4 athena.py processes.
cvmfs_config probe returns 6 OKs as expected.
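For reference, this is roughly how I check it. Just a minimal sketch, assuming the probe prints one "OK" per mounted repository and that 6 repositories are expected, as on my hosts:

#!/bin/bash
# Minimal sketch: count the OK lines from cvmfs_config probe and compare
# them with the number of repositories I expect (6 here).
expected=6
ok_count=$(cvmfs_config probe 2>/dev/null | grep -c "OK")
if [ "$ok_count" -lt "$expected" ]; then
    echo "CVMFS problem: only $ok_count of $expected probes returned OK"
    exit 1
fi
echo "CVMFS looks fine: $ok_count OKs"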
22) Message boards : Theory Application : Issues Native Theory application (Message 40840)
Posted 7 Dec 2019 by Luigi R.
Post:
/cvmfs/grid.cern.ch/vc/containers/runc: symbol lookup error: /cvmfs/grid.cern.ch/vc/containers/runc: undefined symbol: seccomp_version

run "ldd /cvmfs/grid.cern.ch/vc/containers/runc" to check if all libs are installed (here: libseccomp.so.2).
If any is missing, install it from your distro's repository.
Is there no way to fix this?
https://lhcathome.cern.ch/lhcathome/result.php?resultid=254625750

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 14.04.6 LTS
Release:	14.04
Codename:	trusty

uname -r
4.4.0-142-generic

ldd /cvmfs/grid.cern.ch/vc/containers/runc
linux-vdso.so.1 =>  (0x00007fffd3974000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f726f6ea000)
libseccomp.so.2 => /usr/lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007f726f4ce000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f726f2ca000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f726ef01000)
/lib64/ld-linux-x86-64.so.2 (0x00007f727026a000)
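Since ldd finds libseccomp.so.2, my guess is that trusty's libseccomp is simply too old to export seccomp_version, not that the library is missing. A quick way to check (just a sketch, assuming Ubuntu's package name libseccomp2):

# Does the installed libseccomp actually export the symbol runc wants?
objdump -T /usr/lib/x86_64-linux-gnu/libseccomp.so.2 | grep seccomp_version

# Which package version is installed?
dpkg -s libseccomp2 | grep '^Version'

If the first command prints nothing, the library is there but lacks the symbol, so a newer libseccomp (or a newer release, as on my 18.04 machine) would be needed.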
23) Message boards : Theory Application : Sherpa - longest runtime with Success - native (Message 40812)
Posted 6 Dec 2019 by Luigi R.
Post:
It would be nice to see runRivet.log.
24) Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners (Message 40810)
Posted 6 Dec 2019 by Luigi R.
Post:
This one was a success:
pp jets 7000 300 - sherpa 1.4.2 default 41000 190]

https://lhcathome.cern.ch/lhcathome/result.php?resultid=253954301
80,534.88s
25) Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners (Message 40783)
Posted 4 Dec 2019 by Luigi R.
Post:
Should we automatically abort them? It would be easy to do with a bash script.
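Something along these lines, for example. Untested sketch: the 36-hour limit is arbitrary, the parsing assumes the field names printed by boinccmd --get_tasks on my client (they may differ between versions), and it cannot tell sherpa apart from other generators, so the limit has to be generous.

#!/bin/bash
# Rough sketch: abort LHC@home tasks whose elapsed time exceeds a limit.
url="https://lhcathome.cern.ch/lhcathome/"
limit=$((36 * 3600))   # 36 hours, adjust to taste

boinccmd --get_tasks | awk -v url="$url" -v limit="$limit" '
    /^ *name:/              { name = $2 }
    /^ *project URL:/       { proj = $3 }
    /^ *elapsed task time:/ { if (proj == url && $4 + 0 > limit) print name }
' | while read -r task; do
    echo "Aborting $task"
    boinccmd --task "$url" "$task" abort
done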
26) Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners (Message 40781)
Posted 4 Dec 2019 by Luigi R.
Post:
Since it's making no progress after 75,000 events, there isn't much hope. :(
Aborted. :(
https://lhcathome.cern.ch/lhcathome/result.php?resultid=253793046
[runRivet] Tue Dec  3 00:48:52 UTC 2019 [boinc pp jets 7000 40,-,760 - sherpa 1.4.5 default 100000 190]
27) Message boards : Theory Application : New version 300.00 (Message 40779)
Posted 4 Dec 2019 by Luigi R.
Post:
They failed by themselves a while ago...

https://lhcathome.cern.ch/lhcathome/result.php?resultid=253550444
https://lhcathome.cern.ch/lhcathome/result.php?resultid=253551061
https://lhcathome.cern.ch/lhcathome/result.php?resultid=253596506
28) Message boards : Theory Application : New version 300.00 (Message 40775)
Posted 3 Dec 2019 by Luigi R.
Post:
I have 3 sherpa jobs (not native).

1) CPU time 74:10:23, Elapsed time 72:59:24
2) CPU time 72:45:38, Elapsed time 71:43:31
3) CPU time 05:12:12, Elapsed time 42:06:48

Should I abort all of them?
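What makes me suspicious about the third one is the CPU/elapsed ratio: the first two are close to 100%, the third is around 12%, i.e. the VM is mostly idle. A quick way to compute it (just a little helper I use, nothing official):

# CPU time / elapsed time as a percentage; a healthy single-core task
# should sit near 100%.
eff() {  # usage: eff HH:MM:SS(cpu) HH:MM:SS(elapsed)
    awk -v c="$1" -v e="$2" 'BEGIN {
        split(c, a, ":"); split(e, b, ":")
        cpu = a[1]*3600 + a[2]*60 + a[3]
        ela = b[1]*3600 + b[2]*60 + b[3]
        printf "%.0f%%\n", 100 * cpu / ela
    }'
}
eff 74:10:23 72:59:24   # job 1 -> ~102%
eff 05:12:12 42:06:48   # job 3 -> ~12%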
29) Message boards : Theory Application : (Native) Theory - Sherpa looooooong runners (Message 40773)
Posted 3 Dec 2019 by Luigi R.
Post:
Is this task ok?

https://pastebin.com/FzvqwDfX
30) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40764)
Posted 3 Dec 2019 by Luigi R.
Post:
Thank you very much computezrmle. I appreciated both answers and their order.

I suspected there was some logic behind it, but I don't know that much about CPUs.
31) Message boards : Theory Application : New Native Theory Version 1.1 (Message 40761)
Posted 3 Dec 2019 by Luigi R.
Post:
I'm installing cvmfs on my machines. I don't know where to ask this question and I didn't find any answers, so here it is.

I have an i5, so 4 threads. I set 4 native tasks at a time. Why are there so many processes running concurrently at 30-50% CPU instead of only 4 processes at 100%? Is this efficient?
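This is how I am looking at it, in case it helps (standard ps, nothing specific to the LHC apps):

# Show the process tree under the BOINC client; each native task starts
# a wrapper plus several children, so its CPU is split over many PIDs.
ps -eo pid,ppid,pcpu,etime,comm --forest

# Top CPU consumers, to check whether the 4 tasks together still add up
# to roughly 4 full cores.
ps -eo pcpu,comm --sort=-pcpu | head -n 15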
32) Message boards : Theory Application : Native app show odd resources in status (Message 40750)
Posted 2 Dec 2019 by Luigi R.
Post:
Not only the native app.
https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=491&postid=6869#6869
33) Message boards : ATLAS application : ATLAS vbox version 2.00 (Message 40668)
Posted 25 Nov 2019 by Luigi R.
Post:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=252684654
there was something wrong with this task or with the VM processing.
And that's why the VM Console didn't work.
The excerpt from the stderr says it clearly:

Hypervisor System Log:
68:52:25.239712 nspr-4 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85cd948e-a71f-4289-281e-0ca7ad48cd89} aComponent={SessionMachine} aText={No storage device attached to device slot 1 on port 0 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0
I really don't understand, whiskey-tango-foxtrot: it had crunched for 51 hours and would have liked to continue. :(
34) Message boards : ATLAS application : ATLAS vbox version 2.00 (Message 40667)
Posted 25 Nov 2019 by Luigi R.
Post:
Yeah, thank you. Otherwise I can write a bash script that parses stderr.txt and automatically aborts the affected task when three "Probing /cvmfs/*... Failed!" lines appear (and those three lines must be consecutive).
Ok, it should work.

[...]
Obviously, this script does not work if you restart the BOINC client, because it adds up all cvmfs failures regardless of whether some of them are from a previous start.
E.g. if you started the BOINC client twice and there were 2 failures the first time and 1 failure the second time, the fail counter would be equal to 3 and the script would suspend that task. Wrong!

Now it checks whether the 3 failures are on consecutive lines, or at least I think it does.

Code: https://pastebin.com/r82vuzGM

Output example:
2 consecutive probing fails found in /home/luis/Applicazioni/boinc/slots/1/stderr.txt after line No. 77
2 consecutive probing fails found in /home/luis/Applicazioni/boinc/slots/1/stderr.txt after line No. 151
Total: 4 fails.
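The full script is at the pastebin link above; the consecutive-counting part boils down to something like this (only a sketch, the exact "Probing /cvmfs/... Failed!" wording comes from the task's stderr):

#!/bin/bash
# Sketch of the consecutive-fail check: a run counter that resets on any
# non-failing line, so failures from a previous client start are not
# added together.
stderr="$1"   # e.g. /path/to/boinc/slots/1/stderr.txt

awk '
    /Probing \/cvmfs\/.*Failed!/ {
        run++
        if (run == 3) { print "3 consecutive probing fails ending at line " NR; found = 1 }
        next
    }
    { run = 0 }
    END { if (!found) print "No run of 3 consecutive fails found." }
' "$stderr"

The suspend/abort itself can then be done with boinccmd (--task <project_url> <task_name> suspend or abort).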
35) Message boards : ATLAS application : ATLAS vbox version 2.00 (Message 40662)
Posted 25 Nov 2019 by Luigi R.
Post:
I clicked "Show VM Console". There were some sentences ending with something like "disabled.", I can't remember.
Then Alt+F2 didn't show anything apart from an underscore.
I tried to restart BOINC. Something crashed while resuming that VM.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=252684654
36) Message boards : ATLAS application : ATLAS vbox version 2.00 (Message 40645)
Posted 25 Nov 2019 by Luigi R.
Post:
Is this task ok?

2019-11-23 16:31:15 (29320): vboxwrapper (7.7.26197): starting
2019-11-23 16:31:16 (29320): Feature: Checkpoint interval offset (402 seconds)
2019-11-23 16:31:16 (29320): Detected: VirtualBox VboxManage Interface (Version: 5.2.10)
2019-11-23 16:31:16 (29320): Detected: Minimum checkpoint interval (900.000000 seconds)
2019-11-23 16:31:16 (29320): Successfully copied 'init_data.xml' to the shared directory.
2019-11-23 16:31:16 (29320): Create VM. (boinc_13c65ac4e77b5ca5, slot#3)
2019-11-23 16:31:19 (29320): Setting Memory Size for VM. (6600MB)
2019-11-23 16:31:19 (29320): Setting CPU Count for VM. (1)
2019-11-23 16:31:19 (29320): Setting Chipset Options for VM.
2019-11-23 16:31:19 (29320): Setting Boot Options for VM.
2019-11-23 16:31:19 (29320): Setting Network Configuration for NAT.
2019-11-23 16:31:19 (29320): Enabling VM Network Access.
2019-11-23 16:31:19 (29320): Disabling USB Support for VM.
2019-11-23 16:31:19 (29320): Disabling COM Port Support for VM.
2019-11-23 16:31:19 (29320): Disabling LPT Port Support for VM.
2019-11-23 16:31:20 (29320): Disabling Audio Support for VM.
2019-11-23 16:31:20 (29320): Disabling Clipboard Support for VM.
2019-11-23 16:31:20 (29320): Disabling Drag and Drop Support for VM.
2019-11-23 16:31:20 (29320): Adding storage controller(s) to VM.
2019-11-23 16:31:20 (29320): Adding virtual disk drive to VM. (vm_image.vdi)
2019-11-23 16:31:20 (29320): Adding VirtualBox Guest Additions to VM.
2019-11-23 16:31:20 (29320): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2019-11-23 16:31:20 (29320): forwarding host port 59209 to guest port 80
2019-11-23 16:31:20 (29320): Enabling remote desktop for VM.
2019-11-23 16:31:20 (29320): Enabling shared directory for VM.
2019-11-23 16:31:21 (29320): Starting VM. (boinc_13c65ac4e77b5ca5, slot#3)
2019-11-23 16:31:22 (29320): Successfully started VM. (PID = '29969')
2019-11-23 16:31:22 (29320): Reporting VM Process ID to BOINC.
2019-11-23 16:31:22 (29320): Guest Log: BIOS: VirtualBox 5.2.10
2019-11-23 16:31:22 (29320): Guest Log: CPUID EDX: 0x078bfbff
2019-11-23 16:31:22 (29320): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2019-11-23 16:31:22 (29320): VM state change detected. (old = 'poweroff', new = 'running')
2019-11-23 16:31:22 (29320): Detected: Web Application Enabled (http://localhost:59209)
2019-11-23 16:31:22 (29320): Detected: Remote Desktop Enabled (localhost:41312)
2019-11-23 16:31:23 (29320): Preference change detected
2019-11-23 16:31:23 (29320): Setting CPU throttle for VM. (100%)
2019-11-23 16:31:23 (29320): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 300 seconds) or (Vbox_job.xml: 900 seconds))
2019-11-23 16:31:25 (29320): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2019-11-23 16:31:25 (29320): Guest Log: BIOS: Booting from Hard Disk...
2019-11-23 16:31:27 (29320): Guest Log: BIOS: KBD: unsupported int 16h function 03
2019-11-23 16:31:27 (29320): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=81
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=81
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=82
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=82
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=83
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=83
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=84
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=84
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=85
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=85
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=86
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=86
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=87
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=87
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=88
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=88
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=89
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=89
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8a
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8a
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8b
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8b
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8c
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8c
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8d
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8d
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8e
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8e
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8f
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8f
2019-11-23 16:31:31 (29320): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2019-11-23 16:31:31 (29320): Guest Log: vboxguest: misc device minor 58, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2019-11-23 16:31:34 (29320): Guest Log: Checking CVMFS...
2019-11-23 16:31:34 (29320): Guest Log: Failed to check CVMFS, check output from cvmfs_config probe:
2019-11-23 18:11:14 (29320): Status Report: Elapsed Time: '6000.585989'
2019-11-23 18:11:14 (29320): Status Report: CPU Time: '5567.880000'
2019-11-23 19:51:11 (29320): Status Report: Elapsed Time: '12000.743581'
2019-11-23 19:51:11 (29320): Status Report: CPU Time: '11286.870000'
2019-11-23 21:31:10 (29320): Status Report: Elapsed Time: '18002.214636'
2019-11-23 21:31:10 (29320): Status Report: CPU Time: '17047.590000'
2019-11-23 23:11:09 (29320): Status Report: Elapsed Time: '24004.466774'
2019-11-23 23:11:09 (29320): Status Report: CPU Time: '22828.480000'
2019-11-24 00:51:03 (29320): Status Report: Elapsed Time: '30004.874428'
2019-11-24 00:51:03 (29320): Status Report: CPU Time: '28598.400000'
2019-11-24 02:31:02 (29320): Status Report: Elapsed Time: '36006.562629'
2019-11-24 02:31:02 (29320): Status Report: CPU Time: '34382.740000'
2019-11-24 04:10:51 (29320): Status Report: Elapsed Time: '42007.551548'
2019-11-24 04:10:51 (29320): Status Report: CPU Time: '40147.510000'
2019-11-24 05:50:49 (29320): Status Report: Elapsed Time: '48007.580669'
2019-11-24 05:50:49 (29320): Status Report: CPU Time: '45889.600000'
2019-11-24 07:30:44 (29320): Status Report: Elapsed Time: '54007.754148'
2019-11-24 07:30:44 (29320): Status Report: CPU Time: '51617.320000'
2019-11-24 09:10:42 (29320): Status Report: Elapsed Time: '60007.954579'
2019-11-24 09:10:42 (29320): Status Report: CPU Time: '57349.920000'
2019-11-24 10:50:37 (29320): Status Report: Elapsed Time: '66008.825746'
2019-11-24 10:50:37 (29320): Status Report: CPU Time: '63149.470000'
2019-11-24 12:30:29 (29320): Status Report: Elapsed Time: '72009.494593'
2019-11-24 12:30:29 (29320): Status Report: CPU Time: '68899.570000'
2019-11-24 14:10:27 (29320): Status Report: Elapsed Time: '78009.708986'
2019-11-24 14:10:27 (29320): Status Report: CPU Time: '74605.640000'
2019-11-24 15:50:26 (29320): Status Report: Elapsed Time: '84011.110692'
2019-11-24 15:50:26 (29320): Status Report: CPU Time: '80253.640000'
2019-11-24 17:30:20 (29320): Status Report: Elapsed Time: '90012.575876'
2019-11-24 17:30:20 (29320): Status Report: CPU Time: '86041.990000'
2019-11-24 19:10:14 (29320): Status Report: Elapsed Time: '96013.218405'
2019-11-24 19:10:14 (29320): Status Report: CPU Time: '91444.780000'
2019-11-24 20:50:04 (29320): Status Report: Elapsed Time: '102013.498886'
2019-11-24 20:50:04 (29320): Status Report: CPU Time: '96998.080000'
2019-11-24 22:30:00 (29320): Status Report: Elapsed Time: '108013.533275'
2019-11-24 22:30:01 (29320): Status Report: CPU Time: '102776.200000'
2019-11-25 00:09:58 (29320): Status Report: Elapsed Time: '114013.561833'
2019-11-25 00:09:58 (29320): Status Report: CPU Time: '108540.050000'
2019-11-25 01:49:51 (29320): Status Report: Elapsed Time: '120014.473556'
2019-11-25 01:49:51 (29320): Status Report: CPU Time: '114327.220000'
2019-11-25 03:29:42 (29320): Status Report: Elapsed Time: '126014.698520'
2019-11-25 03:29:42 (29320): Status Report: CPU Time: '120073.000000'
2019-11-25 05:09:40 (29320): Status Report: Elapsed Time: '132014.853378'
2019-11-25 05:09:40 (29320): Status Report: CPU Time: '125800.820000'
2019-11-25 06:49:40 (29320): Status Report: Elapsed Time: '138016.302110'
2019-11-25 06:49:40 (29320): Status Report: CPU Time: '131517.270000'


It has been crunching for 39 hours now. CPU usage is 100%.
37) Message boards : Theory Application : New version 300.00 (Message 40496)
Posted 18 Nov 2019 by Luigi R.
Post:
Oh, I didn't notice that the graph is clickable.

A lot of njobs=1 suggests to me that there are many uncommon jobs, or long runtimes from a few slow hosts, so the 5-minute bins are too thin.
38) Message boards : Theory Application : New version 300.00 (Message 40488)
Posted 17 Nov 2019 by Luigi R.
Post:
The longest known Theory task of batch 2279 lasted 376 hours and 55 minutes, the second longest 236.5 hours ;)
Not so long then.
Anyway, that host's average time is about 1.6 hours, so 18 hours is quite a bit. :)

What are the host's average time and the Theory app version for an almost 377-hour task?
39) Message boards : Theory Application : New version 300.00 (Message 40483)
Posted 17 Nov 2019 by Luigi R.
Post:
Jobs will last on average 2 hours rather than 12.
Very long task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=251991795
40) Message boards : Theory Application : New version 300.00 (Message 40471)
Posted 15 Nov 2019 by Luigi R.
Post:
All right, 4 clients got 8 tasks. Now I can go back to 1 client.

