Message boards :
ATLAS application :
ATLAS vbox version 2.00
Author | Message |
---|---|
Joined: 27 Sep 08 Posts: 846 Credit: 691,124,232 RAC: 110,952 |
I got some interesting log output in my task today: kworker/0:3:95 blocked for more than 120 seconds. You can't switch to the other logins. |
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
"Typical config entries" There isn't cernvm-prod.cern.ch in my stderr files. "Client side issues could be:" Maybe this. My router's Wi-Fi is not as good as the old router's, and there are also Theory VMs, smartphones, pay TV, etc. using it. |
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
Is this task ok?
2019-11-23 16:31:15 (29320): vboxwrapper (7.7.26197): starting
2019-11-23 16:31:16 (29320): Feature: Checkpoint interval offset (402 seconds)
2019-11-23 16:31:16 (29320): Detected: VirtualBox VboxManage Interface (Version: 5.2.10)
2019-11-23 16:31:16 (29320): Detected: Minimum checkpoint interval (900.000000 seconds)
2019-11-23 16:31:16 (29320): Successfully copied 'init_data.xml' to the shared directory.
2019-11-23 16:31:16 (29320): Create VM. (boinc_13c65ac4e77b5ca5, slot#3)
2019-11-23 16:31:19 (29320): Setting Memory Size for VM. (6600MB)
2019-11-23 16:31:19 (29320): Setting CPU Count for VM. (1)
2019-11-23 16:31:19 (29320): Setting Chipset Options for VM.
2019-11-23 16:31:19 (29320): Setting Boot Options for VM.
2019-11-23 16:31:19 (29320): Setting Network Configuration for NAT.
2019-11-23 16:31:19 (29320): Enabling VM Network Access.
2019-11-23 16:31:19 (29320): Disabling USB Support for VM.
2019-11-23 16:31:19 (29320): Disabling COM Port Support for VM.
2019-11-23 16:31:19 (29320): Disabling LPT Port Support for VM.
2019-11-23 16:31:20 (29320): Disabling Audio Support for VM.
2019-11-23 16:31:20 (29320): Disabling Clipboard Support for VM.
2019-11-23 16:31:20 (29320): Disabling Drag and Drop Support for VM.
2019-11-23 16:31:20 (29320): Adding storage controller(s) to VM.
2019-11-23 16:31:20 (29320): Adding virtual disk drive to VM. (vm_image.vdi)
2019-11-23 16:31:20 (29320): Adding VirtualBox Guest Additions to VM.
2019-11-23 16:31:20 (29320): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2019-11-23 16:31:20 (29320): forwarding host port 59209 to guest port 80
2019-11-23 16:31:20 (29320): Enabling remote desktop for VM.
2019-11-23 16:31:20 (29320): Enabling shared directory for VM.
2019-11-23 16:31:21 (29320): Starting VM. (boinc_13c65ac4e77b5ca5, slot#3)
2019-11-23 16:31:22 (29320): Successfully started VM. (PID = '29969')
2019-11-23 16:31:22 (29320): Reporting VM Process ID to BOINC.
2019-11-23 16:31:22 (29320): Guest Log: BIOS: VirtualBox 5.2.10
2019-11-23 16:31:22 (29320): Guest Log: CPUID EDX: 0x078bfbff
2019-11-23 16:31:22 (29320): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2019-11-23 16:31:22 (29320): VM state change detected. (old = 'poweroff', new = 'running')
2019-11-23 16:31:22 (29320): Detected: Web Application Enabled (http://localhost:59209)
2019-11-23 16:31:22 (29320): Detected: Remote Desktop Enabled (localhost:41312)
2019-11-23 16:31:23 (29320): Preference change detected
2019-11-23 16:31:23 (29320): Setting CPU throttle for VM. (100%)
2019-11-23 16:31:23 (29320): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 300 seconds) or (Vbox_job.xml: 900 seconds))
2019-11-23 16:31:25 (29320): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2019-11-23 16:31:25 (29320): Guest Log: BIOS: Booting from Hard Disk...
2019-11-23 16:31:27 (29320): Guest Log: BIOS: KBD: unsupported int 16h function 03
2019-11-23 16:31:27 (29320): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=81
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=81
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=82
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=82
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=83
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=83
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=84
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=84
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=85
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=85
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=86
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=86
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=87
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=87
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=88
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=88
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=89
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=89
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8a
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8a
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8b
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8b
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8c
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8c
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8d
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8d
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8e
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8e
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk_ext: function 41, unmapped device for ELDL=8f
2019-11-23 16:31:27 (29320): Guest Log: int13_harddisk: function 02, unmapped device for ELDL=8f
2019-11-23 16:31:31 (29320): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2019-11-23 16:31:31 (29320): Guest Log: vboxguest: misc device minor 58, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2019-11-23 16:31:34 (29320): Guest Log: Checking CVMFS...
2019-11-23 16:31:34 (29320): Guest Log: Failed to check CVMFS, check output from cvmfs_config probe:
2019-11-23 18:11:14 (29320): Status Report: Elapsed Time: '6000.585989'
2019-11-23 18:11:14 (29320): Status Report: CPU Time: '5567.880000'
2019-11-23 19:51:11 (29320): Status Report: Elapsed Time: '12000.743581'
2019-11-23 19:51:11 (29320): Status Report: CPU Time: '11286.870000'
2019-11-23 21:31:10 (29320): Status Report: Elapsed Time: '18002.214636'
2019-11-23 21:31:10 (29320): Status Report: CPU Time: '17047.590000'
2019-11-23 23:11:09 (29320): Status Report: Elapsed Time: '24004.466774'
2019-11-23 23:11:09 (29320): Status Report: CPU Time: '22828.480000'
2019-11-24 00:51:03 (29320): Status Report: Elapsed Time: '30004.874428'
2019-11-24 00:51:03 (29320): Status Report: CPU Time: '28598.400000'
2019-11-24 02:31:02 (29320): Status Report: Elapsed Time: '36006.562629'
2019-11-24 02:31:02 (29320): Status Report: CPU Time: '34382.740000'
2019-11-24 04:10:51 (29320): Status Report: Elapsed Time: '42007.551548'
2019-11-24 04:10:51 (29320): Status Report: CPU Time: '40147.510000'
2019-11-24 05:50:49 (29320): Status Report: Elapsed Time: '48007.580669'
2019-11-24 05:50:49 (29320): Status Report: CPU Time: '45889.600000'
2019-11-24 07:30:44 (29320): Status Report: Elapsed Time: '54007.754148'
2019-11-24 07:30:44 (29320): Status Report: CPU Time: '51617.320000'
2019-11-24 09:10:42 (29320): Status Report: Elapsed Time: '60007.954579'
2019-11-24 09:10:42 (29320): Status Report: CPU Time: '57349.920000'
2019-11-24 10:50:37 (29320): Status Report: Elapsed Time: '66008.825746'
2019-11-24 10:50:37 (29320): Status Report: CPU Time: '63149.470000'
2019-11-24 12:30:29 (29320): Status Report: Elapsed Time: '72009.494593'
2019-11-24 12:30:29 (29320): Status Report: CPU Time: '68899.570000'
2019-11-24 14:10:27 (29320): Status Report: Elapsed Time: '78009.708986'
2019-11-24 14:10:27 (29320): Status Report: CPU Time: '74605.640000'
2019-11-24 15:50:26 (29320): Status Report: Elapsed Time: '84011.110692'
2019-11-24 15:50:26 (29320): Status Report: CPU Time: '80253.640000'
2019-11-24 17:30:20 (29320): Status Report: Elapsed Time: '90012.575876'
2019-11-24 17:30:20 (29320): Status Report: CPU Time: '86041.990000'
2019-11-24 19:10:14 (29320): Status Report: Elapsed Time: '96013.218405'
2019-11-24 19:10:14 (29320): Status Report: CPU Time: '91444.780000'
2019-11-24 20:50:04 (29320): Status Report: Elapsed Time: '102013.498886'
2019-11-24 20:50:04 (29320): Status Report: CPU Time: '96998.080000'
2019-11-24 22:30:00 (29320): Status Report: Elapsed Time: '108013.533275'
2019-11-24 22:30:01 (29320): Status Report: CPU Time: '102776.200000'
2019-11-25 00:09:58 (29320): Status Report: Elapsed Time: '114013.561833'
2019-11-25 00:09:58 (29320): Status Report: CPU Time: '108540.050000'
2019-11-25 01:49:51 (29320): Status Report: Elapsed Time: '120014.473556'
2019-11-25 01:49:51 (29320): Status Report: CPU Time: '114327.220000'
2019-11-25 03:29:42 (29320): Status Report: Elapsed Time: '126014.698520'
2019-11-25 03:29:42 (29320): Status Report: CPU Time: '120073.000000'
2019-11-25 05:09:40 (29320): Status Report: Elapsed Time: '132014.853378'
2019-11-25 05:09:40 (29320): Status Report: CPU Time: '125800.820000'
2019-11-25 06:49:40 (29320): Status Report: Elapsed Time: '138016.302110'
2019-11-25 06:49:40 (29320): Status Report: CPU Time: '131517.270000'
It has been crunching for 39 hours now. CPU usage is 100%. |
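The "CPU usage is 100%" impression can be cross-checked against the log itself by dividing the last reported CPU time by the last reported elapsed time. The following is only a rough sketch in Python: the function name `cpu_efficiency` is made up for illustration, and the regular expression assumes the "Status Report" wording shown in the log above.

```python
import re

# Matches the vboxwrapper "Status Report" lines as they appear in the log above.
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def cpu_efficiency(log_text):
    """Return CPU time / elapsed time from the last Status Report values, or None."""
    elapsed = None
    cpu = None
    for kind, value in STATUS_RE.findall(log_text):
        if kind == "Elapsed Time":
            elapsed = float(value)
        else:
            cpu = float(value)
    # Without both values (or with a zero elapsed time) no ratio can be given.
    if not elapsed or cpu is None:
        return None
    return cpu / elapsed


# The last pair of Status Report lines from the log above.
sample = (
    "2019-11-25 06:49:40 (29320): Status Report: Elapsed Time: '138016.302110' "
    "2019-11-25 06:49:40 (29320): Status Report: CPU Time: '131517.270000'"
)
print(f"{cpu_efficiency(sample):.1%}")
```

A high ratio only says that the VM is burning CPU; it does not show whether an ATLAS job is actually processing events inside it.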
Joined: 15 Jun 08 Posts: 2530 Credit: 253,722,201 RAC: 51,175 |
The logfile looks fine. Since you have VirtualBox Guest Additions installed you can click on "Show VM Console". Then use ALT-F2 to switch to ATLAS Event Progress Monitoring. |
Joined: 14 Jan 10 Posts: 1417 Credit: 9,441,018 RAC: 1,047 |
"Is this task ok?" I don't think so. Normally, when CVMFS is OK, these lines should follow:
Guest Log: Mounting shared directory
Guest Log: Copying input files
Guest Log: Copied input files into RunAtlas. |
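The check described above (look for the guest-log lines that normally follow the CVMFS check) could be sketched roughly like this in Python. The function name `startup_status` and the returned labels are invented for illustration, and the message wording and order are taken only from the posts in this thread, so other ATLAS vbox versions may differ.

```python
# Guest-log messages quoted in this thread. Their exact wording and order in
# other ATLAS vbox versions is an assumption.
FAILURE_MARKER = "Failed to check CVMFS"
MILESTONES = [
    "Checking CVMFS...",
    "Mounting shared directory",
    "Copying input files",
    "Copied input files into RunAtlas",
]


def startup_status(log_text):
    """Classify how far the VM start-up got, based on the guest-log messages."""
    if FAILURE_MARKER in log_text:
        return "cvmfs check failed"
    reached = None
    position = 0
    # Walk through the milestones in order; stop at the first one that is missing.
    for marker in MILESTONES:
        found = log_text.find(marker, position)
        if found == -1:
            break
        reached = marker
        position = found + len(marker)
    if reached is None:
        return "not started"
    if reached == MILESTONES[-1]:
        return "input files copied"
    return f"stopped after: {reached}"


print(startup_status("Guest Log: Checking CVMFS..."))
```

Fed with the contents of a slot's stderr.txt, this only reports the last milestone seen; deciding whether to abort a task is left to the user.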
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
I clicked "Show VM Console". There were some sentences ending with something like "disabled.", I can't remember. Then Alt+F2 didn't show anything apart from an underscore. I tried to restart BOINC. Something crashed while resuming that VM. https://lhcathome.cern.ch/lhcathome/result.php?resultid=252684654 |
Joined: 18 Dec 15 Posts: 1811 Credit: 118,315,936 RAC: 27,492 |
https://lhcathome.cern.ch/lhcathome/result.php?resultid=252684654 There was something wrong with this task or with the VM processing, and that's why the VM Console didn't work. The excerpt from the stderr says it clearly: Hypervisor System Log: 68:52:25.239712 nspr-4 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={85cd948e-a71f-4289-281e-0ca7ad48cd89} aComponent={SessionMachine} aText={No storage device attached to device slot 1 on port 0 of controller 'Hard Disk Controller'}, preserve=false aResultDetail=0 |
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
"Obviously, this script does not work if you restart boinc client because it adds up all cvmfs fails regardless of whether some fails are from a previous start." Yeah, thank you. "Otherwise I can write a bash script that parses stderr.txt and automatically aborts the concerning task when three 'Probing /cvmfs/*... Failed!' are raised." Ok, it should work. E.g. if you started the BOINC client 2 times and there were 2 fails the first time and 1 fail the second time, the fail counter will be equal to 3 and the script suspends that task. Wrong! Now it checks whether the 3 fails are on consecutive lines, or at least I guess that is what it should do. Code: https://pastebin.com/r82vuzGM
Output example:
2 consecutive probing fails found in /home/luis/Applicazioni/boinc/slots/1/stderr.txt after line No. 77
2 consecutive probing fails found in /home/luis/Applicazioni/boinc/slots/1/stderr.txt after line No. 151
Total: 4 fails. |
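For readers who cannot open the pastebin link, here is a minimal Python sketch of the "consecutive fails" idea. It is not the bash script from the post: the function name `consecutive_fail_runs`, the `min_run` parameter and the regular expression are assumptions based only on the 'Probing /cvmfs/*... Failed!' wording quoted above.

```python
import re

# Assumed wording of a failed probe line, based on the message quoted above.
FAIL_RE = re.compile(r"Probing /cvmfs/\S+ Failed!")


def consecutive_fail_runs(lines, min_run=3):
    """Return (line_before_run, run_length) for each run of at least
    min_run probing failures that sit on consecutive lines."""
    runs = []
    run_start = None  # 1-based line number of the first failure in the current run
    run_len = 0
    for number, line in enumerate(lines, start=1):
        if FAIL_RE.search(line):
            if run_len == 0:
                run_start = number
            run_len += 1
        else:
            # A non-failing line ends the current run.
            if run_len >= min_run:
                runs.append((run_start - 1, run_len))
            run_len = 0
    # Handle a run that reaches the end of the file.
    if run_len >= min_run:
        runs.append((run_start - 1, run_len))
    return runs


example = [
    "Guest Log: Checking CVMFS...",
    "Guest Log: Probing /cvmfs/atlas.cern.ch... Failed!",
    "Guest Log: Probing /cvmfs/atlas-condb.cern.ch... Failed!",
    "vboxwrapper: starting",
    "Guest Log: Probing /cvmfs/atlas.cern.ch... Failed!",
]
for after_line, count in consecutive_fail_runs(example, min_run=2):
    print(f"{count} consecutive probing fails found after line No. {after_line}")
```

Because a run is reset by any line in between, failures from two separate client starts are not added together. The sketch only reports; suspending or aborting the task is deliberately left out.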
Joined: 7 Feb 14 Posts: 99 Credit: 5,180,005 RAC: 0 |
"https://lhcathome.cern.ch/lhcathome/result.php?resultid=252684654 there was something wrong with this task or with the VM processing." I really don't understand, whiskey-tango-foxtrot: it had crunched for 51 hours and it would have liked to continue. :( |
Joined: 25 May 14 Posts: 6 Credit: 3,633,724 RAC: 0 |
Hello, I have a problem with ATLAS tasks running on VirtualBox version 6.0.14 r133895 (Qt5.6.2) on my PC running 64-bit Windows 10. Every second ATLAS task stops after 'Checking CVMFS...'.
Normal task:
00:00:10.649235 VMMDev: Guest Log: vboxguest: misc device minor 58, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
00:00:10.835497 Display::i_handleDisplayResize: uScreenId=0 pvVRAM=000000000d470000 w=800 h=600 bpp=32 cbLine=0xC80 flags=0x1 origin=0,0
00:00:11.821696 NAT: IPv6 not supported
00:00:12.101758 NAT: DHCP offered IP address 10.0.2.15
00:00:12.102189 NAT: DHCP offered IP address 10.0.2.15
00:00:12.577408 VMMDev: Guest Log: Checking CVMFS...
00:00:16.535978 VMMDev: Guest Log: VBoxService 5.2.32 r132073 (verbosity: 0) linux.amd64 (Jul 12 2019 10:32:28) release log
00:00:16.536006 VMMDev: Guest Log: 00:00:00.000125 main Log opened 2019-12-04T20:06:44.657029000Z
00:00:16.536053 VMMDev: Guest Log: 00:00:00.000216 main OS Product: Linux
00:00:16.536081 VMMDev: Guest Log: 00:00:00.000246 main OS Release: 3.10.0-957.27.2.el7.x86_64
00:00:16.536103 VMMDev: Guest Log: 00:00:00.000269 main OS Version: #1 SMP Mon Jul 29 17:46:05 UTC 2019
00:00:16.536126 VMMDev: Guest Log: 00:00:00.000291 main Executable: /opt/VBoxGuestAdditions-5.2.32/sbin/VBoxService
00:00:16.536133 VMMDev: Guest Log: 00:00:00.000291 main Process ID: 1657
00:00:16.536138 VMMDev: Guest Log: 00:00:00.000292 main Package type: LINUX_64BITS_GENERIC
00:00:16.536769 VMMDev: Guest Log: 00:00:00.000933 main 5.2.32 r132073 started. Verbose level = 0
00:00:16.537354 Guest Control: GUEST_MSG_REPORT_FEATURES: 0x1, 0x8000000000000000
00:00:26.538370 VMMDev: Guest Log: 00:00:10.002511 timesync vgsvcTimeSyncWorker: Radical guest time change: -3 589 017 775 000ns (GuestNow=1 575 486 415 641 299 000 ns GuestLast=1 575 490 004 659 074 000 ns fSetTimeLastLoop=true )
00:00:40.329289 VMMDev: Guest Log: CVMFS is ok
00:00:40.484761 VMMDev: Guest Log: Mounting shared directory
'Bad' tasks stall (they are running, but they are not doing anything) before the line:
00:00:40.329289 VMMDev: Guest Log: CVMFS is ok
After I abort the 'bad' task, the next task runs normally and I get 'VMMDev: Guest Log: CVMFS is ok'. Then the next task stalls again and I have to abort it, and so on. Can anyone help me? |
Joined: 2 May 07 Posts: 2242 Credit: 173,899,709 RAC: 2,814 |
Hi Marjan, welcome. Is this host connected via Wi-Fi, or is there a problem with your ISP? ATLAS needs a reliable network connection to work well. Help can be found, for example, in Yeti's Checklist in the ATLAS folder. You can limit the host to only one ATLAS task at a time to see whether that works well. |
Joined: 25 May 14 Posts: 6 Credit: 3,633,724 RAC: 0 |
Hi maeax, Thank you for your reply. I will check my network settings later at home, because I'm at work now. I'm not sure when I will be able to run some tests, because I don't think I have many ATLAS tasks available. But I found a temporary 'emergency' solution yesterday: when I get 'CVMFS is OK', I suspend the current task and start another one. When I get 'CVMFS is OK' from the new task, I suspend it too, and so on. After 5 - 10 suspended tasks I resume all of them (at the same time), and all of them finish properly without any problem. Regards. |
Joined: 25 May 14 Posts: 6 Credit: 3,633,724 RAC: 0 |
Hi maeax, I checked my settings against Yeti's checklist. I found that boinc.exe wasn't allowed incoming communications (it had outgoing communications only). I modified the settings in my firewall and now I'm waiting to see the effect. . . . Thanks again for your advice. Regards. |
Joined: 25 May 14 Posts: 6 Credit: 3,633,724 RAC: 0 |
No improvement. I think that if the previous ATLAS task is still uploading, the VM for the next ATLAS task doesn't start properly: it hangs (there is no 'VMMDev: Guest Log: CVMFS is ok') and the ATLAS job does not start at all. Regards. |
Joined: 2 May 07 Posts: 2242 Credit: 173,899,709 RAC: 2,814 |
Hi Marjan, you have many tasks waiting to run. BOINC works out by itself when it is time for the next download. I don't know how you managed to get so many tasks waiting. You can check your preferences for each project (ATLAS, CMS, Theory...) and reduce the number of tasks. Your ATLAS tasks finished well today. |
Joined: 25 May 14 Posts: 6 Credit: 3,633,724 RAC: 0 |
Hi maeax, I set the computing preferences to store at least 5 days of work. How can I set preferences for each LHC project individually? All ATLAS jobs finished well because I set all of them to 'SUSPEND' and now 'RESUME' them one by one once the last finished ATLAS job has completed its UPLOAD. I know it isn't an ideal solution, but it is the best I have to my knowledge. ;-) There is also the other solution I described before, which allows me to be away from my computer for a long time. But the preparation takes one or more hours . . . Regards. |
Joined: 15 Jun 08 Posts: 2530 Credit: 253,722,201 RAC: 51,175 |
Can you post some details regarding your network? Is your host connected via wi-fi or cable (what speed)? What bandwidth do you get from your ISP? Upload? Download? Ping timing, e.g. to lhcathome.cern.ch? How many vbox tasks do you start/run concurrently? |
Joined: 7 Jan 07 Posts: 41 Credit: 16,102,983 RAC: 47 |
computezrmle wrote: "BOINC uses 2 main factors to calculate estimated runtimes (as well as credits):" I have 2 hosts for which it's true, but the 3rd has variable GFLOPS: Host #1 has 23.08 GFLOPS for every task; Host #2 has 55.01 GFLOPS for every task; Host #3 has 35.77, 12.06, 9.04, 6.03 and 3.01 GFLOPS. What could be the reason for the third host to have different figures? |
Joined: 14 Jan 10 Posts: 1417 Credit: 9,441,018 RAC: 1,047 |
"Host #3 has 35.77, 12.06, 9.04, 6.03 and 3.01 GFLOPS" You have probably changed the preference set (home, school, work) for that host a few times. The GFLOPS values you gave are for 1, 2, 3, 4 and 16 threads, and they come from your "# of CPUs" preference. Maybe you can correct that locally with an app_config.xml. |
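The per-thread explanation can be sanity-checked with simple arithmetic: if the smallest figure is the single-thread value, the other figures should be close to whole multiples of it. A small Python sketch follows, using only the numbers quoted above; treating 3.01 as the single-thread figure is an assumption.

```python
# GFLOPS figures reported for host #3 in the quoted post.
reported = [35.77, 12.06, 9.04, 6.03, 3.01]

# Assumption: the smallest figure corresponds to a single thread.
base = min(reported)

# Ratio of each figure to the assumed single-thread value, smallest first.
ratios = [round(gflops / base, 1) for gflops in sorted(reported)]

for gflops, ratio in zip(sorted(reported), ratios):
    print(f"{gflops:6.2f} GFLOPS is about {ratio} x {base}")
```

The four smaller values line up with 1 to 4 threads. The largest one comes out near 12 times the base rather than 16, so whatever scaling applies at the highest thread count is not explained by this simple multiplication.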
Joined: 7 Jan 07 Posts: 41 Credit: 16,102,983 RAC: 47 |
Preferences are the same and I have not changed settings for a while. However, I noticed that the BOINC client version is 7.6.33 on the host with multiple GFLOPS values and that the two other hosts are using version 7.9.3. BOINC 7.6.33 is the release for Debian Stretch 9.11. BOINC 7.9.3 is the release for Ubuntu Bionic Beaver 18.04.3 LTS. I will try the Stretch backport, which is at version 7.10.2, and see if the behaviour is still the same. |
©2024 CERN