Message boards :
ATLAS application :
Computing error
Author | Message |
---|---|
Joined: 14 May 15 Posts: 17 Credit: 11,627,311 RAC: 0 |
I've been getting this error on one of my hosts in many WUs lately, though not in all of them, possibly since the beginning of this week (after the outage?). Before that, everything seemed to run more or less OK. I'm crunching 6 WUs with 8 cores each, and the host has 96 GB of RAM. Could it be a lack of RAM? It was running 8 WUs fine before. On other hosts I also get this error, but in very few WUs. Any ideas?

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10564024
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10558422

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<stderr_txt>
2018-09-20 23:45:46 (54265): vboxwrapper (7.7.26196): starting
2018-09-20 23:45:47 (54265): Feature: Checkpoint interval offset (466 seconds)
2018-09-20 23:45:47 (54265): Detected: VirtualBox VboxManage Interface (Version: 5.2.18)
2018-09-20 23:45:47 (54265): Detected: Minimum checkpoint interval (900.000000 seconds)
2018-09-20 23:45:47 (54265): Successfully copied 'init_data.xml' to the shared directory.
2018-09-20 23:45:47 (54265): Create VM. (boinc_b70d56aa415278d2, slot#6)
2018-09-20 23:45:53 (54265): Setting Memory Size for VM. (10200MB)
2018-09-20 23:45:54 (54265): Setting CPU Count for VM. (8)
2018-09-20 23:45:54 (54265): Setting Chipset Options for VM.
2018-09-20 23:45:54 (54265): Setting Boot Options for VM.
2018-09-20 23:45:54 (54265): Setting Network Configuration for NAT.
2018-09-20 23:45:54 (54265): Enabling VM Network Access.
2018-09-20 23:45:54 (54265): Disabling USB Support for VM.
2018-09-20 23:45:54 (54265): Disabling COM Port Support for VM.
2018-09-20 23:45:55 (54265): Disabling LPT Port Support for VM.
2018-09-20 23:45:55 (54265): Disabling Audio Support for VM.
2018-09-20 23:45:55 (54265): Disabling Clipboard Support for VM.
2018-09-20 23:45:55 (54265): Disabling Drag and Drop Support for VM.
2018-09-20 23:45:55 (54265): Adding storage controller(s) to VM.
2018-09-20 23:45:55 (54265): Adding virtual disk drive to VM. (vm_image.vdi)
2018-09-20 23:45:55 (54265): Adding VirtualBox Guest Additions to VM.
2018-09-20 23:45:55 (54265): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2018-09-20 23:45:56 (54265): forwarding host port 57873 to guest port 80
2018-09-20 23:45:56 (54265): Enabling remote desktop for VM.
2018-09-20 23:45:56 (54265): Required extension pack not installed, remote desktop not enabled.
2018-09-20 23:45:56 (54265): Enabling shared directory for VM.
2018-09-20 23:45:56 (54265): Starting VM. (boinc_b70d56aa415278d2, slot#6)
2018-09-20 23:45:59 (54265): Successfully started VM. (PID = '56560')
2018-09-20 23:45:59 (54265): Reporting VM Process ID to BOINC.
2018-09-20 23:46:07 (54265): Guest Log: BIOS: VirtualBox 5.2.18
2018-09-20 23:46:07 (54265): Guest Log: CPUID EDX: 0x178bfbff
2018-09-20 23:46:07 (54265): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2018-09-20 23:46:07 (54265): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2018-09-20 23:46:07 (54265): Guest Log: BIOS: Booting from Hard Disk...
2018-09-20 23:46:07 (54265): Guest Log: BIOS: KBD: unsupported int 16h function 03
2018-09-20 23:46:07 (54265): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
2018-09-20 23:46:07 (54265): VM state change detected. (old = 'poweroff', new = 'running')
2018-09-20 23:46:07 (54265): Detected: Web Application Enabled (http://localhost:57873)
2018-09-20 23:46:07 (54265): Preference change detected
2018-09-20 23:46:07 (54265): Setting CPU throttle for VM. (100%)
2018-09-20 23:46:12 (54265): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 900 seconds))
2018-09-20 23:47:22 (54265): Guest Log: vboxguest: major 0, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2018-09-20 23:47:34 (54265): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff88028dcbac10), OR(0x0), NOT(0xffffffff), flags(0x0)
2018-09-20 23:47:34 (54265): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff880287253610), OR(0x0), NOT(0xffffffff), flags(0x0)
2018-09-20 23:47:34 (54265): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff88028dcba810), OR(0x0), NOT(0xffffffff), flags(0x0)
2018-09-20 23:47:34 (54265): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff880287276810), OR(0x0), NOT(0xffffffff), flags(0x0)
2018-09-20 23:48:15 (54265): Guest Log: Copying input files into RunAtlas.
2018-09-20 23:48:19 (54265): Guest Log: Copied input files into RunAtlas.
2018-09-20 23:48:28 (54265): Guest Log: copied the webapp to /var/www
2018-09-20 23:48:28 (54265): Guest Log: This vm does not need to setup http proxy
2018-09-20 23:48:28 (54265): Guest Log: ATHENA_PROC_NUMBER=8
2018-09-20 23:48:29 (54265): Guest Log: Starting ATLAS job. (PandaID=4064500070 taskID=15385155)
2018-09-20 23:59:05 (54265): Preference change detected
2018-09-20 23:59:05 (54265): Setting CPU throttle for VM. (100%)
2018-09-20 23:59:05 (54265): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 900 seconds))
2018-09-21 00:05:13 (54265): Preference change detected
2018-09-21 00:05:13 (54265): Setting CPU throttle for VM. (100%)
2018-09-21 00:05:13 (54265): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 900 seconds))
2018-09-21 00:27:04 (54265): Preference change detected
2018-09-21 00:27:04 (54265): Setting CPU throttle for VM. (100%)
2018-09-21 00:27:04 (54265): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 600 seconds) or (Vbox_job.xml: 900 seconds))
2018-09-21 01:25:18 (54265): Status Report: Elapsed Time: '6000.363421'
2018-09-21 01:25:18 (54265): Status Report: CPU Time: '41699.990000'
2018-09-21 02:36:24 (54265): VM is no longer is a running state. It is in 'poweroff'.
2018-09-21 02:36:24 (54265): VM state change detected. (old = 'running', new = 'poweroff')
2018-09-21 02:36:24 (54265): Powering off VM.
2018-09-21 02:36:24 (54265): Deregistering VM. (boinc_b70d56aa415278d2, slot#6)
2018-09-21 02:36:24 (54265): Removing network bandwidth throttle group from VM.
2018-09-21 02:36:24 (54265): Removing storage controller(s) from VM.
2018-09-21 02:36:24 (54265): Removing VM from VirtualBox.
2018-09-21 02:36:25 (54265): Removing virtual disk drive from VirtualBox.
2018-09-21 02:36:30 (54265): Virtual machine exited.
02:36:30 (54265): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>O47KDms37MtnlyackoJh5iwnABFKDmABFKDmTdiMDmABFKDmXHHw9m_2_r254113286_ATLAS_result</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
</message>
]]> |
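For anyone comparing several results, the VM settings can be pulled out of the stderr text rather than read by eye. This is only a rough sketch: it assumes the two "Setting ..." lines always have the wording seen in the log above, and the function name `vm_settings` is made up for illustration.

```python
import re

# Patterns copied from the wording of the vboxwrapper lines in the log above.
# Assumption: other vboxwrapper versions print these lines the same way.
MEMORY_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")
CPU_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")


def vm_settings(stderr_text: str) -> dict:
    """Return the VM memory size (MB) and CPU count found in a stderr log.

    A value is None when the corresponding line is not present.
    """
    mem = MEMORY_RE.search(stderr_text)
    cpu = CPU_RE.search(stderr_text)
    return {
        "memory_mb": int(mem.group(1)) if mem else None,
        "cpus": int(cpu.group(1)) if cpu else None,
    }


# Two lines taken verbatim from the log above.
sample = (
    "2018-09-20 23:45:53 (54265): Setting Memory Size for VM. (10200MB)\n"
    "2018-09-20 23:45:54 (54265): Setting CPU Count for VM. (8)\n"
)
print(vm_settings(sample))
```

Running it over the stderr of a good and a failed result would show whether the two differ in memory size or core count.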
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
You are running Theory tasks on one host and ATLAS tasks on the other.

According to David Cameron in https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4178&postid=29560#29560 the RAM formula for ATLAS VBox tasks is: 3 GB + 0.9 GB * ncores.

For 6 x 8-core tasks: 6 * (3 + 0.9 * 8) = 61.2 GB
For 8 x 8-core tasks: 8 * (3 + 0.9 * 8) = 81.6 GB

According to Crystal Pellet in https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4790&postid=36791#36791 the RAM formula for Theory tasks is: 630 MB + 100 MB * ncores.

For 6 x 8-core tasks: 6 * (630 + 100 * 8) = 8580 MB = 8.4 GB

So you have enough RAM. Most likely it's the recent outage. |
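The arithmetic above is easy to redo for other task counts. The sketch below only encodes the two formulas quoted in this post; the function names are made up for illustration, and the 96 GB figure is the host RAM mentioned in the first post.

```python
def atlas_vbox_ram_gb(ncores: int) -> float:
    """RAM for one ATLAS VBox task, per the quoted formula: 3 GB + 0.9 GB * ncores."""
    return 3.0 + 0.9 * ncores


def theory_ram_mb(ncores: int) -> int:
    """RAM for one Theory task, per the quoted formula: 630 MB + 100 MB * ncores."""
    return 630 + 100 * ncores


HOST_RAM_GB = 96  # host RAM stated in the first post

if __name__ == "__main__":
    for ntasks in (6, 8):
        total_gb = ntasks * atlas_vbox_ram_gb(8)
        fits = "fits" if total_gb <= HOST_RAM_GB else "does not fit"
        print(f"{ntasks} x 8-core ATLAS tasks: {total_gb:.1f} GB ({fits} in {HOST_RAM_GB} GB)")

    total_mb = 6 * theory_ram_mb(8)
    print(f"6 x 8-core Theory tasks: {total_mb} MB = {total_mb / 1024:.1f} GB")
```

This counts only what the formulas assign to the VMs; memory used by the host OS and anything else running is not included.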
Joined: 14 May 15 Posts: 17 Credit: 11,627,311 RAC: 0 |
Yes, that is the calculation I made, but the errors still occurred, apparently at random but consistently. Looking at the VMs in VirtualBox, I find two different base memory sizes: 5000 MB and 10200 MB. The latter are the ones failing. I reduced to 5 WUs per host and have seen no errors so far. I do not see any difference in the WU names that would allow identifying them, and I cannot rule out that it is something on my end. |
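The base memory of each registered VM can also be read from the command line instead of the VirtualBox GUI. The sketch below is untested against this host and rests on two assumptions: that `VBoxManage` is on the PATH for the account that runs BOINC, and that `VBoxManage showvminfo <name> --machinereadable` prints a `memory=<MB>` line. The helper names are made up for illustration.

```python
import subprocess


def parse_memory_mb(showvminfo_output: str):
    """Extract the base memory (MB) from machine-readable showvminfo output.

    Assumption: the output contains a line of the form memory=<MB>.
    """
    for line in showvminfo_output.splitlines():
        if line.startswith("memory="):
            return int(line.split("=", 1)[1])
    return None


def list_vm_names():
    """Return names of registered VMs; each output line looks like "name" {uuid}."""
    out = subprocess.run(
        ["VBoxManage", "list", "vms"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.split('"')[1] for line in out.splitlines() if line.startswith('"')]


def vm_memory_mb(name: str):
    """Query one VM and return its base memory in MB, or None if not reported."""
    out = subprocess.run(
        ["VBoxManage", "showvminfo", name, "--machinereadable"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_mb(out)


if __name__ == "__main__":
    try:
        for name in list_vm_names():
            print(name, vm_memory_mb(name), "MB")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("Could not query VirtualBox:", exc)
```

VMs are registered per user, so the script has to run under the same account as the BOINC client to see the `boinc_...` machines.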
©2024 CERN