Message boards :
Number crunching :
Some WU's tagged computation error but logs looks good
Message board moderation
Author | Message |
---|---|
Send message Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0 |
The log appears to show the task ended successfully but gets put into the error category. Can someone else spot what I'm missing about these? There are more than 3 but just linking 3 examples with full text of one: Task 164085532 Task 164109292 Task 164278255: <core_client_version>7.8.3</core_client_version> <![CDATA[ <message> finish file present too long</message> <stderr_txt> 2017-11-13 06:01:52 (14908): Detected: vboxwrapper 26197 2017-11-13 06:01:52 (14908): Detected: BOINC client v7.7 2017-11-13 06:02:08 (14908): Detected: VirtualBox VboxManage Interface (Version: 5.1.22) 2017-11-13 06:02:09 (14908): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds) 2017-11-13 06:02:09 (14908): Successfully copied 'init_data.xml' to the shared directory. 2017-11-13 06:02:22 (14908): Create VM. (boinc_bca0ac41590654b4, slot#24) 2017-11-13 06:02:29 (14908): Setting Memory Size for VM. (512MB) 2017-11-13 06:02:41 (14908): Setting CPU Count for VM. (1) 2017-11-13 06:02:58 (14908): Setting Chipset Options for VM. 2017-11-13 06:03:18 (14908): Setting Boot Options for VM. 2017-11-13 06:03:20 (14908): Setting Network Configuration for NAT. 2017-11-13 06:03:26 (14908): Enabling VM Network Access. 2017-11-13 06:03:26 (14908): Disabling USB Support for VM. 2017-11-13 06:03:26 (14908): Disabling COM Port Support for VM. 2017-11-13 06:03:27 (14908): Disabling LPT Port Support for VM. 2017-11-13 06:03:27 (14908): Disabling Audio Support for VM. 2017-11-13 06:03:27 (14908): Disabling Clipboard Support for VM. 2017-11-13 06:03:28 (14908): Disabling Drag and Drop Support for VM. 2017-11-13 06:03:28 (14908): Adding storage controller(s) to VM. 2017-11-13 06:03:28 (14908): Adding virtual disk drive to VM. (vm_image.vdi) 2017-11-13 06:03:29 (14908): Adding VirtualBox Guest Additions to VM. 2017-11-13 06:03:29 (14908): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB) 2017-11-13 06:03:29 (14908): forwarding host port 52773 to guest port 80 2017-11-13 06:03:29 (14908): Enabling remote desktop for VM. 2017-11-13 06:03:30 (14908): Enabling shared directory for VM. 2017-11-13 06:03:30 (14908): Starting VM using VBoxManage interface. (boinc_bca0ac41590654b4, slot#24) 2017-11-13 06:03:34 (14908): Successfully started VM. (PID = '6336') 2017-11-13 06:03:34 (14908): Reporting VM Process ID to BOINC. 2017-11-13 06:03:34 (14908): Guest Log: BIOS: VirtualBox 5.1.22 2017-11-13 06:03:34 (14908): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63 2017-11-13 06:03:34 (14908): VM state change detected. (old = 'PoweredOff', new = 'Running') 2017-11-13 06:03:34 (14908): Detected: Web Application Enabled (http://localhost:52773) 2017-11-13 06:03:34 (14908): Detected: Remote Desktop Enabled (localhost:52778) 2017-11-13 06:03:34 (14908): Preference change detected 2017-11-13 06:03:34 (14908): Setting CPU throttle for VM. (100%) 2017-11-13 06:03:34 (14908): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds)) 2017-11-13 06:03:36 (14908): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032 2017-11-13 06:03:36 (14908): Guest Log: BIOS: Booting from Hard Disk... 2017-11-13 06:03:38 (14908): Guest Log: BIOS: KBD: unsupported int 16h function 03 2017-11-13 06:03:38 (14908): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 2017-11-13 06:04:13 (14908): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000) 2017-11-13 06:04:33 (14908): Guest Log: VBoxService 4.3.28 r100309 (verbosity: 0) linux.amd64 (May 13 2015 17:11:31) release log 2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000137 main Log opened 2017-11-13T12:04:33.097548000Z 2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000444 main OS Product: Linux 2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000551 main OS Release: 4.1.39-28.cernvm.x86_64 2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000613 main OS Version: #1 SMP Tue Mar 14 08:13:19 CET 2017 2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000670 main OS Service Pack: #1 SMP Tue Mar 14 08:13:19 CET 2017 2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000727 main Executable: /usr/sbin/VBoxService 2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000729 main Process ID: 2719 2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000730 main Package type: LINUX_64BITS_GENERIC 2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.009969 main 4.3.28 r100309 started. Verbose level = 0 2017-11-13 06:05:44 (14908): Guest Log: [INFO] Mounting the shared directory 2017-11-13 06:05:44 (14908): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor 2017-11-13 06:05:45 (14908): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80 2017-11-13 06:05:45 (14908): Guest Log: [DEBUG] Connection to cern.ch 80 port [tcp/http] succeeded! 2017-11-13 06:05:45 (14908): Guest Log: [DEBUG] 0 2017-11-13 06:05:45 (14908): Guest Log: [DEBUG] Testing CVMFS connection to lhchomeproxy.cern.ch on port 3125 2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] Connection to lhchomeproxy.cern.ch 3125 port [tcp/a13-an] succeeded! 2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] 0 2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] Testing VCCS connection to vccs.cern.ch on port 443 2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] Connection to vccs.cern.ch 443 port [tcp/https] succeeded! 2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] 0 2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] Testing connection to Condor server on port 9618 2017-11-13 06:05:47 (14908): Guest Log: [DEBUG] Connection to vccondor01.cern.ch 9618 port [tcp/condor] succeeded! 2017-11-13 06:05:47 (14908): Guest Log: [DEBUG] 0 2017-11-13 06:05:47 (14908): Guest Log: [DEBUG] Probing CVMFS ... 2017-11-13 06:05:48 (14908): Guest Log: Probing /cvmfs/grid.cern.ch... OK 2017-11-13 06:06:24 (14908): Guest Log: Probing /cvmfs/sft.cern.ch... OK 2017-11-13 06:06:25 (14908): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE 2017-11-13 06:06:25 (14908): Guest Log: 2.2.0.0 3423 1 19600 5072 14 1 481687 10240001 2 65024 0 20 95 20792 0 http://cvmfs.fnal.gov/cvmfs/grid.cern.ch http://131.225.205.134:3125 1 2017-11-13 06:06:34 (14908): Guest Log: [INFO] Reading volunteer information 2017-11-13 06:06:34 (14908): Guest Log: [INFO] Volunteer: marmot (373413) Host: 10506240 2017-11-13 06:06:34 (14908): Guest Log: [INFO] VMID: af6a5502-bdf8-44f3-94c4-46386c9eddf5 2017-11-13 06:06:34 (14908): Guest Log: [INFO] Requesting an X509 credential from LHC@home 2017-11-13 06:06:38 (14908): Guest Log: [INFO] Running the fast benchmark. 2017-11-13 06:09:05 (14908): Guest Log: [INFO] Machine performance 4.08 HEPSEC06 2017-11-13 06:09:05 (14908): Guest Log: [INFO] Theory application starting. Check log files. 2017-11-13 06:09:05 (14908): Guest Log: [DEBUG] HTCondor ping 2017-11-13 06:09:05 (14908): Guest Log: [DEBUG] 0 2017-11-13 06:10:27 (14908): Guest Log: [INFO] New Job Starting in slot1 2017-11-13 06:10:27 (14908): Guest Log: [INFO] Condor JobID: 1415.80 in slot1 2017-11-13 06:10:32 (14908): Guest Log: [INFO] MCPlots JobID: 39453928 in slot1 2017-11-13 06:57:14 (14908): Preference change detected 2017-11-13 06:57:14 (14908): Setting CPU throttle for VM. (100%) 2017-11-13 06:57:48 (14908): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds)) 2017-11-13 07:33:02 (14908): Preference change detected 2017-11-13 07:33:02 (14908): Setting CPU throttle for VM. (100%) 2017-11-13 07:33:34 (14908): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds)) 2017-11-13 07:34:58 (14908): Preference change detected 2017-11-13 07:34:58 (14908): Setting CPU throttle for VM. (100%) 2017-11-13 07:37:48 (14908): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds)) 2017-11-13 08:19:44 (14908): Status Report: Job Duration: '64800.000000' 2017-11-13 08:19:44 (14908): Status Report: Elapsed Time: '6013.543446' 2017-11-13 08:19:44 (14908): Status Report: CPU Time: '7182.894444' 2017-11-13 08:45:25 (14908): Guest Log: [INFO] Job finished in slot1 with 0. 2017-11-13 08:47:18 (14908): Guest Log: [INFO] New Job Starting in slot1 2017-11-13 08:47:18 (14908): Guest Log: [INFO] Condor JobID: 1440.79 in slot1 2017-11-13 08:47:24 (14908): Guest Log: [INFO] MCPlots JobID: 39457362 in slot1 2017-11-13 10:46:37 (14908): Status Report: Job Duration: '64800.000000' 2017-11-13 10:46:37 (14908): Status Report: Elapsed Time: '12013.869999' 2017-11-13 10:46:37 (14908): Status Report: CPU Time: '15124.296950' 2017-11-13 11:05:12 (14908): Guest Log: [INFO] Job finished in slot1 with 0. 2017-11-13 11:05:36 (14908): Guest Log: [INFO] New Job Starting in slot1 2017-11-13 11:05:36 (14908): Guest Log: [INFO] Condor JobID: 1453.150 in slot1 2017-11-13 11:05:36 (14908): Guest Log: [INFO] MCPlots JobID: 39459068 in slot1 2017-11-13 11:51:12 (14908): Guest Log: [INFO] Job finished in slot1 with 0. 2017-11-13 11:51:24 (14908): Guest Log: [INFO] New Job Starting in slot1 2017-11-13 11:51:24 (14908): Guest Log: [INFO] Condor JobID: 1470.85 in slot1 2017-11-13 11:51:29 (14908): Guest Log: [INFO] MCPlots JobID: 39461524 in slot1 2017-11-13 13:01:15 (14908): Status Report: Job Duration: '64800.000000' 2017-11-13 13:01:15 (14908): Status Report: Elapsed Time: '18013.902668' 2017-11-13 13:01:15 (14908): Status Report: CPU Time: '22535.639258' 2017-11-13 15:18:15 (14908): Status Report: Job Duration: '64800.000000' 2017-11-13 15:18:15 (14908): Status Report: Elapsed Time: '24014.540192' 2017-11-13 15:18:15 (14908): Status Report: CPU Time: '30256.066348' 2017-11-13 15:48:49 (14908): Guest Log: [INFO] Job finished in slot1 with 0. 2017-11-13 15:48:51 (14908): Guest Log: [INFO] New Job Starting in slot1 2017-11-13 15:48:51 (14908): Guest Log: [INFO] Condor JobID: 1503.1125 in slot1 2017-11-13 15:48:56 (14908): Guest Log: [INFO] MCPlots JobID: 39466625 in slot1 2017-11-13 18:03:54 (14908): Status Report: Job Duration: '64800.000000' 2017-11-13 18:03:54 (14908): Status Report: Elapsed Time: '30014.842810' 2017-11-13 18:03:54 (14908): Status Report: CPU Time: '38337.120949' 2017-11-13 18:14:54 (14908): Guest Log: [INFO] Job finished in slot1 with 0. 2017-11-13 18:25:23 (14908): Guest Log: [INFO] Condor exited with return value N/A. 2017-11-13 18:25:23 (14908): Guest Log: [INFO] Shutting Down. 2017-11-13 18:25:23 (14908): VM Completion File Detected. 2017-11-13 18:25:23 (14908): VM Completion Message: Condor exited with return value N/A. . 2017-11-13 18:25:23 (14908): Powering off VM. 2017-11-13 18:30:46 (14908): VM did not power off when requested. 2017-11-13 18:30:46 (14908): VM was successfully terminated. 2017-11-13 18:30:46 (14908): Deregistering VM. (boinc_bca0ac41590654b4, slot#24) 2017-11-13 18:30:46 (14908): Removing network bandwidth throttle group from VM. 2017-11-13 18:30:47 (14908): Removing storage controller(s) from VM. 2017-11-13 18:30:47 (14908): Removing VM from VirtualBox. 2017-11-13 18:30:47 (14908): Removing virtual disk drive from VirtualBox. 18:30:53 (14908): called boinc_finish(0) </stderr_txt> ]]> |
Send message Joined: 27 Sep 08 Posts: 803 Credit: 649,965,664 RAC: 240,966 |
I would say since condor exited with N/A not 0 then this is considdered and error. |
Send message Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0 |
Still getting 2 or 3 of these a day (out of about 60-90 run). They are significant because each is a loss of 40-60,000 CPU cycles each and all the work within the Theory Sims was completed and reported. What does Condor exiting with result N/A mean? |
Send message Joined: 29 Aug 05 Posts: 1005 Credit: 6,269,607 RAC: 387 |
Still getting 2 or 3 of these a day (out of about 60-90 run). I think that means that the script isn't forwarding the exit code, but I may be wrong. |
©2024 CERN