Message boards : Number crunching : Some WU's tagged computation error but logs looks good
Message board moderation

To post messages, you must log in.

AuthorMessage
marmot
Avatar

Send message
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 33060 - Posted: 14 Nov 2017, 1:54:26 UTC

The log appears to show the task ended successfully but gets put into the error category. Can someone else spot what I'm missing about these?

There are more than 3 but just linking 3 examples with full text of one:

Task 164085532

Task 164109292
Task 164278255:
<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
finish file present too long</message>
<stderr_txt>
2017-11-13 06:01:52 (14908): Detected: vboxwrapper 26197
2017-11-13 06:01:52 (14908): Detected: BOINC client v7.7
2017-11-13 06:02:08 (14908): Detected: VirtualBox VboxManage Interface (Version: 5.1.22)
2017-11-13 06:02:09 (14908): Detected: Heartbeat check (file: 'heartbeat' every 1200.000000 seconds)
2017-11-13 06:02:09 (14908): Successfully copied 'init_data.xml' to the shared directory.
2017-11-13 06:02:22 (14908): Create VM. (boinc_bca0ac41590654b4, slot#24)
2017-11-13 06:02:29 (14908): Setting Memory Size for VM. (512MB)
2017-11-13 06:02:41 (14908): Setting CPU Count for VM. (1)
2017-11-13 06:02:58 (14908): Setting Chipset Options for VM.
2017-11-13 06:03:18 (14908): Setting Boot Options for VM.
2017-11-13 06:03:20 (14908): Setting Network Configuration for NAT.
2017-11-13 06:03:26 (14908): Enabling VM Network Access.
2017-11-13 06:03:26 (14908): Disabling USB Support for VM.
2017-11-13 06:03:26 (14908): Disabling COM Port Support for VM.
2017-11-13 06:03:27 (14908): Disabling LPT Port Support for VM.
2017-11-13 06:03:27 (14908): Disabling Audio Support for VM.
2017-11-13 06:03:27 (14908): Disabling Clipboard Support for VM.
2017-11-13 06:03:28 (14908): Disabling Drag and Drop Support for VM.
2017-11-13 06:03:28 (14908): Adding storage controller(s) to VM.
2017-11-13 06:03:28 (14908): Adding virtual disk drive to VM. (vm_image.vdi)
2017-11-13 06:03:29 (14908): Adding VirtualBox Guest Additions to VM.
2017-11-13 06:03:29 (14908): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2017-11-13 06:03:29 (14908): forwarding host port 52773 to guest port 80
2017-11-13 06:03:29 (14908): Enabling remote desktop for VM.
2017-11-13 06:03:30 (14908): Enabling shared directory for VM.
2017-11-13 06:03:30 (14908): Starting VM using VBoxManage interface. (boinc_bca0ac41590654b4, slot#24)
2017-11-13 06:03:34 (14908): Successfully started VM. (PID = '6336')
2017-11-13 06:03:34 (14908): Reporting VM Process ID to BOINC.
2017-11-13 06:03:34 (14908): Guest Log: BIOS: VirtualBox 5.1.22

2017-11-13 06:03:34 (14908): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63

2017-11-13 06:03:34 (14908): VM state change detected. (old = 'PoweredOff', new = 'Running')
2017-11-13 06:03:34 (14908): Detected: Web Application Enabled (http://localhost:52773)
2017-11-13 06:03:34 (14908): Detected: Remote Desktop Enabled (localhost:52778)
2017-11-13 06:03:34 (14908): Preference change detected
2017-11-13 06:03:34 (14908): Setting CPU throttle for VM. (100%)
2017-11-13 06:03:34 (14908): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2017-11-13 06:03:36 (14908): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032

2017-11-13 06:03:36 (14908): Guest Log: BIOS: Booting from Hard Disk...

2017-11-13 06:03:38 (14908): Guest Log: BIOS: KBD: unsupported int 16h function 03

2017-11-13 06:03:38 (14908): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 

2017-11-13 06:04:13 (14908): Guest Log: vboxguest: misc device minor 56, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)

2017-11-13 06:04:33 (14908): Guest Log: VBoxService 4.3.28 r100309 (verbosity: 0) linux.amd64 (May 13 2015 17:11:31) release log

2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000137 main     Log opened 2017-11-13T12:04:33.097548000Z

2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000444 main     OS Product: Linux

2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000551 main     OS Release: 4.1.39-28.cernvm.x86_64

2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000613 main     OS Version: #1 SMP Tue Mar 14 08:13:19 CET 2017

2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000670 main     OS Service Pack: #1 SMP Tue Mar 14 08:13:19 CET 2017

2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000727 main     Executable: /usr/sbin/VBoxService

2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000729 main     Process ID: 2719

2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.000730 main     Package type: LINUX_64BITS_GENERIC

2017-11-13 06:04:33 (14908): Guest Log: 00:00:00.009969 main     4.3.28 r100309 started. Verbose level = 0

2017-11-13 06:05:44 (14908): Guest Log: [INFO] Mounting the shared directory

2017-11-13 06:05:44 (14908): Guest Log: [INFO] Shared directory mounted, enabling vboxmonitor

2017-11-13 06:05:45 (14908): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80

2017-11-13 06:05:45 (14908): Guest Log: [DEBUG] Connection to cern.ch 80 port [tcp/http] succeeded!

2017-11-13 06:05:45 (14908): Guest Log: [DEBUG] 0

2017-11-13 06:05:45 (14908): Guest Log: [DEBUG] Testing CVMFS connection to lhchomeproxy.cern.ch on port 3125

2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] Connection to lhchomeproxy.cern.ch 3125 port [tcp/a13-an] succeeded!

2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] 0

2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] Testing VCCS connection to vccs.cern.ch on port 443

2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] Connection to vccs.cern.ch 443 port [tcp/https] succeeded!

2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] 0

2017-11-13 06:05:46 (14908): Guest Log: [DEBUG] Testing connection to Condor server on port 9618

2017-11-13 06:05:47 (14908): Guest Log: [DEBUG] Connection to vccondor01.cern.ch 9618 port [tcp/condor] succeeded!

2017-11-13 06:05:47 (14908): Guest Log: [DEBUG] 0

2017-11-13 06:05:47 (14908): Guest Log: [DEBUG] Probing CVMFS ...

2017-11-13 06:05:48 (14908): Guest Log: Probing /cvmfs/grid.cern.ch... OK

2017-11-13 06:06:24 (14908): Guest Log: Probing /cvmfs/sft.cern.ch... OK

2017-11-13 06:06:25 (14908): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2017-11-13 06:06:25 (14908): Guest Log: 2.2.0.0 3423 1 19600 5072 14 1 481687 10240001 2 65024 0 20 95 20792 0 http://cvmfs.fnal.gov/cvmfs/grid.cern.ch http://131.225.205.134:3125 1

2017-11-13 06:06:34 (14908): Guest Log: [INFO] Reading volunteer information

2017-11-13 06:06:34 (14908): Guest Log: [INFO] Volunteer: marmot (373413) Host: 10506240

2017-11-13 06:06:34 (14908): Guest Log: [INFO] VMID: af6a5502-bdf8-44f3-94c4-46386c9eddf5

2017-11-13 06:06:34 (14908): Guest Log: [INFO] Requesting an X509 credential from LHC@home

2017-11-13 06:06:38 (14908): Guest Log: [INFO] Running the fast benchmark.

2017-11-13 06:09:05 (14908): Guest Log: [INFO] Machine performance 4.08 HEPSEC06

2017-11-13 06:09:05 (14908): Guest Log: [INFO] Theory application starting. Check log files.

2017-11-13 06:09:05 (14908): Guest Log: [DEBUG] HTCondor ping

2017-11-13 06:09:05 (14908): Guest Log: [DEBUG] 0

2017-11-13 06:10:27 (14908): Guest Log: [INFO] New Job Starting in slot1

2017-11-13 06:10:27 (14908): Guest Log: [INFO] Condor JobID:  1415.80 in slot1

2017-11-13 06:10:32 (14908): Guest Log: [INFO] MCPlots JobID: 39453928 in slot1

2017-11-13 06:57:14 (14908): Preference change detected
2017-11-13 06:57:14 (14908): Setting CPU throttle for VM. (100%)
2017-11-13 06:57:48 (14908): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2017-11-13 07:33:02 (14908): Preference change detected
2017-11-13 07:33:02 (14908): Setting CPU throttle for VM. (100%)
2017-11-13 07:33:34 (14908): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2017-11-13 07:34:58 (14908): Preference change detected
2017-11-13 07:34:58 (14908): Setting CPU throttle for VM. (100%)
2017-11-13 07:37:48 (14908): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2017-11-13 08:19:44 (14908): Status Report: Job Duration: '64800.000000'
2017-11-13 08:19:44 (14908): Status Report: Elapsed Time: '6013.543446'
2017-11-13 08:19:44 (14908): Status Report: CPU Time: '7182.894444'
2017-11-13 08:45:25 (14908): Guest Log: [INFO] Job finished in slot1 with 0.

2017-11-13 08:47:18 (14908): Guest Log: [INFO] New Job Starting in slot1

2017-11-13 08:47:18 (14908): Guest Log: [INFO] Condor JobID:  1440.79 in slot1

2017-11-13 08:47:24 (14908): Guest Log: [INFO] MCPlots JobID: 39457362 in slot1

2017-11-13 10:46:37 (14908): Status Report: Job Duration: '64800.000000'
2017-11-13 10:46:37 (14908): Status Report: Elapsed Time: '12013.869999'
2017-11-13 10:46:37 (14908): Status Report: CPU Time: '15124.296950'
2017-11-13 11:05:12 (14908): Guest Log: [INFO] Job finished in slot1 with 0.

2017-11-13 11:05:36 (14908): Guest Log: [INFO] New Job Starting in slot1

2017-11-13 11:05:36 (14908): Guest Log: [INFO] Condor JobID:  1453.150 in slot1

2017-11-13 11:05:36 (14908): Guest Log: [INFO] MCPlots JobID: 39459068 in slot1

2017-11-13 11:51:12 (14908): Guest Log: [INFO] Job finished in slot1 with 0.

2017-11-13 11:51:24 (14908): Guest Log: [INFO] New Job Starting in slot1

2017-11-13 11:51:24 (14908): Guest Log: [INFO] Condor JobID:  1470.85 in slot1

2017-11-13 11:51:29 (14908): Guest Log: [INFO] MCPlots JobID: 39461524 in slot1

2017-11-13 13:01:15 (14908): Status Report: Job Duration: '64800.000000'
2017-11-13 13:01:15 (14908): Status Report: Elapsed Time: '18013.902668'
2017-11-13 13:01:15 (14908): Status Report: CPU Time: '22535.639258'
2017-11-13 15:18:15 (14908): Status Report: Job Duration: '64800.000000'
2017-11-13 15:18:15 (14908): Status Report: Elapsed Time: '24014.540192'
2017-11-13 15:18:15 (14908): Status Report: CPU Time: '30256.066348'
2017-11-13 15:48:49 (14908): Guest Log: [INFO] Job finished in slot1 with 0.

2017-11-13 15:48:51 (14908): Guest Log: [INFO] New Job Starting in slot1

2017-11-13 15:48:51 (14908): Guest Log: [INFO] Condor JobID:  1503.1125 in slot1

2017-11-13 15:48:56 (14908): Guest Log: [INFO] MCPlots JobID: 39466625 in slot1

2017-11-13 18:03:54 (14908): Status Report: Job Duration: '64800.000000'
2017-11-13 18:03:54 (14908): Status Report: Elapsed Time: '30014.842810'
2017-11-13 18:03:54 (14908): Status Report: CPU Time: '38337.120949'
2017-11-13 18:14:54 (14908): Guest Log: [INFO] Job finished in slot1 with 0.

2017-11-13 18:25:23 (14908): Guest Log: [INFO] Condor exited with return value N/A.

2017-11-13 18:25:23 (14908): Guest Log: [INFO] Shutting Down.

2017-11-13 18:25:23 (14908): VM Completion File Detected.
2017-11-13 18:25:23 (14908): VM Completion Message: Condor exited with return value N/A.
.
2017-11-13 18:25:23 (14908): Powering off VM.
2017-11-13 18:30:46 (14908): VM did not power off when requested.
2017-11-13 18:30:46 (14908): VM was successfully terminated.
2017-11-13 18:30:46 (14908): Deregistering VM. (boinc_bca0ac41590654b4, slot#24)
2017-11-13 18:30:46 (14908): Removing network bandwidth throttle group from VM.
2017-11-13 18:30:47 (14908): Removing storage controller(s) from VM.
2017-11-13 18:30:47 (14908): Removing VM from VirtualBox.
2017-11-13 18:30:47 (14908): Removing virtual disk drive from VirtualBox.
18:30:53 (14908): called boinc_finish(0)

</stderr_txt>
]]>
ID: 33060 · Report as offensive     Reply Quote
Toby Broom
Volunteer moderator

Send message
Joined: 27 Sep 08
Posts: 803
Credit: 649,965,664
RAC: 240,966
Message 33063 - Posted: 14 Nov 2017, 7:55:03 UTC

I would say since condor exited with N/A not 0 then this is considdered and error.
ID: 33063 · Report as offensive     Reply Quote
marmot
Avatar

Send message
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 33167 - Posted: 30 Nov 2017, 2:52:41 UTC - in response to Message 33063.  

Still getting 2 or 3 of these a day (out of about 60-90 run).
They are significant because each is a loss of 40-60,000 CPU cycles each and all the work within the Theory Sims was completed and reported.

What does Condor exiting with result N/A mean?
ID: 33167 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Avatar

Send message
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,607
RAC: 387
Message 33171 - Posted: 30 Nov 2017, 8:04:18 UTC - in response to Message 33167.  

Still getting 2 or 3 of these a day (out of about 60-90 run).
They are significant because each is a loss of 40-60,000 CPU cycles each and all the work within the Theory Sims was completed and reported.

What does Condor exiting with result N/A mean?

I think that means that the script isn't forwarding the exit code, but I may be wrong.
ID: 33171 · Report as offensive     Reply Quote

Message boards : Number crunching : Some WU's tagged computation error but logs looks good


©2024 CERN