Message boards : LHCb Application : Job finished in slot1 with unknown exit code.
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 406
Credit: 96,116,916
RAC: 115
Message 28002 - Posted: 29 Nov 2016, 22:37:39 UTC

48 minutes of runtime, but still no CPU usage?

From StdOut:
[quote]
Output of the job wrapper may appear here.
22:56:43 +0100 2016-11-29 [INFO] New Job Starting in slot1
22:56:43 +0100 2016-11-29 [INFO] Condor JobID: 15692 in slot1
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/testPilotCommands.py
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/sudoComputingElement.py
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/power.sh
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/ParseJobAgentLog
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/save-payload-logs
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/vm-pilot
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/gmond.conf
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/vm-bootstrap
Checked out revision 88383.
Getting DIRAC Pilot 2.0 code from cvmfs (LHCBDIRAC=v8r4p5; DIRAC=v6r15p18)
22:56:49 +0100 2016-11-29 [INFO] Starting pilot in slot1
23:15:54 +0100 2016-11-29 [INFO] Job finished in slot1 with unknown exit code.
23:16:51 +0100 2016-11-29 [INFO] New Job Starting in slot1
23:16:51 +0100 2016-11-29 [INFO] Condor JobID: 16049 in slot1
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/testPilotCommands.py
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/sudoComputingElement.py
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/power.sh
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/ParseJobAgentLog
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/save-payload-logs
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/vm-pilot
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/gmond.conf
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/vm-bootstrap
Checked out revision 88383.
Getting DIRAC Pilot 2.0 code from cvmfs (LHCBDIRAC=v8r4p5; DIRAC=v6r15p18)
23:16:56 +0100 2016-11-29 [INFO] Starting pilot in slot1
[/quote]

Is this normal?

And from StartLog:
[quote]
11/29/16 22:51:43 ******************************************************
11/29/16 22:51:43 ** condor_startd (CONDOR_STARTD) STARTING UP
11/29/16 22:51:43 ** /usr/sbin/condor_startd
11/29/16 22:51:43 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
11/29/16 22:51:43 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
11/29/16 22:51:43 ** $CondorVersion: 8.4.8 Jun 30 2016 BuildID: 373513 $
11/29/16 22:51:43 ** $CondorPlatform: x86_64_RedHat6 $
11/29/16 22:51:43 ** PID = 4091
11/29/16 22:51:43 ** Log last touched time unavailable (No such file or directory)
11/29/16 22:51:43 ******************************************************
11/29/16 22:51:43 Using config source: /etc/condor/condor_config
11/29/16 22:51:43 Using local config sources:
11/29/16 22:51:43 /etc/condor/config.d/10_security.config
11/29/16 22:51:43 /etc/condor/config.d/14_network.config
11/29/16 22:51:43 /etc/condor/config.d/20_workernode.config
11/29/16 22:51:43 /etc/condor/config.d/30_lease.config
11/29/16 22:51:43 /etc/condor/config.d/35_lhcb.config
11/29/16 22:51:43 /etc/condor/config.d/40_ccb.config
11/29/16 22:51:43 /etc/condor/condor_config.local
11/29/16 22:51:43 config Macros = 147, Sorted = 147, StringBytes = 5238, TablesBytes = 5388
11/29/16 22:51:43 CLASSAD_CACHING is ENABLED
11/29/16 22:51:43 Daemon Log is logging: D_ALWAYS D_ERROR
11/29/16 22:51:43 Daemoncore: Listening at <10.0.2.15:59875> on TCP (ReliSock).
11/29/16 22:51:43 DaemonCore: command socket at <10.0.2.15:59875?addrs=10.0.2.15-59875&noUDP>
11/29/16 22:51:43 DaemonCore: private command socket at <10.0.2.15:59875?addrs=10.0.2.15-59875>
11/29/16 22:51:46 CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#632368
11/29/16 22:51:46 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state
11/29/16 22:51:46 VM-gahp server reported an internal error
11/29/16 22:51:46 VM universe will be tested to check if it is available
11/29/16 22:51:46 History file rotation is enabled.
11/29/16 22:51:46 Maximum history file size is: 20971520 bytes
11/29/16 22:51:46 Number of rotated history files is: 2
11/29/16 22:51:46 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 3000, Swap: 100.00%, Disk: 100.00%
11/29/16 22:51:46 New machine resource allocated
11/29/16 22:51:46 Setting up slot pairings
11/29/16 22:51:46 CronJobList: Adding job 'mips'
11/29/16 22:51:46 CronJobList: Adding job 'kflops'
11/29/16 22:51:46 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
11/29/16 22:51:46 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
11/29/16 22:51:46 State change: IS_OWNER is false
11/29/16 22:51:46 Changing state: Owner -> Unclaimed
11/29/16 22:51:46 State change: RunBenchmarks is TRUE
11/29/16 22:51:46 Changing activity: Idle -> Benchmarking
11/29/16 22:51:46 BenchMgr:StartBenchmarks()
11/29/16 22:52:12 State change: benchmarks completed
11/29/16 22:52:12 Changing activity: Benchmarking -> Idle
11/29/16 22:56:28 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 188.184.94.254,vccondorce02.cern.ch, hostname size = 1, original ip address = 188.184.94.254
11/29/16 22:56:28 Request accepted.
11/29/16 22:56:28 Remote owner is lhcbpilot@cern.ch
11/29/16 22:56:28 State change: claiming protocol successful
11/29/16 22:56:28 Changing state: Unclaimed -> Claimed
11/29/16 22:56:41 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 444 (ACTIVATE_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 22:56:41 Got activate_claim request from shadow (188.184.94.254)
11/29/16 22:56:41 Remote job ID is 15692.0
11/29/16 22:56:41 Got universe "VANILLA" (5) from request classad
11/29/16 22:56:41 State change: claim-activation protocol successful
11/29/16 22:56:41 Changing activity: Idle -> Busy
11/29/16 23:15:54 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 404 (DEACTIVATE_CLAIM_FORCIBLY), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 23:15:54 Called deactivate_claim_forcibly()

11/29/16 23:15:54 Starter pid 4131 exited with status 0
11/29/16 23:15:54 State change: starter exited
11/29/16 23:15:54 Changing activity: Busy -> Idle
11/29/16 23:15:55 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 443 (RELEASE_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 23:15:55 State change: received RELEASE_CLAIM command
11/29/16 23:15:55 Changing state and activity: Claimed/Idle -> Preempting/Vacating
11/29/16 23:15:55 State change: No preempting claim, returning to owner
11/29/16 23:15:55 Changing state and activity: Preempting/Vacating -> Owner/Idle
11/29/16 23:15:55 State change: IS_OWNER is false
11/29/16 23:15:55 Changing state: Owner -> Unclaimed
11/29/16 23:16:40 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 23:16:40 Request accepted.
11/29/16 23:16:40 Remote owner is lhcbpilot@cern.ch
11/29/16 23:16:40 State change: claiming protocol successful
11/29/16 23:16:40 Changing state: Unclaimed -> Claimed
11/29/16 23:16:49 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 444 (ACTIVATE_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 23:16:49 Got activate_claim request from shadow (188.184.94.254)
11/29/16 23:16:49 Remote job ID is 16049.0
11/29/16 23:16:49 Got universe "VANILLA" (5) from request classad
11/29/16 23:16:49 State change: claim-activation protocol successful
11/29/16 23:16:49 Changing activity: Idle -> Busy
[/quote]



Supporting BOINC, a great concept !
Laurence
Project administrator
Project developer

Joined: 20 Jun 14
Posts: 335
Credit: 237,918
RAC: 0
Message 28009 - Posted: 30 Nov 2016, 9:36:52 UTC - in response to Message 28002.  

[quote]48 minutes runtime, but still no CPU-Usage ?[/quote]


There are a few issues with the LHCb application at the moment that we are trying to iron out, which is why it is still in beta. There have been a few periods where the job stream dried up, and the automatic shutdown of the VM is not yet fully operational.
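To check how long a pilot actually ran before the wrapper reported the job as finished, the timestamps in StdOut can be paired up. This is only a rough sketch, assuming the line format shown in the StdOut excerpt of this thread; the function name `pilot_durations` is hypothetical and not part of the LHCb tooling.

```python
import re
from datetime import datetime

# Wrapper lines look like: "22:56:49 +0100 2016-11-29 [INFO] Starting pilot in slot1"
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) ([+-]\d{4}) (\d{4}-\d{2}-\d{2}) \[INFO\] (.*)$")

def pilot_durations(lines):
    """Return (slot, seconds) for each 'Starting pilot' / 'Job finished' pair."""
    started = {}
    results = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip checkout output and other non-wrapper lines
        clock, offset, day, message = m.groups()
        stamp = datetime.strptime(f"{day} {clock} {offset}", "%Y-%m-%d %H:%M:%S %z")
        start = re.match(r"Starting pilot in (\S+)$", message)
        done = re.match(r"Job finished in (\S+) with", message)
        if start:
            started[start.group(1)] = stamp
        elif done and done.group(1) in started:
            slot = done.group(1)
            results.append((slot, (stamp - started.pop(slot)).total_seconds()))
    return results

sample = [
    "22:56:49 +0100 2016-11-29 [INFO] Starting pilot in slot1",
    "23:15:54 +0100 2016-11-29 [INFO] Job finished in slot1 with unknown exit code.",
]
print(pilot_durations(sample))
```

For the first job in the quoted output this gives a pilot lifetime of about 19 minutes.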


[quote]23:15:54 +0100 2016-11-29 [INFO] Job finished in slot1 with unknown exit code.
Is this normal ?[/quote]


Yes, the retrieval of the exit code for LHCb jobs still needs to be implemented.


[quote]11/29/16 22:56:28 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 188.184.94.254,vccondorce02.cern.ch, hostname size = 1, original ip address = 188.184.94.254[/quote]


This is just noise, but it should be cleaned up.
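Until that happens, the messages can be filtered out when reading a StartLog. A minimal sketch, assuming the PERMISSION DENIED lines always follow the pattern seen in this thread; `strip_noise` is a hypothetical helper, not something shipped with HTCondor or the VM.

```python
import re

# Matches the startd authorization warnings seen in the StartLog excerpt.
NOISE = re.compile(r"PERMISSION DENIED to \S+ from host \S+ for command \d+ \(\w+\)")

def strip_noise(lines):
    """Return the log lines that are not PERMISSION DENIED warnings."""
    return [line for line in lines if not NOISE.search(line)]

sample = [
    "11/29/16 22:56:41 PERMISSION DENIED to submit-side@matchsession from host "
    "188.184.94.254 for command 444 (ACTIVATE_CLAIM), access level DAEMON: "
    "reason: cached result for DAEMON; see first case for the full reason",
    "11/29/16 22:56:41 Got activate_claim request from shadow (188.184.94.254)",
    "11/29/16 22:56:41 Remote job ID is 15692.0",
]
for line in strip_noise(sample):
    print(line)
```

What remains are the state changes and job IDs, which are the lines that matter when following a job through the slot.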