Job finished in slot1 with unknown exit code.
Author | Message |
---|---|
Joined: 2 Sep 04 Posts: 455 Credit: 202,061,313 RAC: 47,670 |
48 minutes runtime, but still no CPU-Usage? From StdOut:
[quote]
Output of the job wrapper may appear here.
22:56:43 +0100 2016-11-29 [INFO] New Job Starting in slot1
22:56:43 +0100 2016-11-29 [INFO] Condor JobID: 15692 in slot1
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/testPilotCommands.py
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/sudoComputingElement.py
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/power.sh
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/ParseJobAgentLog
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/save-payload-logs
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/vm-pilot
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/gmond.conf
A /var/lib/condor/execute/dir_4131/tmp.hFtCuE1BVw/vm-bootstrap
Checked out revision 88383.
Getting DIRAC Pilot 2.0 code from cvmfs (LHCBDIRAC=v8r4p5; DIRAC=v6r15p18)
22:56:49 +0100 2016-11-29 [INFO] Starting pilot in slot1
23:15:54 +0100 2016-11-29 [INFO] Job finished in slot1 with unknown exit code.
23:16:51 +0100 2016-11-29 [INFO] New Job Starting in slot1
23:16:51 +0100 2016-11-29 [INFO] Condor JobID: 16049 in slot1
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/testPilotCommands.py
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/sudoComputingElement.py
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/power.sh
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/ParseJobAgentLog
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/save-payload-logs
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/vm-pilot
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/gmond.conf
A /var/lib/condor/execute/dir_4428/tmp.MG81LxMNfH/vm-bootstrap
Checked out revision 88383.
Getting DIRAC Pilot 2.0 code from cvmfs (LHCBDIRAC=v8r4p5; DIRAC=v6r15p18)
23:16:56 +0100 2016-11-29 [INFO] Starting pilot in slot1
[/quote]
Is this normal?
And from StartLog:
[quote]
11/29/16 22:51:43 ******************************************************
11/29/16 22:51:43 ** condor_startd (CONDOR_STARTD) STARTING UP
11/29/16 22:51:43 ** /usr/sbin/condor_startd
11/29/16 22:51:43 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
11/29/16 22:51:43 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
11/29/16 22:51:43 ** $CondorVersion: 8.4.8 Jun 30 2016 BuildID: 373513 $
11/29/16 22:51:43 ** $CondorPlatform: x86_64_RedHat6 $
11/29/16 22:51:43 ** PID = 4091
11/29/16 22:51:43 ** Log last touched time unavailable (No such file or directory)
11/29/16 22:51:43 ******************************************************
11/29/16 22:51:43 Using config source: /etc/condor/condor_config
11/29/16 22:51:43 Using local config sources:
11/29/16 22:51:43 /etc/condor/config.d/10_security.config
11/29/16 22:51:43 /etc/condor/config.d/14_network.config
11/29/16 22:51:43 /etc/condor/config.d/20_workernode.config
11/29/16 22:51:43 /etc/condor/config.d/30_lease.config
11/29/16 22:51:43 /etc/condor/config.d/35_lhcb.config
11/29/16 22:51:43 /etc/condor/config.d/40_ccb.config
11/29/16 22:51:43 /etc/condor/condor_config.local
11/29/16 22:51:43 config Macros = 147, Sorted = 147, StringBytes = 5238, TablesBytes = 5388
11/29/16 22:51:43 CLASSAD_CACHING is ENABLED
11/29/16 22:51:43 Daemon Log is logging: D_ALWAYS D_ERROR
11/29/16 22:51:43 Daemoncore: Listening at <10.0.2.15:59875> on TCP (ReliSock).
11/29/16 22:51:43 DaemonCore: command socket at <10.0.2.15:59875?addrs=10.0.2.15-59875&noUDP>
11/29/16 22:51:43 DaemonCore: private command socket at <10.0.2.15:59875?addrs=10.0.2.15-59875>
11/29/16 22:51:46 CCBListener: registered with CCB server vccondor01.cern.ch as ccbid 128.142.142.167:9618?addrs=128.142.142.167-9618+[2001-1458-301-98--100-99]-9618#632368
11/29/16 22:51:46 HibernationSupportedStates invalid '' in ad from hibernation plugin /usr/libexec/condor/condor_power_state
11/29/16 22:51:46 VM-gahp server reported an internal error
11/29/16 22:51:46 VM universe will be tested to check if it is available
11/29/16 22:51:46 History file rotation is enabled.
11/29/16 22:51:46 Maximum history file size is: 20971520 bytes
11/29/16 22:51:46 Number of rotated history files is: 2
11/29/16 22:51:46 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 3000, Swap: 100.00%, Disk: 100.00%
11/29/16 22:51:46 New machine resource allocated
11/29/16 22:51:46 Setting up slot pairings
11/29/16 22:51:46 CronJobList: Adding job 'mips'
11/29/16 22:51:46 CronJobList: Adding job 'kflops'
11/29/16 22:51:46 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
11/29/16 22:51:46 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
11/29/16 22:51:46 State change: IS_OWNER is false
11/29/16 22:51:46 Changing state: Owner -> Unclaimed
11/29/16 22:51:46 State change: RunBenchmarks is TRUE
11/29/16 22:51:46 Changing activity: Idle -> Benchmarking
11/29/16 22:51:46 BenchMgr:StartBenchmarks()
11/29/16 22:52:12 State change: benchmarks completed
11/29/16 22:52:12 Changing activity: Benchmarking -> Idle
11/29/16 22:56:28 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 188.184.94.254,vccondorce02.cern.ch, hostname size = 1, original ip address = 188.184.94.254
11/29/16 22:56:28 Request accepted.
11/29/16 22:56:28 Remote owner is lhcbpilot@cern.ch
11/29/16 22:56:28 State change: claiming protocol successful
11/29/16 22:56:28 Changing state: Unclaimed -> Claimed
11/29/16 22:56:41 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 444 (ACTIVATE_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 22:56:41 Got activate_claim request from shadow (188.184.94.254)
11/29/16 22:56:41 Remote job ID is 15692.0
11/29/16 22:56:41 Got universe "VANILLA" (5) from request classad
11/29/16 22:56:41 State change: claim-activation protocol successful
11/29/16 22:56:41 Changing activity: Idle -> Busy
11/29/16 23:15:54 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 404 (DEACTIVATE_CLAIM_FORCIBLY), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 23:15:54 Called deactivate_claim_forcibly()
11/29/16 23:15:54 Starter pid 4131 exited with status 0
11/29/16 23:15:54 State change: starter exited
11/29/16 23:15:54 Changing activity: Busy -> Idle
11/29/16 23:15:55 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 443 (RELEASE_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 23:15:55 State change: received RELEASE_CLAIM command
11/29/16 23:15:55 Changing state and activity: Claimed/Idle -> Preempting/Vacating
11/29/16 23:15:55 State change: No preempting claim, returning to owner
11/29/16 23:15:55 Changing state and activity: Preempting/Vacating -> Owner/Idle
11/29/16 23:15:55 State change: IS_OWNER is false
11/29/16 23:15:55 Changing state: Owner -> Unclaimed
11/29/16 23:16:40 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 23:16:40 Request accepted.
11/29/16 23:16:40 Remote owner is lhcbpilot@cern.ch
11/29/16 23:16:40 State change: claiming protocol successful
11/29/16 23:16:40 Changing state: Unclaimed -> Claimed
11/29/16 23:16:49 PERMISSION DENIED to submit-side@matchsession from host 188.184.94.254 for command 444 (ACTIVATE_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
11/29/16 23:16:49 Got activate_claim request from shadow (188.184.94.254)
11/29/16 23:16:49 Remote job ID is 16049.0
11/29/16 23:16:49 Got universe "VANILLA" (5) from request classad
11/29/16 23:16:49 State change: claim-activation protocol successful
11/29/16 23:16:49 Changing activity: Idle -> Busy
[/quote]
Supporting BOINC, a great concept! |
Joined: 20 Jun 14 Posts: 381 Credit: 238,712 RAC: 0 |
[quote]48 minutes runtime, but still no CPU-Usage?[/quote]
There are a few issues with the LHCb application at the moment that we are trying to iron out, which is why it is still in beta. There have been a few periods where the job stream dried up, and the automatic shutdown of the VM is not fully operational.
Yes, the retrieval of the exit code for LHCb jobs still needs to be implemented.
This is just noise but should be cleaned up. |
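For anyone who wants to check how long each pilot actually ran before the wrapper reported "Job finished", the StdOut timestamps can be compared directly. The following is only a rough sketch, not part of the LHCb tooling: the function names `parse_events` and `job_runtimes` are made up for illustration, and it assumes the job wrapper output has one entry per line in the `HH:MM:SS +ZZZZ YYYY-MM-DD [INFO] message` layout seen in the StdOut quoted in this thread.

```python
import re
from datetime import datetime

# Matches wrapper lines such as:
#   22:56:49 +0100 2016-11-29 [INFO] Starting pilot in slot1
# The layout is assumed from the StdOut quoted in this thread.
LINE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}) ([+-]\d{4}) (\d{4}-\d{2}-\d{2}) \[INFO\] (.+)"
)


def parse_events(text):
    """Return (timestamp, message) pairs for every [INFO] line in text."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip checkout lines ("A /var/lib/...") and other output
        clock, offset, date, message = match.groups()
        stamp = datetime.strptime(
            f"{date} {clock} {offset}", "%Y-%m-%d %H:%M:%S %z"
        )
        events.append((stamp, message))
    return events


def job_runtimes(events):
    """Map each Condor JobID to the time between pilot start and job finish."""
    runtimes = {}
    job_id = None
    started = None
    for stamp, message in events:
        if message.startswith("Condor JobID:"):
            job_id = message.split()[2]
        elif message.startswith("Starting pilot"):
            started = stamp
        elif message.startswith("Job finished") and started is not None:
            runtimes[job_id] = stamp - started
            job_id, started = None, None
    return runtimes


# Lines copied from the StdOut quoted in this thread.
SAMPLE = """\
22:56:43 +0100 2016-11-29 [INFO] New Job Starting in slot1
22:56:43 +0100 2016-11-29 [INFO] Condor JobID: 15692 in slot1
Checked out revision 88383.
22:56:49 +0100 2016-11-29 [INFO] Starting pilot in slot1
23:15:54 +0100 2016-11-29 [INFO] Job finished in slot1 with unknown exit code.
23:16:51 +0100 2016-11-29 [INFO] New Job Starting in slot1
23:16:51 +0100 2016-11-29 [INFO] Condor JobID: 16049 in slot1
23:16:56 +0100 2016-11-29 [INFO] Starting pilot in slot1
"""

if __name__ == "__main__":
    for job, runtime in job_runtimes(parse_events(SAMPLE)).items():
        print(job, runtime)
```

Jobs that have started but not yet finished (16049 in the sample) are simply left out of the result, so the sketch only reports completed pilots.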
©2025 CERN