Name M01LDmslJ58nsSi4ap6QjLDmwznN0nGgGQJmIVkRDmvpFKDmRCOdNo_0
Workunit 238743177
Created 28 Jan 2026, 12:33:47 UTC
Sent 28 Jan 2026, 14:18:21 UTC
Report deadline 5 Feb 2026, 14:18:21 UTC
Received 28 Jan 2026, 22:52:24 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 10698193
Run time 58 min 15 sec
CPU time 3 hours 36 min 22 sec
Priority 28
Validate state Invalid
Credit 0.00
Device peak FLOPS 39.76 GFLOPS
Application version ATLAS Simulation v3.01 (native_mt) x86_64-pc-linux-gnu
Peak working set size 1.91 GB
Peak swap size 2.84 GB
Peak disk usage 1.15 GB

Stderr output

<core_client_version>8.2.6</core_client_version>
<![CDATA[
<stderr_txt>
23:52:09 (4002380): wrapper (7.7.26015): starting
23:52:09 (4002380): wrapper: running run_atlas (--nthreads 4)
[2026-01-28 23:52:09] Arguments: --nthreads 4
[2026-01-28 23:52:09] Threads: 4
[2026-01-28 23:52:09] Checking for CVMFS
[2026-01-28 23:52:09] Probing /cvmfs/atlas.cern.ch... OK
[2026-01-28 23:52:09] Probing /cvmfs/atlas-condb.cern.ch... OK
[2026-01-28 23:52:09] Running cvmfs_config stat atlas.cern.ch
[2026-01-28 23:52:09] VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
[2026-01-28 23:52:09] 2.13.3.0 12410 4362 47164 155626 0 50 21903464 33554433 0 130560 0 2407021 99.987 152787 200 http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch http://127.0.0.1:25468 1
[2026-01-28 23:52:09] CVMFS is ok
[2026-01-28 23:52:09] Using apptainer image /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7
[2026-01-28 23:52:09] Checking for apptainer binary...
[2026-01-28 23:52:09] Using apptainer found in PATH at /usr/bin/apptainer
[2026-01-28 23:52:09] Running /usr/bin/apptainer --version
[2026-01-28 23:52:09] apptainer version 1.4.5
[2026-01-28 23:52:09] Checking apptainer works with /usr/bin/apptainer exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 hostname
[2026-01-28 23:52:09] INFO: /etc/singularity/ exists; cleanup by system administrator is not complete (see https://apptainer.org/docs/admin/latest/singularity_migration.html) Evangelos-Katikos
[2026-01-28 23:52:09] apptainer works
[2026-01-28 23:52:09] Set ATHENA_PROC_NUMBER=4
[2026-01-28 23:52:09] Set ATHENA_CORE_NUMBER=4
[2026-01-28 23:52:09] Starting ATLAS job with PandaID=6985339801
[2026-01-28 23:52:09] Running command: /usr/bin/apptainer exec -B /cvmfs,/home/ek/BOINC/slots/1 /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 sh start_atlas.sh
[2026-01-29 00:50:21]  *** The last 200 lines of the pilot log: ***
[2026-01-29 00:50:21] 2026-01-28 22:48:41,196 | INFO     | monitor loop #259: job 0:6985339801 is in state 'running'
[2026-01-29 00:50:21] 2026-01-28 22:48:41,197 | WARNING  | exception caught: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 2026-01-28 22:48:41,880 | WARNING  | exception caught (1) in write_json: [Errno 116] Stale file handle: '/home/ek/BOINC/slots/1/pilot_heartbeat.json'
[2026-01-29 00:50:21] 2026-01-28 22:48:41,880 | WARNING  | failed to update heartbeat file: /home/ek/BOINC/slots/1/pilot_heartbeat.json
[2026-01-29 00:50:21] 2026-01-28 22:48:42,881 | INFO     | time since job start (3384s) is within the limit (349056.0s)
[2026-01-29 00:50:21] 2026-01-28 22:48:43,699 | INFO     | monitor loop #260: job 0:6985339801 is in state 'running'
[2026-01-29 00:50:21] 2026-01-28 22:48:43,700 | WARNING  | exception caught: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 2026-01-28 22:48:43,886 | WARNING  | exception caught (1) in write_json: [Errno 116] Stale file handle: '/home/ek/BOINC/slots/1/pilot_heartbeat.json'
[2026-01-29 00:50:21] 2026-01-28 22:48:43,886 | WARNING  | failed to update heartbeat file: /home/ek/BOINC/slots/1/pilot_heartbeat.json
[2026-01-29 00:50:21] 2026-01-28 22:48:44,886 | INFO     | time since job start (3386s) is within the limit (349056.0s)
[2026-01-29 00:50:21] 2026-01-28 22:48:44,886 | INFO     | 3390s have passed since pilot start - server update state is 'RUNNING'
[2026-01-29 00:50:21] 2026-01-28 22:48:45,892 | WARNING  | exception caught (1) in write_json: [Errno 116] Stale file handle: '/home/ek/BOINC/slots/1/pilot_heartbeat.json'
[2026-01-29 00:50:21] 2026-01-28 22:48:45,892 | WARNING  | failed to update heartbeat file: /home/ek/BOINC/slots/1/pilot_heartbeat.json
[2026-01-29 00:50:21] 2026-01-28 22:48:46,202 | INFO     | monitor loop #261: job 0:6985339801 is in state 'running'
[2026-01-29 00:50:21] 2026-01-28 22:48:46,203 | WARNING  | exception caught: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 2026-01-28 22:48:46,892 | INFO     | time since job start (3388s) is within the limit (349056.0s)
[2026-01-29 00:50:21] 2026-01-28 22:48:47,897 | WARNING  | exception caught (1) in write_json: [Errno 116] Stale file handle: '/home/ek/BOINC/slots/1/pilot_heartbeat.json'
[2026-01-29 00:50:21] 2026-01-28 22:48:47,897 | WARNING  | failed to update heartbeat file: /home/ek/BOINC/slots/1/pilot_heartbeat.json
[2026-01-29 00:50:21] 2026-01-28 22:48:48,705 | INFO     | monitor loop #262: job 0:6985339801 is in state 'running'
[2026-01-29 00:50:21] 2026-01-28 22:48:48,705 | WARNING  | exception caught: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 2026-01-28 22:48:48,898 | INFO     | time since job start (3390s) is within the limit (349056.0s)
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/common/exception.py", line 466, in run
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/control/monitor.py", line 246, in control
[2026-01-29 00:50:21] monitor: exception caught: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] unexpected exception caught by thread run() function: (<class 'pilot.common.exception.PilotException'>, PilotException(OSError(116, 'Stale file handle')), <traceback object at 0x7fcf8e26b640>)
[2026-01-29 00:50:21] Traceback (most recent call last):
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/control/monitor.py", line 221, in control
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/util/psutils.py", line 398, in get_process_info
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/__init__.py", line 1499, in process_iter
[2026-01-29 00:50:21]     a = set(pids())
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/__init__.py", line 1443, in pids
[2026-01-29 00:50:21]     ret = sorted(_psplatform.pids())
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_pslinux.py", line 1652, in pids
[2026-01-29 00:50:21]     return [int(x) for x in os.listdir(b(get_procfs_path())) if x.isdigit()]
[2026-01-29 00:50:21] OSError: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 
[2026-01-29 00:50:21] The above exception was the direct cause of the following exception:
[2026-01-29 00:50:21] 
[2026-01-29 00:50:21] Traceback (most recent call last):
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/common/exception.py", line 466, in run
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/control/monitor.py", line 246, in control
[2026-01-29 00:50:21] pilot.common.exception.PilotException: error code: 1301, message: An unknown pilot exception has occurred
[2026-01-29 00:50:21] details: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 
[2026-01-29 00:50:21] None
[2026-01-29 00:50:21] exception has been put in bucket queue belonging to thread 'monitor'
[2026-01-29 00:50:21] setting graceful stop in 10 s since there is no point in continuing
[2026-01-29 00:50:21] 2026-01-28 22:48:51,208 | INFO     | monitor loop #263: job 0:6985339801 is in state 'running'
[2026-01-29 00:50:21] 2026-01-28 22:48:51,208 | WARNING  | exception caught: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] received exception from bucket queue in generic workflow: error code: 1301, message: An unknown pilot exception has occurred
[2026-01-29 00:50:21] details: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 2026-01-28 22:48:53,710 | INFO     | monitor loop #264: job 0:6985339801 is in state 'running'
[2026-01-29 00:50:21] 2026-01-28 22:48:53,710 | WARNING  | exception caught: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 2026-01-28 22:48:56,213 | INFO     | monitor loop #265: job 0:6985339801 is in state 'running'
[2026-01-29 00:50:21] 2026-01-28 22:48:56,213 | WARNING  | exception caught: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 2026-01-28 22:48:58,716 | INFO     | monitor loop #266: job 0:6985339801 is in state 'running'
[2026-01-29 00:50:21] 2026-01-28 22:48:58,716 | WARNING  | exception caught: [Errno 116] Stale file handle: b'/proc'
[2026-01-29 00:50:21] 2026-01-28 22:48:59,954 | WARNING  | data:copytool_out:received graceful stop - abort after this iteration
[2026-01-29 00:50:21] 2026-01-28 22:48:59,954 | WARNING  | job:queue_monitor:received graceful stop - abort after this iteration
[2026-01-29 00:50:21] 2026-01-28 22:48:59,954 | WARNING  | job:job_monitor:received graceful stop - abort after this iteration
[2026-01-29 00:50:21] 2026-01-28 22:48:59,954 | WARNING  | since job:queue_monitor is responsible for sending job updates, we sleep for 20 s
[2026-01-29 00:50:21] 2026-01-28 22:48:59,954 | INFO     | aborting loop
[2026-01-29 00:50:21] 2026-01-28 22:49:00,564 | INFO     | all data control threads have been joined
[2026-01-29 00:50:21] 2026-01-28 22:49:00,812 | INFO     | breaking -- sending SIGTERM to pid=4010878
[2026-01-29 00:50:21] 2026-01-28 22:49:00,813 | INFO     | breaking -- sleep 10 s before sending SIGKILL pid=4010878
[2026-01-29 00:50:21] 2026-01-28 22:49:00,960 | INFO     | [job] job monitor thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:00,964 | INFO     | all payload control threads have been joined
[2026-01-29 00:50:21] 2026-01-28 22:49:01,057 | INFO     | [job] validate thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:01,059 | INFO     | [job] retrieve thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:01,160 | INFO     | [job] create_data_payload thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:01,184 | INFO     | [payload] validate_pre thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:01,312 | INFO     | [data] copytool_in thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:01,569 | INFO     | [data] control thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:01,626 | INFO     | all job control threads have been joined
[2026-01-29 00:50:21] 2026-01-28 22:49:01,927 | INFO     | [payload] failed_post thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:01,929 | INFO     | [payload] validate_post thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:01,959 | INFO     | [data] copytool_out thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:01,969 | INFO     | [payload] control thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:02,355 | WARNING  | data:queue_monitoring:received graceful stop - abort after this iteration
[2026-01-29 00:50:21] 2026-01-28 22:49:02,632 | INFO     | [job] control thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:06,361 | INFO     | [data] queue_monitor thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:08,780 | INFO     | job.realtimelogging is not enabled
[2026-01-29 00:50:21] 2026-01-28 22:49:09,785 | INFO     | [payload] run_realtimelog thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:10,863 | INFO     | 
[2026-01-29 00:50:21] 
[2026-01-29 00:50:21] finished pid=4010878 exit_code=None state=failed
[2026-01-29 00:50:21] 
[2026-01-29 00:50:21] 2026-01-28 22:49:10,863 | WARNING  | detected unset exit_code from wait_graceful - reset to -1
[2026-01-29 00:50:21] 2026-01-28 22:49:10,863 | INFO     | using pid=4014710 to kill prmon
[2026-01-29 00:50:21] 2026-01-28 22:49:10,863 | INFO     | stopping utility process 'MemoryMonitor' with signal 10
[2026-01-29 00:50:21] 2026-01-28 22:49:10,863 | WARNING  | Error sending signal to/waiting for process 4014710: [Errno 3] No such process
[2026-01-29 00:50:21] 2026-01-28 22:49:10,863 | INFO     | utility process 4010881 cleanup finished with status=None
[2026-01-29 00:50:21] 2026-01-28 22:49:10,863 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#0/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:13,878 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#1/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:16,894 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#2/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:19,909 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#3/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:22,910 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#4/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:23,050 | INFO     | [job] queue monitor thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:49:23,219 | INFO     | [monitor] cgroup control has ended
[2026-01-29 00:50:21] 2026-01-28 22:49:23,887 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:25,897 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:25,925 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#5/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:27,903 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:28,940 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#6/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:29,913 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:31,923 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:31,955 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#7/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:33,934 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:34,970 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#8/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:35,943 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:37,953 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:37,985 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#9/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:39,964 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:41,000 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#10/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:41,974 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:43,984 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:44,006 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#11/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:45,995 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:47,021 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#12/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:48,003 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:50,013 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:50,036 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#13/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:52,023 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:53,051 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#14/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:54,034 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:56,044 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:56,066 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#15/#20)
[2026-01-29 00:50:21] 2026-01-28 22:49:58,054 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:49:59,082 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#16/#20)
[2026-01-29 00:50:21] 2026-01-28 22:50:00,062 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:50:02,072 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:50:02,089 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#17/#20)
[2026-01-29 00:50:21] 2026-01-28 22:50:04,082 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:50:05,104 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#18/#20)
[2026-01-29 00:50:21] 2026-01-28 22:50:06,092 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:50:08,103 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:50:08,119 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#19/#20)
[2026-01-29 00:50:21] 2026-01-28 22:50:10,109 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:50:11,134 | INFO     | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#20/#20)
[2026-01-29 00:50:21] 2026-01-28 22:50:12,119 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:50:14,130 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:50:14,149 | WARNING  | file copy failure: path does not exist: /home/ek/BOINC/slots/1/PanDA_Pilot-6985339801/memory_monitor_summary.json
[2026-01-29 00:50:21] 2026-01-28 22:50:14,150 | WARNING  | failed to copy memory monitor output: error code: 1103, message: No such file or directory
[2026-01-29 00:50:21] details: ('file copy failure: path does not exist: /home/ek/BOINC/slots/1/PanDA_Pilot-6985339801/memory_monitor_summary.json',)
[2026-01-29 00:50:21] 2026-01-28 22:50:14,150 | CRITICAL | execute payloads caught an exception (cannot recover): module 'psutil' has no attribute 'FileNotFoundError', Traceback (most recent call last):
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/util/psutils.py", line 340, in find_lingering_processes
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/__init__.py", line 319, in __init__
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/__init__.py", line 355, in _init
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/__init__.py", line 757, in create_time
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_pslinux.py", line 1717, in wrapper
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_pslinux.py", line 1948, in create_time
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_pslinux.py", line 1717, in wrapper
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_common.py", line 508, in wrapper
[2026-01-29 00:50:21]   File "<string>", line 3, in raise_from
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_common.py", line 506, in wrapper
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_pslinux.py", line 1780, in _parse_stat_file
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_common.py", line 851, in bcat
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_common.py", line 839, in cat
[2026-01-29 00:50:21]   File "/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/psutil/6.0.0-x86_64-centos7/lib/python3.9/site-packages/psutil/_common.py", line 799, in open_binary
[2026-01-29 00:50:21] OSError: [Errno 116] Stale file handle: '/proc/4006197/stat'
[2026-01-29 00:50:21] 
[2026-01-29 00:50:21] During handling of the above exception, another exception occurred:
[2026-01-29 00:50:21] 
[2026-01-29 00:50:21] Traceback (most recent call last):
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/control/payload.py", line 272, in execute_payloads
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/control/payloads/generic.py", line 1029, in run
[2026-01-29 00:50:21]   File "/home/ek/BOINC/slots/1/pilot3/pilot/util/psutils.py", line 347, in find_lingering_processes
[2026-01-29 00:50:21] AttributeError: module 'psutil' has no attribute 'FileNotFoundError'
[2026-01-29 00:50:21] 
[2026-01-29 00:50:21] 2026-01-28 22:50:15,182 | INFO     | waiting for thread to finish: ['<_MainThread(MainThread, started 140529506600768)>', '<ExcThread(execute_payloads, started 140529006257920)>']
[2026-01-29 00:50:21] 2026-01-28 22:50:15,182 | INFO     | [payload] execute_payloads thread has finished
[2026-01-29 00:50:21] 2026-01-28 22:50:16,140 | INFO     | caller=run is remaining thread - safe to abort (names=['<_MainThread(MainThread, started 140529506600768)>'])
[2026-01-29 00:50:21] 2026-01-28 22:50:21,150 | INFO     | all workflow threads have been joined
[2026-01-29 00:50:21] 2026-01-28 22:50:21,150 | INFO     | end of generic workflow (traces error code: 0)
[2026-01-29 00:50:21] 2026-01-28 22:50:21,151 | INFO     | traces error code: 0
[2026-01-29 00:50:21] 2026-01-28 22:50:21,151 | INFO     | pilot has finished (exit code=0, shell exit code=0)
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 15: date: command not found
[2026-01-29 00:50:21]  ==== pilot stdout END ====
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 15: date: command not found
[2026-01-29 00:50:21]  ==== wrapper stdout RESUME ====
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 15: date: command not found
[2026-01-29 00:50:21]  pilotpid: 4006197
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 15: date: command not found
[2026-01-29 00:50:21]  Pilot exit status: 0
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 15: date: command not found
[2026-01-29 00:50:21]  File not found: /home/ek/BOINC/slots/1/pilot3/pandaIDs.out, no payload
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 10: date: command not found
[2026-01-29 00:50:21]  File not found: /home/ek/BOINC/slots/1/pilot3/pandaIDs.out, no payload
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 27: ps: command not found
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 15: date: command not found
[2026-01-29 00:50:21]  No supervise_pilot CHILD process found
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 36: /dev/null: Stale file handle
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 15: date: command not found
[2026-01-29 00:50:21]  Test setup, not cleaning
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 15: date: command not found
[2026-01-29 00:50:21]  apfmon messages muted
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 15: date: command not found
[2026-01-29 00:50:21]  ==== wrapper stdout END ====
[2026-01-29 00:50:21] ./runpilot2-wrapper.sh: line 10: date: command not found
[2026-01-29 00:50:21]  ==== wrapper stderr END ====
[2026-01-29 00:50:21]  *** Error codes and diagnostics ***
[2026-01-29 00:50:21]  *** Listing of results directory ***
[2026-01-29 00:50:21] total 519152
[2026-01-29 00:50:21] -rw-r--r-- 1 ek ek    584422 Jan 28 14:23 pilot3.tar.gz
[2026-01-29 00:50:21] -rwx------ 1 ek ek     36322 Jan 28 14:24 runpilot2-wrapper.sh
[2026-01-29 00:50:21] -rw-r--r-- 1 ek ek      5111 Jan 28 14:28 queuedata.json
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek       100 Jan 28 23:52 wrapper_26015_x86_64-pc-linux-gnu
[2026-01-29 00:50:21] -rwxr-xr-x 1 ek ek      7986 Jan 28 23:52 run_atlas
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek       105 Jan 28 23:52 job.xml
[2026-01-29 00:50:21] -rw-r--r-- 2 ek ek 526689365 Jan 28 23:52 EVNT.48317909._000092.pool.root.1
[2026-01-29 00:50:21] drwxrwx--x 2 ek ek      4096 Jan 28 23:52 shared
[2026-01-29 00:50:21] -rw-r--r-- 2 ek ek    597758 Jan 28 23:52 input.tar.gz
[2026-01-29 00:50:21] -rw-r--r-- 2 ek ek     15845 Jan 28 23:52 start_atlas.sh
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek         0 Jan 28 23:52 boinc_setup_complete
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek      9841 Jan 28 23:52 init_data.xml
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek         0 Jan 28 23:52 boinc_lockfile
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek      2501 Jan 28 23:52 pandaJob.out
[2026-01-29 00:50:21] -rw------- 1 ek ek        50 Jan 28 23:52 setup.sh.local
[2026-01-29 00:50:21] -rw------- 1 ek ek   1005755 Jan 28 23:52 agis_schedconf.cvmfs.json
[2026-01-29 00:50:21] -rw------- 1 ek ek   1516246 Jan 28 23:52 agis_ddmendpoints.agis.ALL.json
[2026-01-29 00:50:21] -rw------- 1 ek ek       432 Jan 28 23:52 workernode_map.json
[2026-01-29 00:50:21] drwx------ 5 ek ek      4096 Jan 28 23:52 pilot3
[2026-01-29 00:50:21] -rw------- 1 ek ek       913 Jan 29 00:22 heartbeat.json
[2026-01-29 00:50:21] -rw------- 1 ek ek        95 Jan 29 00:47 pilot_heartbeat.json
[2026-01-29 00:50:21] drwxrwx--- 2 ek ek      4096 Jan 29 00:48 PanDA_Pilot-6985339801
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek       530 Jan 29 00:49 boinc_task_state.xml
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek      8192 Jan 29 00:49 boinc_mmap_file
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek        27 Jan 29 00:49 wrapper_checkpoint.txt
[2026-01-29 00:50:21] -rw------- 1 ek ek    492044 Jan 29 00:50 pilotlog.txt
[2026-01-29 00:50:21] -rw------- 1 ek ek       544 Jan 29 00:50 M01LDmslJ58nsSi4ap6QjLDmwznN0nGgGQJmIVkRDmvpFKDmRCOdNo.diag
[2026-01-29 00:50:21] -rw------- 1 ek ek    509647 Jan 29 00:50 log.48317911._004616.job.log.1
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek       428 Jan 29 00:50 runtime_log
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek     10142 Jan 29 00:50 runtime_log.err
[2026-01-29 00:50:21] -rw-rw-r-- 1 ek ek     27061 Jan 29 00:50 stderr.txt
[2026-01-29 00:50:21] No HITS result produced
[2026-01-29 00:50:21]  *** Contents of shared directory: ***
[2026-01-29 00:50:21] total 514952
[2026-01-29 00:50:21] -rw-r--r-- 2 ek ek 526689365 Jan 28 23:52 ATLAS.root_0
[2026-01-29 00:50:21] -rw-r--r-- 2 ek ek    597758 Jan 28 23:52 input.tar.gz
[2026-01-29 00:50:21] -rw-r--r-- 2 ek ek     15845 Jan 28 23:52 start_atlas.sh
00:50:22 (4002380): run_atlas exited; CPU time 49.113373
00:50:22 (4002380): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>M01LDmslJ58nsSi4ap6QjLDmwznN0nGgGQJmIVkRDmvpFKDmRCOdNo_0_r1248451865_ATLAS_result</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
</message>
]]>


©2026 CERN