Name | GzvMDmyOy56nsSi4ap6QjLDmwznN0nGgGQJmkKkKDmTctKDmn3sbZm_0 |
Workunit | 230507733 |
Created | 22 Feb 2025, 6:42:57 UTC |
Sent | 22 Feb 2025, 12:01:27 UTC |
Report deadline | 2 Mar 2025, 12:01:27 UTC |
Received | 24 Feb 2025, 23:45:26 UTC |
Server state | Over |
Outcome | Success |
Client state | Done |
Exit status | 0 (0x00000000) |
Computer ID | 10687519 |
Run time | 2 days 11 hours 35 min 54 sec |
CPU time | 3 days 22 hours 17 min 38 sec |
Validate state | Valid |
Credit | 7,638.07 |
Device peak FLOPS | 31.54 GFLOPS |
Application version | ATLAS Simulation v3.01 (native_mt) x86_64-pc-linux-gnu |
Peak working set size | 2.46 GB |
Peak swap size | 31.84 GB |
Peak disk usage | 1.31 GB |
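The run time and CPU time rows above are easier to compare once both are in seconds. The following is a minimal sketch, assuming the "N days N hours N min N sec" layout shown in the table; the helper name `duration_to_seconds` is hypothetical and not part of BOINC.

```python
import re

# Seconds per unit for the "N days N hours N min N sec" layout used in the table.
UNIT_SECONDS = {
    "day": 86400, "days": 86400,
    "hour": 3600, "hours": 3600,
    "min": 60,
    "sec": 1,
}

def duration_to_seconds(text):
    """Convert a duration such as '2 days 11 hours 35 min 54 sec' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(days?|hours?|min|sec)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total

# Values copied from the table rows "Run time" and "CPU time".
run_time = duration_to_seconds("2 days 11 hours 35 min 54 sec")
cpu_time = duration_to_seconds("3 days 22 hours 17 min 38 sec")

# CPU seconds accumulated per wall-clock second over the whole task.
cpu_per_wall = cpu_time / run_time
print(run_time, cpu_time, round(cpu_per_wall, 2))
```

With these inputs the run time is 214554 s and the CPU time is 339458 s, or about 1.58 CPU-seconds per wall-clock second for a task started with `--nthreads 10`.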
<core_client_version>7.7.0</core_client_version>
<![CDATA[
<stderr_txt>
07:02:17 (8053): wrapper (7.7.26015): starting
07:02:17 (8053): wrapper: running run_atlas (--nthreads 10)
[2025-02-22 07:02:18] Arguments: --nthreads 10
[2025-02-22 07:02:18] Threads: 10
[2025-02-22 07:02:18] Checking for CVMFS
[2025-02-22 07:02:18] Probing /cvmfs/atlas.cern.ch... OK
[2025-02-22 07:02:19] Probing /cvmfs/atlas-condb.cern.ch... OK
[2025-02-22 07:02:19] Running cvmfs_config stat atlas.cern.ch
[2025-02-22 07:02:20] VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
[2025-02-22 07:02:20] 2.11.2.0 26780 10978 329348 142938 1 349 16697880 18432000 15455 130560 0 36521296 98.846 103308643 38438 http://cvmfs-s1fnal.opensciencegrid.org:8000/cvmfs/atlas.cern.ch http://192.41.237.109:6081 1
[2025-02-22 07:02:20] CVMFS is ok
[2025-02-22 07:02:20] Efficiency of ATLAS tasks can be improved by the following measure(s):
[2025-02-22 07:02:20] The CVMFS client on this computer should be configured to use Cloudflare's openhtc.io.
[2025-02-22 07:02:20] Further information can be found at the LHC@home message board.
[2025-02-22 07:02:20] Using apptainer image /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7
[2025-02-22 07:02:20] Checking for apptainer binary...
[2025-02-22 07:02:20] Using apptainer found in PATH at /usr/bin/apptainer
[2025-02-22 07:02:20] Running /usr/bin/apptainer --version
[2025-02-22 07:02:20] apptainer version 1.3.2-1.el7
[2025-02-22 07:02:20] Checking apptainer works with /usr/bin/apptainer exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 hostname
[2025-02-22 07:02:38] c-211-22.aglt2.org
[2025-02-22 07:02:38] apptainer works
[2025-02-22 07:02:38] Set ATHENA_PROC_NUMBER=10
[2025-02-22 07:02:38] Set ATHENA_CORE_NUMBER=10
[2025-02-22 07:02:38] Starting ATLAS job with PandaID=6525230404
[2025-02-22 07:02:38] Running command: /usr/bin/apptainer exec -B /cvmfs,/tmp/boinchome/slots/0 /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 sh start_atlas.sh
18:40:04 (8053): BOINC client no longer exists - exiting
18:40:04 (8053): timer handler: client dead, exiting
18:46:58 (36366): wrapper (7.7.26015): starting
18:46:58 (36366): wrapper: running run_atlas (--nthreads 10)
[2025-02-22 18:46:58] Arguments: --nthreads 10
[2025-02-22 18:46:58] Threads: 10
[2025-02-22 18:46:58] This job has been restarted, cleaning up previous attempt
[2025-02-22 18:46:58] Checking for CVMFS
[2025-02-22 18:46:58] Probing /cvmfs/atlas.cern.ch... OK
[2025-02-22 18:46:58] Probing /cvmfs/atlas-condb.cern.ch... OK
[2025-02-22 18:46:58] Running cvmfs_config stat atlas.cern.ch
[2025-02-22 18:46:59] VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
[2025-02-22 18:46:59] 2.11.2.0 26780 11683 333860 142956 1 316 17042722 18432001 25403 130560 0 38937808 98.851 108959562 38251 http://cvmfs-s1fnal.opensciencegrid.org:8000/cvmfs/atlas.cern.ch http://192.41.237.109:6081 1
[2025-02-22 18:46:59] CVMFS is ok
[2025-02-22 18:46:59] Efficiency of ATLAS tasks can be improved by the following measure(s):
[2025-02-22 18:46:59] The CVMFS client on this computer should be configured to use Cloudflare's openhtc.io.
[2025-02-22 18:46:59] Further information can be found at the LHC@home message board.
[2025-02-22 18:46:59] Using apptainer image /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7
[2025-02-22 18:46:59] Checking for apptainer binary...
[2025-02-22 18:46:59] Using apptainer found in PATH at /usr/bin/apptainer
[2025-02-22 18:46:59] Running /usr/bin/apptainer --version
[2025-02-22 18:46:59] apptainer version 1.3.2-1.el7
[2025-02-22 18:46:59] Checking apptainer works with /usr/bin/apptainer exec -B /cvmfs /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 hostname
[2025-02-22 18:47:03] c-211-22.aglt2.org
[2025-02-22 18:47:03] apptainer works
[2025-02-22 18:47:03] Set ATHENA_PROC_NUMBER=10
[2025-02-22 18:47:03] Set ATHENA_CORE_NUMBER=10
[2025-02-22 18:47:04] Starting ATLAS job with PandaID=6525230404
[2025-02-22 18:47:04] Running command: /usr/bin/apptainer exec -B /cvmfs,/tmp/boinchome/slots/0 /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 sh start_atlas.sh
[2025-02-24 18:45:04] *** The last 200 lines of the pilot log: ***
[2025-02-24 18:45:04] 2025-02-24 23:41:54,368 | INFO | executing command: lscpu
[2025-02-24 18:45:04] 2025-02-24 23:41:55,108 | INFO | found 20 cores (10 cores per socket, 2 sockets)
[2025-02-24 18:45:04]
2025-02-24 23:41:55,368 | INFO | executing command: export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase;source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh --quiet;lsetup [2025-02-24 18:45:04] 2025-02-24 23:41:56,799 | INFO | PID=4615 has CPU usage=4.8% CMD=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/python/3.9.20-x86_64-centos7/bin/python3 pilot3/pilot.py -q BOINC_MCORE -i PR - [2025-02-24 18:45:04] 2025-02-24 23:41:56,799 | INFO | .. there are 41 such processes running [2025-02-24 18:45:04] 2025-02-24 23:42:24,369 | INFO | PID=4615 has CPU usage=3.5% CMD=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/python/3.9.20-x86_64-centos7/bin/python3 pilot3/pilot.py -q BOINC_MCORE -i PR - [2025-02-24 18:45:04] 2025-02-24 23:42:24,370 | INFO | .. there are 41 such processes running [2025-02-24 18:45:04] 2025-02-24 23:42:52,115 | INFO | PID=4615 has CPU usage=2.9% CMD=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/python/3.9.20-x86_64-centos7/bin/python3 pilot3/pilot.py -q BOINC_MCORE -i PR - [2025-02-24 18:45:04] 2025-02-24 23:42:52,115 | INFO | .. 
there are 41 such processes running [2025-02-24 18:45:04] 2025-02-24 23:43:15,585 | INFO | CPU arch script returned: x86-64-v2 [2025-02-24 18:45:04] 2025-02-24 23:43:15,585 | INFO | using path: /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/memory_monitor_output.txt (trf name=prmon) [2025-02-24 18:45:04] 2025-02-24 23:43:15,869 | INFO | extracted standard info from memory monitor json [2025-02-24 18:45:04] 2025-02-24 23:43:15,869 | WARNING | standard memory fields were not found in memory monitor json (or json doesn't exist yet): 'totRCHAR' [2025-02-24 18:45:04] 2025-02-24 23:43:16,612 | INFO | fitting pss+swap vs Time [2025-02-24 18:45:04] 2025-02-24 23:43:16,648 | INFO | model: linear, x: [1740268129.0, 1740268190.0, 1740268251.0, 1740268312.0, 1740268373.0, 1740268434.0, 1740268495.0, 1740268556.0, 1740268617.0, 1740268678.0, 1740 [2025-02-24 18:45:04] 2025-02-24 23:43:16,650 | INFO | sum of square deviations: 7013243834509.861 [2025-02-24 18:45:04] 2025-02-24 23:43:21,616 | INFO | sum of deviations: 6307229708808.4795 [2025-02-24 18:45:04] 2025-02-24 23:43:21,617 | INFO | mean x: 1740354353.5201557 [2025-02-24 18:45:04] 2025-02-24 23:43:21,640 | INFO | mean y: 2650861.5265205093 [2025-02-24 18:45:04] 2025-02-24 23:43:21,640 | INFO | -- intersect: -1562504285.135173 [2025-02-24 18:45:04] 2025-02-24 23:43:21,640 | INFO | intersect: -1562504285.135173 [2025-02-24 18:45:04] 2025-02-24 23:43:21,645 | INFO | chi2: 7.806351765752151 [2025-02-24 18:45:04] 2025-02-24 23:43:21,648 | INFO | model: linear, x: [1740268129.0, 1740268190.0, 1740268251.0, 1740268312.0, 1740268373.0, 1740268434.0, 1740268495.0, 1740268556.0, 1740268617.0, 1740268678.0, 1740 [2025-02-24 18:45:04] 2025-02-24 23:43:21,649 | INFO | sum of square deviations: 6976110648754.888 [2025-02-24 18:45:04] 2025-02-24 23:43:23,478 | INFO | PID=4615 has CPU usage=3.0% CMD=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/python/3.9.20-x86_64-centos7/bin/python3 pilot3/pilot.py -q BOINC_MCORE -i PR - 
[2025-02-24 18:45:04] 2025-02-24 23:43:23,479 | INFO | .. there are 41 such processes running [2025-02-24 18:45:04] 2025-02-24 23:43:23,529 | INFO | sum of deviations: 6290969477871.12 [2025-02-24 18:45:04] 2025-02-24 23:43:23,530 | INFO | mean x: 1740354201.0198371 [2025-02-24 18:45:04] 2025-02-24 23:43:23,530 | INFO | mean y: 2650794.7481402764 [2025-02-24 18:45:04] 2025-02-24 23:43:23,530 | INFO | -- intersect: -1566778893.3051436 [2025-02-24 18:45:04] 2025-02-24 23:43:23,530 | INFO | intersect: -1566778893.3051436 [2025-02-24 18:45:04] 2025-02-24 23:43:23,534 | INFO | chi2: 7.805938804263543 [2025-02-24 18:45:04] 2025-02-24 23:43:23,535 | INFO | current chi2=7.805938804263543 (change=0.005290070201799444 %) [2025-02-24 18:45:04] 2025-02-24 23:43:23,535 | INFO | right removable region: 2822 [2025-02-24 18:45:04] 2025-02-24 23:43:23,537 | INFO | model: linear, x: [1740268434.0, 1740268495.0, 1740268556.0, 1740268617.0, 1740268678.0, 1740268739.0, 1740268800.0, 1740268861.0, 1740268922.0, 1740268983.0, 1740 [2025-02-24 18:45:04] 2025-02-24 23:43:23,539 | INFO | sum of square deviations: 6976109923464.638 [2025-02-24 18:45:04] 2025-02-24 23:43:24,272 | INFO | sum of deviations: 5425315101617.163 [2025-02-24 18:45:04] 2025-02-24 23:43:24,272 | INFO | mean x: 1740354506.0219624 [2025-02-24 18:45:04] 2025-02-24 23:43:24,272 | INFO | mean y: 2654482.787460149 [2025-02-24 18:45:04] 2025-02-24 23:43:24,272 | INFO | -- intersect: -1350817823.021497 [2025-02-24 18:45:05] 2025-02-24 23:43:24,273 | INFO | intersect: -1350817823.021497 [2025-02-24 18:45:05] 2025-02-24 23:43:24,277 | INFO | chi2: 4.722842534653857 [2025-02-24 18:45:05] 2025-02-24 23:43:24,277 | INFO | current chi2=4.722842534653857 (change=39.50000363327458 %) [2025-02-24 18:45:05] 2025-02-24 23:43:24,280 | INFO | model: linear, x: [1740268739.0, 1740268800.0, 1740268861.0, 1740268922.0, 1740268983.0, 1740269044.0, 1740269105.0, 1740269166.0, 1740269227.0, 1740269288.0, 1740 [2025-02-24 18:45:05] 2025-02-24 
23:43:24,282 | INFO | sum of square deviations: 6939107317206.892 [2025-02-24 18:45:05] 2025-02-24 23:43:24,495 | INFO | 172555s have passed since pilot start [2025-02-24 18:45:05] 2025-02-24 23:43:24,775 | INFO | sum of deviations: 4747637469592.111 [2025-02-24 18:45:05] 2025-02-24 23:43:24,776 | INFO | mean x: 1740354658.5237758 [2025-02-24 18:45:05] 2025-02-24 23:43:24,776 | INFO | mean y: 2657275.677430802 [2025-02-24 18:45:05] 2025-02-24 23:43:24,776 | INFO | -- intersect: -1188068362.2872643 [2025-02-24 18:45:05] 2025-02-24 23:43:24,776 | INFO | intersect: -1188068362.2872643 [2025-02-24 18:45:05] 2025-02-24 23:43:24,781 | INFO | chi2: 2.9916409532296 [2025-02-24 18:45:05] 2025-02-24 23:43:24,781 | INFO | current chi2=2.9916409532296 (change=36.65592423887872 %) [2025-02-24 18:45:05] 2025-02-24 23:43:24,784 | INFO | model: linear, x: [1740269044.0, 1740269105.0, 1740269166.0, 1740269227.0, 1740269288.0, 1740269349.0, 1740269410.0, 1740269471.0, 1740269532.0, 1740269593.0, 1740 [2025-02-24 18:45:05] 2025-02-24 23:43:24,785 | INFO | sum of square deviations: 6902235783174.155 [2025-02-24 18:45:05] 2025-02-24 23:43:27,634 | INFO | sum of deviations: 4105435483340.5645 [2025-02-24 18:45:05] 2025-02-24 23:43:27,634 | INFO | mean x: 1740354811.0255954 [2025-02-24 18:45:05] 2025-02-24 23:43:27,635 | INFO | mean y: 2659931.756132243 [2025-02-24 18:45:05] 2025-02-24 23:43:27,635 | INFO | -- intersect: -1032499488.9354026 [2025-02-24 18:45:05] 2025-02-24 23:43:27,635 | INFO | intersect: -1032499488.9354026 [2025-02-24 18:45:05] 2025-02-24 23:43:27,646 | INFO | chi2: 1.4431459619723896 [2025-02-24 18:45:05] 2025-02-24 23:43:27,647 | INFO | current chi2=1.4431459619723896 (change=51.76072314378322 %) [2025-02-24 18:45:05] 2025-02-24 23:43:27,655 | INFO | model: linear, x: [1740269349.0, 1740269410.0, 1740269471.0, 1740269532.0, 1740269593.0, 1740269654.0, 1740269715.0, 1740269776.0, 1740269837.0, 1740269898.0, 1740 [2025-02-24 18:45:05] 2025-02-24 23:43:27,662 | INFO | 
sum of square deviations: 6865495088803.884 [2025-02-24 18:45:05] 2025-02-24 23:43:28,935 | INFO | sum of deviations: 3851777622365.9775 [2025-02-24 18:45:05] 2025-02-24 23:43:28,936 | INFO | mean x: 1740354963.5274217 [2025-02-24 18:45:05] 2025-02-24 23:43:28,936 | INFO | mean y: 2660984.344373219 [2025-02-24 18:45:05] 2025-02-24 23:43:28,936 | INFO | -- intersect: -973737690.0091126 [2025-02-24 18:45:05] 2025-02-24 23:43:28,936 | INFO | intersect: -973737690.0091126 [2025-02-24 18:45:05] 2025-02-24 23:43:28,944 | INFO | chi2: 1.2055081331925313 [2025-02-24 18:45:05] 2025-02-24 23:43:28,944 | INFO | current chi2=1.2055081331925313 (change=16.46665237209075 %) [2025-02-24 18:45:05] 2025-02-24 23:43:28,949 | INFO | left removable region: 40 [2025-02-24 18:45:05] 2025-02-24 23:43:28,952 | INFO | model: linear, x: [1740270569.0, 1740270630.0, 1740270691.0, 1740270752.0, 1740270813.0, 1740270874.0, 1740270935.0, 1740270996.0, 1740271057.0, 1740271118.0, 1740 [2025-02-24 18:45:05] 2025-02-24 23:43:28,988 | INFO | sum of square deviations: 6676544878702.307 [2025-02-24 18:45:05] 2025-02-24 23:43:30,871 | INFO | sum of deviations: 2961673925408.428 [2025-02-24 18:45:05] 2025-02-24 23:43:30,871 | INFO | mean x: 1740355390.5337887 [2025-02-24 18:45:05] 2025-02-24 23:43:30,871 | INFO | mean y: 2664610.660316319 [2025-02-24 18:45:05] 2025-02-24 23:43:30,872 | INFO | -- intersect: -769346253.4514452 [2025-02-24 18:45:05] 2025-02-24 23:43:30,872 | INFO | intersect: -769346253.4514452 [2025-02-24 18:45:05] 2025-02-24 23:43:30,876 | INFO | chi2: 0.5402423203205676 [2025-02-24 18:45:05] 2025-02-24 23:43:30,876 | INFO | -- intersect: -769346253.4514452 [2025-02-24 18:45:05] 2025-02-24 23:43:30,877 | INFO | current memory leak: 0.44 B/s (using 2782 data points, chi2=0.54) [2025-02-24 18:45:05] 2025-02-24 23:43:33,388 | INFO | monitor loop #7076: job 0:6525230404 is in state 'running' [2025-02-24 18:45:05] 2025-02-24 23:43:44,939 | INFO | system is under heavy CPU load [2025-02-24 
18:45:05] 2025-02-24 23:43:44,939 | INFO | CPU consumption time changed by a factor of 1.0009748601339832 (below the limit of 10) [2025-02-24 18:45:05] 2025-02-24 23:43:44,940 | INFO | (instant) CPU consumption time for pid=44115: 208438) [2025-02-24 18:45:05] 2025-02-24 23:43:44,940 | INFO | using path: /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/memory_monitor_output.txt (trf name=prmon) [2025-02-24 18:45:05] 2025-02-24 23:43:45,695 | INFO | using path: /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/memory_monitor_output.txt (trf name=prmon) [2025-02-24 18:45:05] 2025-02-24 23:43:45,957 | INFO | max memory (maxPSS) used by the payload is within the allowed limit: 2688925 B (2 * maxRSS = 81920000 B, memkillgrace = 100%) [2025-02-24 18:45:05] 2025-02-24 23:43:45,958 | INFO | reaping zombies for max 20 seconds [2025-02-24 18:45:05] 2025-02-24 23:43:46,003 | INFO | checking for looping job (in state=running) [2025-02-24 18:45:05] 2025-02-24 23:43:46,004 | INFO | using looping job limit: 7200 s [2025-02-24 18:45:05] 2025-02-24 23:43:46,004 | INFO | executing command: find /tmp/boinchome/slots/0/PanDA_Pilot-6525230404 -mmin -120 [2025-02-24 18:45:05] 2025-02-24 23:43:46,310 | INFO | found 3 files that were recently updated [2025-02-24 18:45:05] 2025-02-24 23:43:46,311 | INFO | file /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/log.EVNTtoHITS is the most recently updated file (at time=1740440108) [2025-02-24 18:45:05] 2025-02-24 23:43:46,330 | INFO | files were last touched 0h 8m 38s ago (current time: 1740440626) [2025-02-24 18:45:05] 2025-02-24 23:43:46,351 | INFO | payload log (log.EVNTtoHITS) within allowed size limit (2147483648 B): 4038712 B [2025-02-24 18:45:05] 2025-02-24 23:43:46,351 | INFO | payload log (payload.stdout) within allowed size limit (2147483648 B): 9339 B [2025-02-24 18:45:05] 2025-02-24 23:43:46,391 | INFO | executing command: df -mP /tmp/boinchome/slots/0 [2025-02-24 18:45:05] 2025-02-24 23:43:46,471 | INFO | sufficient remaining disk space 
(57358155776 B) [2025-02-24 18:45:05] 2025-02-24 23:43:46,507 | INFO | work directory size check will use 2537553920 B as a max limit (10% grace limit added) [2025-02-24 18:45:05] 2025-02-24 23:43:46,514 | INFO | size of work directory /tmp/boinchome/slots/0/PanDA_Pilot-6525230404: 283795562 B (within 2537553920 B limit) [2025-02-24 18:45:05] 2025-02-24 23:43:46,526 | INFO | total size of present files: 278311977 B (workdir size: 283795562 B) [2025-02-24 18:45:05] 2025-02-24 23:43:46,526 | INFO | output file /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/HITS.43092792._002813.pool.root.1 is within allowed size limit (278311977 B < 536870912000 B) [2025-02-24 18:45:05] 2025-02-24 23:43:50,508 | INFO | number of running child processes to parent process 44115: 6 [2025-02-24 18:45:05] 2025-02-24 23:43:50,509 | INFO | maximum number of monitored processes: 6 [2025-02-24 18:45:05] 2025-02-24 23:43:50,842 | INFO | PID=4615 has CPU usage=9.7% CMD=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/python/3.9.20-x86_64-centos7/bin/python3 pilot3/pilot.py -q BOINC_MCORE -i PR - [2025-02-24 18:45:05] 2025-02-24 23:43:50,842 | INFO | .. 
there are 41 such processes running [2025-02-24 18:45:05] 2025-02-24 23:43:51,051 | INFO | running: iteration=2830 pid=44115 exit_code=None [2025-02-24 18:45:05] 2025-02-24 23:43:53,032 | INFO | monitor loop #7077: job 0:6525230404 is in state 'running' [2025-02-24 18:45:05] 2025-02-24 23:44:05,281 | INFO | system is under heavy CPU load [2025-02-24 18:45:05] 2025-02-24 23:44:05,281 | INFO | CPU consumption time changed by a factor of 1.0002110939464013 (below the limit of 10) [2025-02-24 18:45:05] 2025-02-24 23:44:05,281 | INFO | (instant) CPU consumption time for pid=44115: 208482) [2025-02-24 18:45:05] 2025-02-24 23:44:05,282 | INFO | using path: /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/memory_monitor_output.txt (trf name=prmon) [2025-02-24 18:45:05] 2025-02-24 23:44:07,727 | INFO | number of running child processes to parent process 44115: 6 [2025-02-24 18:45:05] 2025-02-24 23:44:07,727 | INFO | maximum number of monitored processes: 6 [2025-02-24 18:45:05] 2025-02-24 23:44:10,236 | INFO | monitor loop #7078: job 0:6525230404 is in state 'running' [2025-02-24 18:45:05] 2025-02-24 23:44:16,283 | INFO | PID=4615 has CPU usage=5.0% CMD=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/python/3.9.20-x86_64-centos7/bin/python3 pilot3/pilot.py -q BOINC_MCORE -i PR - [2025-02-24 18:45:05] 2025-02-24 23:44:16,284 | INFO | .. 
there are 41 such processes running [2025-02-24 18:45:05] 2025-02-24 23:44:23,449 | INFO | system is under heavy CPU load [2025-02-24 18:45:05] 2025-02-24 23:44:23,449 | INFO | CPU consumption time changed by a factor of 1.0001486938920385 (below the limit of 10) [2025-02-24 18:45:05] 2025-02-24 23:44:23,509 | INFO | (instant) CPU consumption time for pid=44115: 208513) [2025-02-24 18:45:05] 2025-02-24 23:44:23,536 | INFO | using path: /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/memory_monitor_output.txt (trf name=prmon) [2025-02-24 18:45:05] 2025-02-24 23:44:26,070 | INFO | number of running child processes to parent process 44115: 6 [2025-02-24 18:45:05] 2025-02-24 23:44:26,071 | INFO | maximum number of monitored processes: 6 [2025-02-24 18:45:05] 2025-02-24 23:44:28,595 | INFO | monitor loop #7079: job 0:6525230404 is in state 'running' [2025-02-24 18:45:05] 2025-02-24 23:44:31,459 | CRITICAL | max running time (172800s) minus grace time (180s) has been exceeded - time to abort pilot [2025-02-24 18:45:05] 2025-02-24 23:44:31,459 | INFO | setting REACHED_MAXTIME and graceful stop [2025-02-24 18:45:05] 2025-02-24 23:44:31,459 | INFO | [monitor] control thread has ended [2025-02-24 18:45:05] 2025-02-24 23:44:31,467 | WARNING | since job:queue_monitor is responsible for sending job updates, we sleep for 20 s [2025-02-24 18:45:05] 2025-02-24 23:44:31,587 | INFO | breaking -- sending SIGTERM to pid=44115 [2025-02-24 18:45:05] 2025-02-24 23:44:31,587 | INFO | breaking -- sleep 10 s before sending SIGKILL pid=44115 [2025-02-24 18:45:05] 2025-02-24 23:44:32,475 | INFO | all data control threads have been joined [2025-02-24 18:45:05] 2025-02-24 23:44:32,583 | INFO | [data] copytool_in thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:32,728 | INFO | all payload control threads have been joined [2025-02-24 18:45:05] 2025-02-24 23:44:32,785 | INFO | [payload] failed_post thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:32,989 | INFO | all job control 
threads have been joined [2025-02-24 18:45:05] 2025-02-24 23:44:33,481 | INFO | [data] control thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:33,496 | INFO | [job] create_data_payload thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:33,546 | INFO | [job] validate thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:33,734 | INFO | [payload] control thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:33,936 | INFO | [payload] validate_pre thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:33,949 | INFO | [job] retrieve thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:33,995 | INFO | [job] control thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:34,005 | INFO | [payload] validate_post thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:34,492 | INFO | [data] copytool_out thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:36,344 | INFO | [data] queue_monitor thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:37,550 | INFO | job.realtimelogging is not enabled [2025-02-24 18:45:05] 2025-02-24 23:44:38,555 | INFO | [payload] run_realtimelog thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:39,644 | INFO | system is under heavy CPU load [2025-02-24 18:45:05] 2025-02-24 23:44:39,644 | INFO | CPU consumption time changed by a factor of 2.8775184281076e-05 (below the limit of 10) [2025-02-24 18:45:05] 2025-02-24 23:44:39,645 | INFO | (instant) CPU consumption time for pid=44115: 6) [2025-02-24 18:45:05] 2025-02-24 23:44:39,645 | INFO | using path: /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/memory_monitor_output.txt (trf name=prmon) [2025-02-24 18:45:05] 2025-02-24 23:44:40,902 | INFO | number of running child processes to parent process 44115: 1 [2025-02-24 18:45:05] 2025-02-24 23:44:40,903 | INFO | maximum number of monitored processes: 6 [2025-02-24 18:45:05] 2025-02-24 23:44:40,903 | INFO | will abort loop [2025-02-24 18:45:05] 2025-02-24 23:44:41,638 | INFO | [2025-02-24 
18:45:05] [2025-02-24 18:45:05] finished pid=44115 exit_code=None state=failed [2025-02-24 18:45:05] [2025-02-24 18:45:05] 2025-02-24 23:44:41,638 | WARNING | detected unset exit_code from wait_graceful - reset to -1 [2025-02-24 18:45:05] 2025-02-24 23:44:41,640 | INFO | using pid=17313 to kill prmon [2025-02-24 18:45:05] 2025-02-24 23:44:41,641 | INFO | stopping utility process 'MemoryMonitor' with signal 10 [2025-02-24 18:45:05] 2025-02-24 23:44:41,641 | INFO | process 17313 no longer exists [2025-02-24 18:45:05] 2025-02-24 23:44:41,641 | INFO | utility process 44127 cleanup finished with status=True [2025-02-24 18:45:05] 2025-02-24 23:44:41,641 | INFO | taking a short nap (3 s) to allow the memory monitor to finish writing to the summary file (#0/#20) [2025-02-24 18:45:05] 2025-02-24 23:44:41,910 | INFO | [job] job monitor thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:44,657 | INFO | copied /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/memory_monitor_summary.json to /tmp/boinchome/slots/0 [2025-02-24 18:45:05] 2025-02-24 23:44:44,786 | INFO | found no lingering processes [2025-02-24 18:45:05] 2025-02-24 23:44:44,786 | INFO | CPU consumption time: 5511.33 s (rounded to 5511 s) [2025-02-24 18:45:05] 2025-02-24 23:44:44,786 | WARNING | main payload execution returned non-zero exit code: -1 [2025-02-24 18:45:05] 2025-02-24 23:44:44,787 | INFO | scanning dmesg message for subprocess=4669 for memory errors [2025-02-24 18:45:05] 2025-02-24 23:44:44,787 | INFO | executing command: dmesg|grep 4669 [2025-02-24 18:45:05] 2025-02-24 23:44:45,206 | WARNING | job report does not exist: /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/jobReport.json [2025-02-24 18:45:05] 2025-02-24 23:44:45,206 | WARNING | metadata does not exist: /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/metadata.xml [2025-02-24 18:45:05] 2025-02-24 23:44:45,206 | WARNING | file does not exist: /tmp/boinchome/slots/0/PanDA_Pilot-6525230404/metadata.xml [2025-02-24 18:45:05] 2025-02-24 
23:44:45,207 | INFO | generated guid for lfn=HITS.43092792._002813.pool.root.1: 8EA7F4B2-B1A3-4330-87F0-E21993C4C1AE [2025-02-24 18:45:05] 2025-02-24 23:44:45,207 | WARNING | aborting payload error diagnosis since an error has already been set: [1315, 1187] [2025-02-24 18:45:05] 2025-02-24 23:44:46,410 | INFO | [payload] execute_payloads thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:47,610 | INFO | waiting for thread to finish: ['<_MainThread(MainThread, started 140194755405632)>', '<ExcThread(queue_monitor, started 140194082703104)>'] [2025-02-24 18:45:05] 2025-02-24 23:44:49,622 | INFO | waiting for thread to finish: ['<_MainThread(MainThread, started 140194755405632)>', '<ExcThread(queue_monitor, started 140194082703104)>'] [2025-02-24 18:45:05] 2025-02-24 23:44:51,635 | INFO | waiting for thread to finish: ['<_MainThread(MainThread, started 140194755405632)>', '<ExcThread(queue_monitor, started 140194082703104)>'] [2025-02-24 18:45:05] 2025-02-24 23:44:53,647 | INFO | waiting for thread to finish: ['<_MainThread(MainThread, started 140194755405632)>', '<ExcThread(queue_monitor, started 140194082703104)>'] [2025-02-24 18:45:05] 2025-02-24 23:44:54,576 | INFO | waiting for thread to finish: ['<_MainThread(MainThread, started 140194755405632)>', '<ExcThread(queue_monitor, started 140194082703104)>'] [2025-02-24 18:45:05] 2025-02-24 23:44:54,576 | INFO | [job] queue monitor thread has finished [2025-02-24 18:45:05] 2025-02-24 23:44:55,660 | INFO | caller=run is remaining thread - safe to abort (names=['<_MainThread(MainThread, started 140194755405632)>']) [2025-02-24 18:45:05] 2025-02-24 23:45:00,686 | INFO | all workflow threads have been joined [2025-02-24 18:45:05] 2025-02-24 23:45:00,690 | INFO | end of generic workflow (traces error code: 0) [2025-02-24 18:45:05] 2025-02-24 23:45:00,690 | INFO | traces error code: 0 [2025-02-24 18:45:05] 2025-02-24 23:45:00,690 | INFO | pilot has finished (exit code=0, shell exit code=0) [2025-02-24 18:45:05] 
2025-02-24 23:45:01,560 [wrapper] ==== pilot stdout END ====
[2025-02-24 18:45:05] 2025-02-24 23:45:01,610 [wrapper] ==== wrapper stdout RESUME ====
[2025-02-24 18:45:05] 2025-02-24 23:45:01,676 [wrapper] pilotpid: 4615
[2025-02-24 18:45:05] 2025-02-24 23:45:01,716 [wrapper] Pilot exit status: 0
[2025-02-24 18:45:05] 2025-02-24 23:45:01,917 [wrapper] pandaids: 6525230404 6525230404
[2025-02-24 18:45:05] 2025-02-24 23:45:02,657 [wrapper] cleanup supervisor_pilot 9726 4616
[2025-02-24 18:45:05] 2025-02-24 23:45:02,694 [wrapper] Test setup, not cleaning
[2025-02-24 18:45:05] 2025-02-24 23:45:02,767 [wrapper] apfmon messages muted
[2025-02-24 18:45:05] 2025-02-24 23:45:02,795 [wrapper] ==== wrapper stdout END ====
[2025-02-24 18:45:05] 2025-02-24 23:45:02,831 [wrapper] ==== wrapper stderr END ====
[2025-02-24 18:45:05] *** Error codes and diagnostics ***
[2025-02-24 18:45:05] *** Listing of results directory ***
[2025-02-24 18:45:05] total 783256
[2025-02-24 18:45:05] drwx------ 4 boincer umatlas 4096 Dec 18 06:03 pilot3
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 491065 Feb 22 01:14 pilot3.tar.gz
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 5118 Feb 22 01:41 queuedata.json
[2025-02-24 18:45:05] -rwx------ 1 boincer umatlas 35865 Feb 22 01:42 runpilot2-wrapper.sh
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 100 Feb 22 07:02 wrapper_26015_x86_64-pc-linux-gnu
[2025-02-24 18:45:05] -rwxr-xr-x 1 boincer umatlas 7986 Feb 22 07:02 run_atlas
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 105 Feb 22 07:02 job.xml
[2025-02-24 18:45:05] -rw-r--r-- 3 boincer umatlas 350014465 Feb 22 07:02 EVNT.43092790._000015.pool.root.1
[2025-02-24 18:45:05] -rw-r--r-- 3 boincer umatlas 350014465 Feb 22 07:02 ATLAS.root_0
[2025-02-24 18:45:05] -rw-r--r-- 2 boincer umatlas 503559 Feb 22 07:02 input.tar.gz
[2025-02-24 18:45:05] -rw-r--r-- 2 boincer umatlas 17569 Feb 22 07:02 start_atlas.sh
[2025-02-24 18:45:05] drwxrwx--x 2 boincer umatlas 4096 Feb 22 07:02 shared
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 0 Feb 22 07:02 boinc_lockfile
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 2554 Feb 22 18:47 pandaJob.out
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 467 Feb 22 18:47 setup.sh.local
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 987744 Feb 22 18:47 agis_schedconf.cvmfs.json
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 1582569 Feb 22 18:47 agis_ddmendpoints.agis.ALL.json
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 6038 Feb 24 18:40 init_data.xml
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 921 Feb 24 18:43 heartbeat.json
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 96 Feb 24 18:43 pilot_heartbeat.json
[2025-02-24 18:45:05] drwxrwx--- 2 boincer umatlas 4096 Feb 24 18:44 PanDA_Pilot-6525230404
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 1065 Feb 24 18:44 memory_monitor_summary.json
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 534 Feb 24 18:44 boinc_task_state.xml
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 32722390 Feb 24 18:45 pilotlog.txt
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 32754461 Feb 24 18:45 log.43092792._002813.job.log.1
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 571 Feb 24 18:45 runtime_log
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 32768000 Feb 24 18:45 result.tar.gz
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 10834 Feb 24 18:45 runtime_log.err
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 784 Feb 24 18:45 GzvMDmyOy56nsSi4ap6QjLDmwznN0nGgGQJmkKkKDmTctKDmn3sbZm.diag
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 8192 Feb 24 18:45 boinc_mmap_file
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 30 Feb 24 18:45 wrapper_checkpoint.txt
[2025-02-24 18:45:05] -rw-r--r-- 1 boincer umatlas 27806 Feb 24 18:45 stderr.txt
[2025-02-24 18:45:05] No HITS result produced
[2025-02-24 18:45:05] *** Contents of shared directory: ***
[2025-02-24 18:45:05] total 374328
[2025-02-24 18:45:05] -rw-r--r-- 3 boincer umatlas 350014465 Feb 22 07:02 ATLAS.root_0
[2025-02-24 18:45:05] -rw-r--r-- 2 boincer umatlas 503559 Feb 22 07:02 input.tar.gz
[2025-02-24 18:45:05] -rw-r--r-- 2 boincer umatlas 17569 Feb 22 07:02 start_atlas.sh
[2025-02-24 18:45:05] -rw------- 1 boincer umatlas 32768000 Feb 24 18:45 result.tar.gz
18:45:06 (36366): run_atlas exited; CPU time 7692.725359
18:45:06 (36366): called boinc_finish(0)
</stderr_txt>
]]>
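The pilot's abort decision in the stderr output above can be checked by hand: the log reports "172555s have passed since pilot start" at 23:43:24,495 and the CRITICAL "max running time (172800s) minus grace time (180s)" message at 23:44:31,459. The sketch below is only an illustration under stated assumptions: it assumes every `run_atlas` entry starts with a bracketed `[YYYY-MM-DD HH:MM:SS] ` prefix, and the helper names `split_entries` and `pilot_time` are hypothetical, not part of the pilot or BOINC. Wrapper lines without a bracketed prefix (for example `18:45:06 (36366): ...`) would stay attached to the preceding entry.

```python
import re
from datetime import datetime

# Zero-width split point in front of every "[YYYY-MM-DD HH:MM:SS] " prefix.
ENTRY_START = re.compile(r"(?=\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\] )")
# Pilot timestamps (UTC in this log) carry milliseconds after a comma.
PILOT_TIME = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")

def split_entries(blob):
    """Split a run of concatenated log text into one string per entry."""
    return [part.strip() for part in ENTRY_START.split(blob) if part.strip()]

def pilot_time(entry):
    """Return the pilot's own timestamp from an entry as a datetime."""
    match = PILOT_TIME.search(entry)
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")

# Two entries copied from the log above, joined the way a flattened capture would be.
blob = (
    "[2025-02-24 18:45:05] 2025-02-24 23:43:24,495 | INFO | "
    "172555s have passed since pilot start "
    "[2025-02-24 18:45:05] 2025-02-24 23:44:31,459 | CRITICAL | "
    "max running time (172800s) minus grace time (180s) has been exceeded "
    "- time to abort pilot"
)

entries = split_entries(blob)
report, abort = entries

# Limit quoted in the CRITICAL message: max running time minus grace time.
threshold = 172800 - 180
# Elapsed time at the abort: the reported figure plus the gap between the two entries.
elapsed = 172555 + (pilot_time(abort) - pilot_time(report)).total_seconds()

print(threshold, elapsed > threshold)
```

The limit works out to 172620 s, and roughly 172622 s had elapsed when the CRITICAL entry was written, which is consistent with the pilot stopping at that point. The reported 172555 s is a whole-second figure, so the result is approximate.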
©2025 CERN