Message boards :
ATLAS application :
ATLAS long simulation 1.00
Joined: 24 Jun 10 Posts: 42 Credit: 5,351,648 RAC: 18,544
Well, that is strange, as I do have a Squid proxy set up on the following machine: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10676832

It has processed the following task successfully: https://lhcathome.cern.ch/lhcathome/result.php?resultid=306228088, and this is in the stderr file:

2021-03-29 00:30:38 (1800): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2021-03-29 00:30:38 (1800): Guest Log: 2.4.4.0 3747 2 25840 14801 2 1 1243074 4096000 2 65024 0 2 100 0 0 http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/grid.cern.ch http://192.168.1.100:3128 1

The machine that is specifically set up to run ATLAS native work units - https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10667767 - which processed the ATLAS long simulation, has been switched over to run 1 x CMS, and yes, the Squid proxy is detected when running the CMS (VM). But I will look to see whether I have missed something in my setup regarding Linux native work units and Squid proxies. Cheers
Joined: 15 Jun 08 Posts: 2418 Credit: 226,711,892 RAC: 130,560
OK, I see. It only affects your local CVMFS client on the Linux box. Tasks running in a CERN VM (like CMS) forward a proxy set for BOINC to the VM. Native tasks have to rely on the CVMFS configuration made by the local admin. Just configure that CVMFS to use your existing proxy. If there are more questions, they should be discussed in a separate thread, since they are off-topic here.
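For reference, pointing the local CVMFS client at an existing Squid proxy is normally done in /etc/cvmfs/default.local. A minimal sketch is shown below; the proxy address and repository list are placeholders that must match your own setup:

```shell
# /etc/cvmfs/default.local -- minimal sketch, adjust to your setup.
# Repositories needed for ATLAS native tasks (example list):
CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
# Address of your local Squid proxy (placeholder):
CVMFS_HTTP_PROXY="http://192.168.1.100:3128"
# Local cache size limit in MB (example value):
CVMFS_QUOTA_LIMIT=4096
```

After editing the file, the client has to re-read its configuration (see the follow-up posts) before the proxy is used.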
Joined: 24 Jun 10 Posts: 42 Credit: 5,351,648 RAC: 18,544
Hi all. Found the problem: I had forgotten to update the default.local file in /etc/cvmfs. After updating it I ran the cvmfs_config reload command, and then the cvmfs_config stat command, which showed it using the local proxy. I have now downloaded one more work unit and will see if I am on the right track. Cheers
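For anyone following along, the check sequence described above looks roughly like this (the repository name is just an example):

```shell
# Re-read /etc/cvmfs/default.local after editing it
sudo cvmfs_config reload

# Show per-repository statistics; the PROXY column should now
# list your local Squid instead of DIRECT
cvmfs_config stat atlas.cern.ch
```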
Joined: 24 Jun 10 Posts: 42 Credit: 5,351,648 RAC: 18,544
"OK, I see." Okay, noted. Again, thank you for your insight, computezrmle; you have helped me sort out the fine tuning of running these tasks. Cheers
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Both hosts are CentOS 8 VMs.

AMD Ryzen 9 3950X with 6 CPUs and multithreading:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=159146452
21:27:05 (1748490): run_atlas exited; CPU time 213425.835602

AMD Ryzen 7 2700 with 4 CPUs, without multithreading:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=159145628
[2021-04-03 14:31:22] ryzcos8
[2021-04-03 14:31:22] Singularity works
[2021-04-03 14:32:13] Set ATHENA_PROC_NUMBER=4
[2021-04-03 14:32:13] Starting ATLAS job with PandaID=5014719677
[2021-04-03 14:32:13] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
[2021-04-04 00:55:22] *** The last 200 lines of the pilot log: ***
[2021-04-04 00:55:22] "cpuTimeTotal": 143039,
[2021-04-04 00:55:22] "externalCpuTime": 189,
[2021-04-04 00:55:22] "processedEvents": 1000,
[2021-04-04 00:55:22] "trfPredata": null,
[2021-04-04 00:55:22] "wallTime": 37270

AMD Ryzen 7 2700 with 6 CPUs and multithreading:
[2021-03-27 08:23:09] ryzcos8
[2021-03-27 08:23:09] Singularity works
[2021-03-27 08:24:00] Set ATHENA_PROC_NUMBER=6
[2021-03-27 08:24:00] Starting ATLAS job with PandaID=5009743354
[2021-03-27 08:24:00] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
[2021-03-27 19:28:51] *** The last 200 lines of the pilot log: ***
[2021-03-27 19:28:51] "cpuTime": 20,
[2021-03-27 19:28:51] "cpuTimeTotal": 230727,
[2021-03-27 19:28:51] "externalCpuTime": 403,
[2021-03-27 19:28:51] "processedEvents": 1000,
[2021-03-27 19:28:51] "trfPredata": null,
[2021-03-27 19:28:51] "wallTime": 39728
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Testing a CentOS 8 VM with 3 CPUs, without multithreading. The task was downloaded, but it is not listed in the user's ATLAS task list on the website. I know ATLAS long only runs with 4 or more CPUs, but there is no information telling the user what went wrong.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
The number of tasks for the long runner has dropped to zero.
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0
There are a lot more tasks available now. These are also now real tasks instead of tests, i.e. the output will be used for science. I know the credits awarded for these tasks are a bit strange... my own tasks' credit dropped from 1000 to 100 per task. I am trying to find out how to improve this, but it may be a case of gathering more statistics until the credit settles down.
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0
I just released v1.02, which has some minor improvements to the CVMFS checks at the start of the task.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
CentOS 8 VM, multithreading with 6 CPUs:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=160542243
The file manager was showing /dev/loopX devices with numbers up to more than 60. Is this also solved? squashfs mounts these devices, but sometimes they are not released afterwards.
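As a side note for anyone who wants to check this themselves, leftover loop devices can be inspected with standard util-linux tools; this is a generic sketch, not something project-specific:

```shell
# List all loop devices and the files backing them
losetup -l

# Show which loop devices are actually still mounted;
# entries in losetup but not in mount may be stale
mount | grep /dev/loop
```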
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
The CentOS 8 VM without multithreading now needs 5 minutes for the startup phase, instead of the 10 minutes it took in the past.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Last night the long runners all exited with: 195 (0x000000C3) EXIT_CHILD_FAILED
Example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=311854099
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
The last two long runners didn't produce a HITS file:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=316201737
[2021-05-09 01:46:42] 2021-05-08 23:46:42,063 [wrapper] Pilot exit status: 0
[2021-05-09 01:46:42] 2021-05-08 23:46:42,073 [wrapper] pandaids: 5048553932
[2021-05-09 01:46:42] 2021-05-08 23:46:42,078 [wrapper] apfmon messages muted
[2021-05-09 01:46:42] 2021-05-08 23:46:42,081 [wrapper] Test setup, not cleaning
[2021-05-09 01:46:42] 2021-05-08 23:46:42,083 [wrapper] ==== wrapper stdout END ====
[2021-05-09 01:46:42] 2021-05-08 23:46:42,086 [wrapper] ==== wrapper stderr END ====
[2021-05-09 01:46:42] 2021-05-08 23:46:42,091 [wrapper] wrapperexiting ec=0, duration=48516
[2021-05-09 01:46:42] 2021-05-08 23:46:42,094 [wrapper] apfmon messages muted
[2021-05-09 01:46:42] *** Error codes and diagnostics ***
[2021-05-09 01:46:42] "exeErrorCode": 65,
[2021-05-09 01:46:42] "exeErrorDiag": "Non-zero return code from HITSMergeAthenaMP0 (65); Logfile error in log.HITSMergeAthenaMP0: \"StreamHITS FATAL Check number of writes failed. See messages above to identify which continer is not always written\"",
[2021-05-09 01:46:42] "pilotErrorCode": 1165,
[2021-05-09 01:46:42] "pilotErrorDiag": "Local output file is missing",
[2021-05-09 01:46:42] *** Listing of results directory ***
[2021-05-09 01:46:42] insgesamt 461536

https://lhcathome.cern.ch/lhcathome/result.php?resultid=316228689
[2021-05-09 17:19:46] 2021-05-09 15:19:45,454 [wrapper] wrapperexiting ec=0, duration=48214
[2021-05-09 17:19:46] 2021-05-09 15:19:45,457 [wrapper] apfmon messages muted
[2021-05-09 17:19:46] *** Error codes and diagnostics ***
[2021-05-09 17:19:46] "exeErrorCode": 65,
[2021-05-09 17:19:46] "exeErrorDiag": "Non-zero return code from HITSMergeAthenaMP0 (65); Logfile error in log.HITSMergeAthenaMP0: \"StreamHITS FATAL Check number of writes failed. See messages above to identify which continer is not always written\"",
[2021-05-09 17:19:46] "pilotErrorCode": 1165,
[2021-05-09 17:19:46] "pilotErrorDiag": "Local output file is missing",
[2021-05-09 17:19:46] *** Listing of results directory ***
[2021-05-09 17:19:46] insgesamt 467072
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Would it be better to calculate BOINC credit for the long runner from the CPU seconds used?
20:17:44 (766501): run_atlas exited; CPU time 281469.346031
20:17:44 (766501): called boinc_finish(0)
Joined: 27 Sep 08 Posts: 808 Credit: 653,010,032 RAC: 277,770
Joined: 15 Jun 08 Posts: 2418 Credit: 226,711,892 RAC: 130,560
It might be a misleading error message. ATLAS long runs only on Linux, but there's no Linux host on your computer list.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Long runners are Linux VMs only and require the test application to be enabled in the project preferences.
Joined: 27 Sep 08 Posts: 808 Credit: 653,010,032 RAC: 277,770
That would explain it; I thought it was a VM, not native.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
The long runners are stuck downloading.
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0
Hi all, I have paused the submission of long tasks for the moment, since there are very few hosts running them and the large cluster that ran them previously is no longer running BOINC. But we may bring the long tasks back in the future if there is demand for them. Thanks to everyone who helped test and run these tasks. David
©2024 CERN