Message boards :
ATLAS application :
ATLAS long simulation 1.00
Joined: 24 Jun 10 Posts: 42 Credit: 5,351,648 RAC: 18,544
Well, that is strange, as I do have a Squid proxy set up on the following machine: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10676832

It has processed the following task successfully: https://lhcathome.cern.ch/lhcathome/result.php?resultid=306228088, and this is in the stderr file:

2021-03-29 00:30:38 (1800): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2021-03-29 00:30:38 (1800): Guest Log: 2.4.4.0 3747 2 25840 14801 2 1 1243074 4096000 2 65024 0 2 100 0 0 http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/grid.cern.ch http://192.168.1.100:3128 1

The machine that is specifically set up to run ATLAS native work units - https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10667767 - which processed the ATLAS long simulation, has been switched over to run 1 x CMS, and yes, the Squid proxy is detected when running the CMS (VM). But I will look to see whether I have missed something in my setup regarding Linux native work units and Squid proxies. Cheers
Joined: 15 Jun 08 Posts: 2418 Credit: 226,711,892 RAC: 130,560
OK, I see. It only affects your local CVMFS client on the Linux box. Tasks running in a CERN VM (like CMS) forward a proxy set for BOINC to the VM. Native tasks have to rely on the CVMFS configuration made by the local admin. Just configure that CVMFS to use your existing proxy. If there are more questions, they should be discussed in a separate thread, since they are off-topic here.
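For reference, pointing the local CVMFS client at an existing Squid proxy is normally done in /etc/cvmfs/default.local. A minimal sketch is shown below; the proxy address and repository list are placeholders that must match your own setup:

```shell
# /etc/cvmfs/default.local -- minimal sketch, adjust to your setup.
# Repositories needed for ATLAS native tasks (example list):
CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch,grid.cern.ch
# Address of your local Squid proxy (placeholder):
CVMFS_HTTP_PROXY="http://192.168.1.100:3128"
# Local cache size limit in MB (example value):
CVMFS_QUOTA_LIMIT=4096
```

After editing the file, the client has to re-read its configuration (see the follow-up posts) before the proxy is used.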
Joined: 24 Jun 10 Posts: 42 Credit: 5,351,648 RAC: 18,544
Hi all. Found the problem: I had forgotten to update the default.local file in /etc/cvmfs. After updating it I ran the cvmfs_config reload command, and then the cvmfs_config stat command, which showed it using the local proxy. I have now downloaded one more work unit and will see if I am on the right track. Cheers
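For anyone following along, the check sequence described above looks roughly like this (the repository name is just an example):

```shell
# Re-read /etc/cvmfs/default.local after editing it
sudo cvmfs_config reload

# Show per-repository statistics; the PROXY column should now
# list your local Squid instead of DIRECT
cvmfs_config stat atlas.cern.ch
```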
Joined: 24 Jun 10 Posts: 42 Credit: 5,351,648 RAC: 18,544
"OK, I see." Okay, noted. Again, thank you for your insight, computezrmle; you have helped me sort out the fine tuning of running these tasks. Cheers
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Both hosts are CentOS 8 VMs.

AMD Ryzen 9 3950X with 6 CPUs and multithreading:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=159146452
21:27:05 (1748490): run_atlas exited; CPU time 213425.835602

AMD Ryzen 7 2700 with 4 CPUs, without multithreading:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=159145628
[2021-04-03 14:31:22] ryzcos8
[2021-04-03 14:31:22] Singularity works
[2021-04-03 14:32:13] Set ATHENA_PROC_NUMBER=4
[2021-04-03 14:32:13] Starting ATLAS job with PandaID=5014719677
[2021-04-03 14:32:13] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
[2021-04-04 00:55:22] *** The last 200 lines of the pilot log: ***
[2021-04-04 00:55:22] "cpuTimeTotal": 143039,
[2021-04-04 00:55:22] "externalCpuTime": 189,
[2021-04-04 00:55:22] "processedEvents": 1000,
[2021-04-04 00:55:22] "trfPredata": null,
[2021-04-04 00:55:22] "wallTime": 37270

AMD Ryzen 7 2700 with 6 CPUs and multithreading:
[2021-03-27 08:23:09] ryzcos8
[2021-03-27 08:23:09] Singularity works
[2021-03-27 08:24:00] Set ATHENA_PROC_NUMBER=6
[2021-03-27 08:24:00] Starting ATLAS job with PandaID=5009743354
[2021-03-27 08:24:00] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
[2021-03-27 19:28:51] *** The last 200 lines of the pilot log: ***
[2021-03-27 19:28:51] "cpuTime": 20,
[2021-03-27 19:28:51] "cpuTimeTotal": 230727,
[2021-03-27 19:28:51] "externalCpuTime": 403,
[2021-03-27 19:28:51] "processedEvents": 1000,
[2021-03-27 19:28:51] "trfPredata": null,
[2021-03-27 19:28:51] "wallTime": 39728
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Testing a CentOS 8 VM with 3 CPUs, without multithreading. The task was downloaded, but it is not listed in the user's ATLAS task list on the website. I know ATLAS long only runs with 4 or more CPUs, but there is no information telling the user what went wrong.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
The number of tasks for the long runner has dropped to zero.
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0
There are a lot more tasks available now. These are also now real tasks instead of tests, i.e. the output will be used for science. I know the credits awarded for these tasks are a bit strange... my own tasks' credit dropped from 1000 to 100 per task. I am trying to find out how to improve this, but it may be a case of gathering more statistics until the credit settles down.
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0
I just released v1.02, which has some minor improvements to the CVMFS checks at the start of the task.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
CentOS 8 VM, multithreading with 6 CPUs:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=160542243
The file manager was showing /dev/loopX devices with numbers up to more than 60. Is this also solved? squashfs mounts these devices, but sometimes they are not released afterwards.
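As a side note for anyone who wants to check this themselves, leftover loop devices can be inspected with standard util-linux tools; this is a generic sketch, not something project-specific:

```shell
# List all loop devices and the files backing them
losetup -l

# Show which loop devices are actually still mounted;
# entries in losetup but not in mount may be stale
mount | grep /dev/loop
```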
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
The CentOS 8 VM without multithreading now needs 5 minutes for the startup phase, instead of the 10 minutes it took in the past.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Last night the long runners all exited with: 195 (0x000000C3) EXIT_CHILD_FAILED
Example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=311854099
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
The last two long runners didn't produce a HITS file:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=316201737
[2021-05-09 01:46:42] 2021-05-08 23:46:42,063 [wrapper] Pilot exit status: 0
[2021-05-09 01:46:42] 2021-05-08 23:46:42,073 [wrapper] pandaids: 5048553932
[2021-05-09 01:46:42] 2021-05-08 23:46:42,078 [wrapper] apfmon messages muted
[2021-05-09 01:46:42] 2021-05-08 23:46:42,081 [wrapper] Test setup, not cleaning
[2021-05-09 01:46:42] 2021-05-08 23:46:42,083 [wrapper] ==== wrapper stdout END ====
[2021-05-09 01:46:42] 2021-05-08 23:46:42,086 [wrapper] ==== wrapper stderr END ====
[2021-05-09 01:46:42] 2021-05-08 23:46:42,091 [wrapper] wrapperexiting ec=0, duration=48516
[2021-05-09 01:46:42] 2021-05-08 23:46:42,094 [wrapper] apfmon messages muted
[2021-05-09 01:46:42] *** Error codes and diagnostics ***
[2021-05-09 01:46:42] "exeErrorCode": 65,
[2021-05-09 01:46:42] "exeErrorDiag": "Non-zero return code from HITSMergeAthenaMP0 (65); Logfile error in log.HITSMergeAthenaMP0: \"StreamHITS FATAL Check number of writes failed. See messages above to identify which continer is not always written\"",
[2021-05-09 01:46:42] "pilotErrorCode": 1165,
[2021-05-09 01:46:42] "pilotErrorDiag": "Local output file is missing",
[2021-05-09 01:46:42] *** Listing of results directory ***
[2021-05-09 01:46:42] insgesamt 461536

https://lhcathome.cern.ch/lhcathome/result.php?resultid=316228689
[2021-05-09 17:19:46] 2021-05-09 15:19:45,454 [wrapper] wrapperexiting ec=0, duration=48214
[2021-05-09 17:19:46] 2021-05-09 15:19:45,457 [wrapper] apfmon messages muted
[2021-05-09 17:19:46] *** Error codes and diagnostics ***
[2021-05-09 17:19:46] "exeErrorCode": 65,
[2021-05-09 17:19:46] "exeErrorDiag": "Non-zero return code from HITSMergeAthenaMP0 (65); Logfile error in log.HITSMergeAthenaMP0: \"StreamHITS FATAL Check number of writes failed. See messages above to identify which continer is not always written\"",
[2021-05-09 17:19:46] "pilotErrorCode": 1165,
[2021-05-09 17:19:46] "pilotErrorDiag": "Local output file is missing",
[2021-05-09 17:19:46] *** Listing of results directory ***
[2021-05-09 17:19:46] insgesamt 467072
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Would it be better to calculate BOINC credit for the long runner from the CPU seconds used?
20:17:44 (766501): run_atlas exited; CPU time 281469.346031
20:17:44 (766501): called boinc_finish(0)
Joined: 27 Sep 08 Posts: 808 Credit: 653,010,032 RAC: 277,770
Joined: 15 Jun 08 Posts: 2418 Credit: 226,711,892 RAC: 130,560
It might be a misleading error message. ATLAS long runs only on Linux, but there's no Linux host on your computer list.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
Long runners are Linux VMs only and require the test application to be enabled in the project preferences.
Joined: 27 Sep 08 Posts: 808 Credit: 653,010,032 RAC: 277,770
That would explain it; I thought it was a VM, not native.
Joined: 2 May 07 Posts: 2108 Credit: 159,820,112 RAC: 106,484
The long runners are stuck downloading.
Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0
Hi all, I have paused the submission of long tasks for the moment, since there are very few hosts running them and the large cluster that ran them previously is no longer running BOINC. But we may bring the long tasks back in the future if there is demand for them. Thanks to everyone who helped test and run these tasks. David
©2024 CERN