Message boards : ATLAS application : ATLAS long simulation 1.00

tazzduke
Joined: 24 Jun 10
Posts: 39
Credit: 4,971,347
RAC: 4,992
Message 44594 - Posted: 29 Mar 2021, 10:47:44 UTC - in response to Message 44593.  

Well, that is strange, as I do have a Squid proxy set up on the following machine:

https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10676832

It has processed the following task successfully:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=306228088

and this is in the stderr file:


2021-03-29 00:30:38 (1800): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE

2021-03-29 00:30:38 (1800): Guest Log: 2.4.4.0 3747 2 25840 14801 2 1 1243074 4096000 2 65024 0 2 100 0 0 http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/grid.cern.ch http://192.168.1.100:3128 1
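
The two stderr lines above are a header row and a value row, so pairing them up makes the PROXY column easy to read. A minimal Python sketch, assuming both rows are whitespace-separated and no value contains a space; the function name `parse_cvmfs_stat` is made up for illustration:

```python
def parse_cvmfs_stat(header_line: str, value_line: str) -> dict:
    """Pair each column name with its value from the two stat rows."""
    marker = "Guest Log: "
    # Drop the BOINC timestamp prefix, keeping only the part after "Guest Log: ".
    header = header_line.split(marker, 1)[-1].split()
    values = value_line.split(marker, 1)[-1].split()
    if len(header) != len(values):
        raise ValueError("header and value rows have different column counts")
    return dict(zip(header, values))


header = ("2021-03-29 00:30:38 (1800): Guest Log: VERSION PID UPTIME(M) MEM(K) "
          "REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX "
          "NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE")
values = ("2021-03-29 00:30:38 (1800): Guest Log: 2.4.4.0 3747 2 25840 14801 2 1 "
          "1243074 4096000 2 65024 0 2 100 0 0 "
          "http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/grid.cern.ch "
          "http://192.168.1.100:3128 1")

stat = parse_cvmfs_stat(header, values)
print("PROXY:", stat["PROXY"])
print("HOST: ", stat["HOST"])
```

Read this way, the PROXY column holds the local Squid address, which is how the task output confirms that the proxy was picked up.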


This machine, which is specifically set up to run ATLAS native work units (https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10667767) and which processed the ATLAS long simulation, has been switched over to run 1 x CMS, and yes, the Squid proxy is detected when running CMS (VM).

But I will check whether I have missed something in the setup of Linux native work units with Squid proxies.

Cheers

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,591
RAC: 137,979
Message 44595 - Posted: 29 Mar 2021, 11:08:57 UTC - in response to Message 44594.  

OK, I see.
It only affects your local CVMFS client on the Linux box.
Tasks running in a CERN VM (like CMS) forward a proxy set for BOINC to the VM.

Native tasks have to rely on the CVMFS configuration made by the local admin.
Just configure that CVMFS to use your existing proxy.

If there are more questions they should be discussed in a separate thread since it is OT here.

tazzduke
Joined: 24 Jun 10
Posts: 39
Credit: 4,971,347
RAC: 4,992
Message 44596 - Posted: 29 Mar 2021, 11:13:06 UTC - in response to Message 44594.  

Hi All

I found the problem: I forgot to update the default.local file in /etc/cvmfs and then run the cvmfs_config reload command. After doing both, I ran the cvmfs_config stat command, which showed the local proxy in use.
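
For anyone wanting to check the same thing, here is a small Python sketch that reads the proxy setting out of a default.local style file. The thread does not show the actual file, so the sample content below is hypothetical; `CVMFS_HTTP_PROXY` is the CVMFS client parameter for the proxy, the address is the one from the stderr output earlier in the thread, and the helper name `read_cvmfs_settings` is made up for illustration.

```python
def read_cvmfs_settings(text: str) -> dict:
    """Parse simple KEY=VALUE lines as used in /etc/cvmfs/default.local."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything that is not an assignment
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical file content; the thread does not show the real file.
sample = '''
# /etc/cvmfs/default.local
CVMFS_REPOSITORIES="atlas.cern.ch,grid.cern.ch"
CVMFS_HTTP_PROXY="http://192.168.1.100:3128"
'''

settings = read_cvmfs_settings(sample)
proxy = settings.get("CVMFS_HTTP_PROXY", "")
if proxy in ("", "DIRECT"):
    print("No local proxy configured")
else:
    print("CVMFS is configured to use", proxy)
```

To check a real machine, pass the contents of /etc/cvmfs/default.local to `read_cvmfs_settings` instead of `sample`. The parser only handles plain assignments, not shell expansions.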

I have now downloaded one more work unit and will see whether I am on the right track.

Cheers

tazzduke
Joined: 24 Jun 10
Posts: 39
Credit: 4,971,347
RAC: 4,992
Message 44597 - Posted: 29 Mar 2021, 11:14:31 UTC - in response to Message 44595.  

Okay and noted.

Again, thank you for your insight, computezrmle; you have helped me sort out the fine-tuning of running these tasks.

Cheers

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 44652 - Posted: 3 Apr 2021, 23:23:47 UTC
Last modified: 3 Apr 2021, 23:52:31 UTC

Both hosts are CentOS 8 VMs.
AMD Ryzen 9 3950X with 6 CPUs and multithreading: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=159146452
21:27:05 (1748490): run_atlas exited; CPU time 213425.835602

AMD Ryzen 7 2700 with 4 CPUs, without multithreading: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=159145628
[2021-04-03 14:31:22] ryzcos8
[2021-04-03 14:31:22] Singularity works
[2021-04-03 14:32:13] Set ATHENA_PROC_NUMBER=4
[2021-04-03 14:32:13] Starting ATLAS job with PandaID=5014719677
[2021-04-03 14:32:13] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
[2021-04-04 00:55:22] *** The last 200 lines of the pilot log: ***
[2021-04-04 00:55:22] "cpuTimeTotal": 143039,
[2021-04-04 00:55:22] "externalCpuTime": 189,
[2021-04-04 00:55:22] "processedEvents": 1000,
[2021-04-04 00:55:22] "trfPredata": null,
[2021-04-04 00:55:22] "wallTime": 37270
AMD Ryzen 7 2700 with 6 CPUs and multithreading:
[2021-03-27 08:23:09] ryzcos8
[2021-03-27 08:23:09] Singularity works
[2021-03-27 08:24:00] Set ATHENA_PROC_NUMBER=6
[2021-03-27 08:24:00] Starting ATLAS job with PandaID=5009743354
[2021-03-27 08:24:00] Running command: /usr/bin/singularity exec --pwd /var/lib/boinc/slots/0 -B /cvmfs,/var /cvmfs/atlas.cern.ch/repo/containers/images/singularity/x86_64-centos7.img sh start_atlas.sh
[2021-03-27 19:28:51] *** The last 200 lines of the pilot log: ***
[2021-03-27 19:28:51] "cpuTime": 20,
[2021-03-27 19:28:51] "cpuTimeTotal": 230727,
[2021-03-27 19:28:51] "externalCpuTime": 403,
[2021-03-27 19:28:51] "processedEvents": 1000,
[2021-03-27 19:28:51] "trfPredata": null,
[2021-03-27 19:28:51] "wallTime": 39728
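
The two Ryzen 7 2700 logs can be compared by dividing cpuTimeTotal by wallTime, which gives the average number of busy cores. A rough Python sketch, assuming both fields are in seconds and that the thread count equals ATHENA_PROC_NUMBER; the helper names are made up for illustration:

```python
def core_usage(cpu_time_total: float, wall_time: float, threads: int) -> tuple:
    """Return (average busy cores, efficiency per thread)."""
    busy = cpu_time_total / wall_time
    return busy, busy / threads


def events_per_hour(events: int, wall_time: float) -> float:
    """Throughput in events per hour of wall-clock time."""
    return events * 3600.0 / wall_time


# Values copied from the two Ryzen 7 2700 pilot logs above.
runs = {
    "4 threads, no multithreading": dict(cpu=143039, wall=37270, threads=4, events=1000),
    "6 threads, multithreading": dict(cpu=230727, wall=39728, threads=6, events=1000),
}

for name, r in runs.items():
    busy, eff = core_usage(r["cpu"], r["wall"], r["threads"])
    rate = events_per_hour(r["events"], r["wall"])
    print(f"{name}: {busy:.2f} busy cores, {eff:.0%} per thread, {rate:.1f} events/h")
```

On these numbers both runs keep their threads roughly 96 to 97% busy, while the 4-thread run gets through about 97 events per hour against about 91 for the 6-thread run. These are two different tasks, so the comparison is only indicative.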

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 44656 - Posted: 5 Apr 2021, 10:14:17 UTC

Testing a CentOS 8 VM with 3 CPUs, without multithreading.
The task was downloaded but is not listed among the ATLAS tasks on the user's page of the website.
I am well aware that ATLAS long only runs with 4 or more CPUs, but the user gets no information about what is going wrong.

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 44661 - Posted: 6 Apr 2021, 13:46:55 UTC

The number of tasks for the longrunner is dropping to zero.

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 44668 - Posted: 7 Apr 2021, 12:51:54 UTC

There are a lot more tasks available now. These are also now real tasks instead of tests, i.e. the output will be used for science.

I know the credits awarded for these tasks are a bit strange... my own tasks' credit dropped from 1000 to 100 per task. I am trying to find out how to improve this but it may be a case of gathering more statistics until the credit settles down.

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 44681 - Posted: 8 Apr 2021, 16:34:18 UTC

I just released v1.02 which has some minor improvements to CVMFS checks at the start of the task.

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 44682 - Posted: 8 Apr 2021, 17:47:17 UTC - in response to Message 44681.  

CentOS 8 VM, multithreading with 6 CPUs.
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=160542243
The file manager was showing /dev/loopX devices with numbers going above 60.
Is this also solved? squashfs mounts these devices, but sometimes they are not removed from the list afterwards.

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 44711 - Posted: 12 Apr 2021, 6:01:46 UTC

The CentOS 8 VM without multithreading now needs 5 minutes for the startup phase, instead of the 10 minutes it needed in the past.

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 44732 - Posted: 14 Apr 2021, 4:45:12 UTC

Last night the longrunners all exited with:
195 (0x000000C3) EXIT_CHILD_FAILED
Example: https://lhcathome.cern.ch/lhcathome/result.php?resultid=311854099

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 44923 - Posted: 9 May 2021, 23:10:13 UTC - in response to Message 44732.  

The last two longrunners did not produce a HITS file:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=316201737
[2021-05-09 01:46:42] 2021-05-08 23:46:42,063 [wrapper] Pilot exit status: 0
[2021-05-09 01:46:42] 2021-05-08 23:46:42,073 [wrapper] pandaids: 5048553932
[2021-05-09 01:46:42] 2021-05-08 23:46:42,078 [wrapper] apfmon messages muted
[2021-05-09 01:46:42] 2021-05-08 23:46:42,081 [wrapper] Test setup, not cleaning
[2021-05-09 01:46:42] 2021-05-08 23:46:42,083 [wrapper] ==== wrapper stdout END ====
[2021-05-09 01:46:42] 2021-05-08 23:46:42,086 [wrapper] ==== wrapper stderr END ====
[2021-05-09 01:46:42] 2021-05-08 23:46:42,091 [wrapper] wrapperexiting ec=0, duration=48516
[2021-05-09 01:46:42] 2021-05-08 23:46:42,094 [wrapper] apfmon messages muted
[2021-05-09 01:46:42] *** Error codes and diagnostics ***
[2021-05-09 01:46:42] "exeErrorCode": 65,
[2021-05-09 01:46:42] "exeErrorDiag": "Non-zero return code from HITSMergeAthenaMP0 (65); Logfile error in log.HITSMergeAthenaMP0: \"StreamHITS FATAL Check number of writes failed. See messages above to identify which continer is not always written\"",
[2021-05-09 01:46:42] "pilotErrorCode": 1165,
[2021-05-09 01:46:42] "pilotErrorDiag": "Local output file is missing",

[2021-05-09 01:46:42] *** Listing of results directory ***
[2021-05-09 01:46:42] total 461536

https://lhcathome.cern.ch/lhcathome/result.php?resultid=316228689
[2021-05-09 17:19:46] 2021-05-09 15:19:45,454 [wrapper] wrapperexiting ec=0, duration=48214
[2021-05-09 17:19:46] 2021-05-09 15:19:45,457 [wrapper] apfmon messages muted
[2021-05-09 17:19:46] *** Error codes and diagnostics ***
[2021-05-09 17:19:46] "exeErrorCode": 65,
[2021-05-09 17:19:46] "exeErrorDiag": "Non-zero return code from HITSMergeAthenaMP0 (65); Logfile error in log.HITSMergeAthenaMP0: \"StreamHITS FATAL Check number of writes failed. See messages above to identify which continer is not always written\"",
[2021-05-09 17:19:46] "pilotErrorCode": 1165,
[2021-05-09 17:19:46] "pilotErrorDiag": "Local output file is missing",
[2021-05-09 17:19:46] *** Listing of results directory ***
[2021-05-09 17:19:46] total 467072
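
Pulling the error fields out of such a log by hand gets tedious, so here is a hedged Python sketch that collects them. It assumes each field sits on its own timestamped line in the `"key": value,` form seen above. The sample leaves out the long exeErrorDiag line to keep the quoting simple, and `extract_errors` is a made-up helper name.

```python
import re

# Matches lines such as: [2021-05-09 17:19:46] "pilotErrorCode": 1165,
FIELD = re.compile(r'^\[[^\]]+\]\s+"(?P<key>\w+)":\s*(?P<value>.+?),?\s*$')


def extract_errors(log_text: str) -> dict:
    """Collect the key/value pairs whose key contains 'Error'."""
    found = {}
    for line in log_text.splitlines():
        m = FIELD.match(line)
        if m and "Error" in m.group("key"):
            value = m.group("value")
            if value.startswith('"') and value.endswith('"'):
                value = value[1:-1]  # drop the surrounding quotes of string values
            found[m.group("key")] = int(value) if value.isdigit() else value
    return found


log = '''[2021-05-09 17:19:46] *** Error codes and diagnostics ***
[2021-05-09 17:19:46] "exeErrorCode": 65,
[2021-05-09 17:19:46] "pilotErrorCode": 1165,
[2021-05-09 17:19:46] "pilotErrorDiag": "Local output file is missing",
[2021-05-09 17:19:46] *** Listing of results directory ***'''

errors = extract_errors(log)
for key, value in errors.items():
    print(f"{key}: {value}")
```

Both failed tasks report the same pair of codes, exeErrorCode 65 and pilotErrorCode 1165, even though the wrapper itself exits with ec=0.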

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 44992 - Posted: 21 May 2021, 8:12:10 UTC

Would it be better to award BOINC credit for the longrunner based on the CPU seconds used?
20:17:44 (766501): run_atlas exited; CPU time 281469.346031
20:17:44 (766501): called boinc_finish(0)

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,721,942
RAC: 234,286
Message 45013 - Posted: 26 May 2021, 19:53:35 UTC
Last modified: 26 May 2021, 20:02:14 UTC

I tried to get one but saw:

6510 LHC@home 05/26/21 21:50:25 This computer has finished a daily quota of 1 tasks


However, I don't see that I ever got one.


computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,591
RAC: 137,979
Message 45014 - Posted: 26 May 2021, 20:13:39 UTC - in response to Message 45013.  

It might be a misleading error message.
ATLAS long runs only on Linux but there's no Linux host on your computer list.

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 45015 - Posted: 26 May 2021, 20:15:43 UTC - in response to Message 45013.  

Longrunners are Linux-only (Linux VM) and need test applications enabled in the preferences.

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 798
Credit: 644,721,942
RAC: 234,286
Message 45018 - Posted: 27 May 2021, 17:36:05 UTC - in response to Message 45014.  

That would explain it; I thought it was a VM application, not native.

maeax
Joined: 2 May 07
Posts: 2071
Credit: 156,091,089
RAC: 103,567
Message 45050 - Posted: 6 Jun 2021, 1:51:35 UTC

Longrunners are stuck on download.

David Cameron
Project administrator
Project developer
Project scientist
Joined: 13 May 14
Posts: 387
Credit: 15,314,184
RAC: 0
Message 45056 - Posted: 16 Jun 2021, 9:24:07 UTC

Hi all,

I have paused the submission of long tasks for the moment, since very few hosts are running them and the large cluster that ran them previously is no longer running BOINC. We may bring the long tasks back in the future if there is demand for them. Thanks to everyone who helped test and run these tasks.

David