HTTP-Proxy Setting(s)
Author | Message |
---|---|
Send message Joined: 2 Sep 04 Posts: 453 Credit: 193,369,412 RAC: 10,065 |
I'm wondering about the following: my clients have no HTTP proxy; the old proxy setting is simply deactivated. So I was surprised to find this snippet in the boot sequence of an ATLAS task:
2017-03-14 18:58:32 (6308): VM state change detected. (old = 'paused', new = 'running')
2017-03-14 18:58:42 (6308): Guest Log: copied the webapp to /var/www
2017-03-14 18:58:42 (6308): Guest Log: set up http_proxy http://squid:8080
2017-03-14 18:58:42 (6308): Guest Log: ATHENA_PROC_NUMBER=5
2017-03-14 18:58:42 (6308): Guest Log: Starting ATLAS job. (PandaID=3281037544 taskID=10947180)
I'm puzzled because the host seems to do fine with ATLAS.
Supporting BOINC, a great concept ! |
Send message Joined: 13 May 14 Posts: 387 Credit: 15,314,184 RAC: 0 |
Do you have something in your BOINC configuration that sets a proxy? The ATLAS scripts use the information in the init_data.xml file, which the BOINC client creates for each WU from the client configuration. See https://boinc.berkeley.edu/wiki/Client_configuration. The <proxy_info> section contains the settings that are put in init_data.xml and used by ATLAS. |
Send message Joined: 15 Jun 08 Posts: 2386 Credit: 222,956,149 RAC: 136,964 |
Once set via the GUI or a configuration file, the BOINC client stores the proxy setting in client_state.xml:
<use_http_proxy/> # this tag is deleted if you switch the proxy usage off
<http_server_name>proxy.example.com</http_server_name> # this tag remains filled
<http_server_port>3128</http_server_port> # this tag remains filled
If ATLAS reads only <http_server_name> and <http_server_port> but ignores a missing <use_http_proxy/>, it will try to contact the project via the proxy and will fall back to a direct connection if the proxy is down. |
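The check described in the post above can be sketched in Python. This is only an illustration, not the actual ATLAS script: the function name `effective_proxy` is hypothetical, and the sketch assumes the three tags sit directly inside a `<proxy_info>` element, as the earlier reply suggests. The proxy is honoured only when the `<use_http_proxy/>` flag is present, even if host and port are still filled in.

```python
import xml.etree.ElementTree as ET


def effective_proxy(proxy_info_xml):
    """Return (host, port) only if proxy use is switched on, else None.

    Hypothetical helper: assumes the tags are direct children of the
    root element passed in.
    """
    root = ET.fromstring(proxy_info_xml)
    # The flag tag is absent when the proxy is switched off in the client,
    # while the name and port tags may still be filled from an old setting.
    if root.find("use_http_proxy") is None:
        return None
    host = (root.findtext("http_server_name") or "").strip()
    port = (root.findtext("http_server_port") or "").strip()
    if not host or not port.isdigit():
        return None
    return host, int(port)


# Proxy switched on: flag present.
ENABLED = """<proxy_info>
  <use_http_proxy/>
  <http_server_name>proxy.example.com</http_server_name>
  <http_server_port>3128</http_server_port>
</proxy_info>"""

# Proxy switched off: flag removed, old values left behind.
DISABLED = """<proxy_info>
  <http_server_name>proxy.example.com</http_server_name>
  <http_server_port>3128</http_server_port>
</proxy_info>"""

print(effective_proxy(ENABLED))   # ('proxy.example.com', 3128)
print(effective_proxy(DISABLED))  # None
```

A reader that skips the flag check would return the stale host and port in the second case, which matches the behaviour reported at the top of the thread.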
Send message Joined: 2 Sep 04 Posts: 453 Credit: 193,369,412 RAC: 10,065 |
Okay, this seems to be a minor bug. Since the HTTP proxy was deactivated in my BOINC settings, the client should pass this information correctly to init_data.xml, and/or the VirtualBox/ATLAS application should interpret it correctly. I cannot tell which side gets it wrong, but someone should take a closer look and fix it.
Supporting BOINC, a great concept ! |
Send message Joined: 15 Jun 08 Posts: 2386 Credit: 222,956,149 RAC: 136,964 |
To be honest, I'm not really sure whether the proxy setting works from inside an ATLAS VM, as I redirect all HTTP traffic from the VM to CERN through my proxy using a set of netfilter rules. On the other hand: if your VMs have a proxy set and the tasks work although it is offline, then either the fallback works or the proxy setting is simply ignored. |
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
Since my internet bandwidth is pretty low, I am thinking about using a proxy server. Will this reduce the internet traffic from BOINC and the ATLAS VMs, and will task efficiency rise thanks to shorter download times? If so, I have some questions:
- Which proxy software do you recommend? Squid?
- How do you configure the proxy to get the best performance for BOINC (i.e. ATLAS)? Is sudo apt-get install squid3 on, for example, Debian enough?
- How big is the benefit in lower internet traffic and higher efficiency (if there is one)?
I tried to set up and use a squid proxy server, and the task shows in its logs that it is using it:
2017-03-20 17:14:15 (4408): Guest Log: Copied input files into RunAtlas.
But I did not notice any difference in running time or efficiency for a couple of tasks using the proxy compared to tasks without it. So I am wondering whether I have configured the proxy server incorrectly for ATLAS, or whether the benefit (if there is one) is so small that it is almost unnoticeable. |
Send message Joined: 18 Dec 16 Posts: 123 Credit: 37,495,365 RAC: 0 |
"or the benefit (if there is one) is so small that it is almost unnoticeable."
See this post, where computezrmle mentioned the expected benefit of using a local squid proxy: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4146&postid=29287#29287
I have not tried it myself, but will when time permits.
We are the product of random evolution. |
Send message Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
Well, if I have understood the docs shared by David Cameron correctly, the multi-core WU seems to have three main phases while running:
1° The beginning, where lots of communication takes place between server and client and where the client delays event processing to optimize the sharing of process memory inside the VM.
2° The intermediate phase, where events are processed by individual cores inside the VM, independently of the other cores. (During this phase all cores are used --> max efficiency.)
3° The ending, where the output of each core is merged as soon as that core finishes. (The cores do not finish at the same time --> loss of efficiency.)
The squid proxy can probably only influence the beginning phase, during the initial communication. I don't see where else it could improve the situation.
To increase WU efficiency, it is possible to increase the number of events processed (lengthening the second phase relative to the less efficient first and third phases). But why not stop the multi-core VM when the number of events still running drops below the number of cores used by the VM? That would cut the idle time of the cores that finish their events first, and the few unprocessed events could be handled in another WU, and so on. Is it possible? Is it worth it? |
Send message Joined: 15 Jun 08 Posts: 2386 Credit: 222,956,149 RAC: 136,964 |
Sorry for the late response. I was on vacation last week. I have written several comments about using a proxy on different CERN message boards. In general, a user with a slow internet connection and a high number of hosts benefits most. Users with only 1 host but a slow internet connection would still see a significant speedup.
My squid serves 2 crunching hosts with ATLAS and CMS (besides non-CERN projects). Numbers vary but are typically within the following ranges:
Requests per day: 80000-150000 (2300 non-CERN)
Request hits: 90-95%
Byte hits: 40-60%
I recommend a setup with squid (version 3.5) combined with a set of iptables rules and a special routing table to enable policy-based routing (PBR). Therefore I recommend Linux as the base OS. If my information is still correct, PBR is difficult, if not impossible, to set up on Windows. A workaround could be to run a Linux box as the default gateway. Windows experts may have better proposals.
In my LAN a reactivated laptop with 2 GB RAM and a dual-core CPU does the job. 1 GB RAM would be enough, as squid needs no more than 128 MB of cache RAM.
Benefits are (once the data is cached):
- buffered downloads of the .vdi if there is more than one host or after project resets
- shorter initialisation phase
- buffered downloads during calculation phase (this amount is surprisingly high) |
Send message Joined: 2 Sep 04 Posts: 453 Credit: 193,369,412 RAC: 10,065 |
When I started with ATLAS I set up a squid proxy on Linux and routed all traffic through it. But I never managed to get files buffered/cached. All PCs (up to 10) downloaded their own files, and even the VDIs never came from the squid cache. So, after some time, I decided to switch the proxy off again.
Supporting BOINC, a great concept ! |
Send message Joined: 15 Jun 08 Posts: 2386 Credit: 222,956,149 RAC: 136,964 |
During my vacation I set ATLAS and CMS to NNT to avoid huge data transfers if the WUs throw errors. After my return I noticed that ATLAS is no longer in beta at lhcathome, so I reset the project on both hosts to get a clean restart. Here are the lines from my squid logfile:
"GET http://lhcathomeclassic.cern.ch/sixtrack/download/ATLASM_2017_03_01.vdi.gz HTTP/1.1" 200 709837529 "-" "BOINC client (x86_64-pc-linux-gnu 7.6.31)" TCP_MISS:HIER_DIRECT
Comments:
- first ATLAS vdi was requested from the original server
- first CMS vdi was requested from the original server as the cached file had expired long ago and was therefore not present
- second download was taken from the cache in both cases
- file size differs slightly due to my log configuration (it's not an error)
And here are some "big dog" examples from this afternoon. All of them were taken from the local cache:
"GET http://cvmfs-stratum-one.cern.ch/cvmfs/atlas.cern.ch/data/e2/4f258a36360f3a07323861b6a6dcfd0a1bf7e0C HTTP/1.1" 200 13802975 "-" "cvmfs Fuse 2.2.0 cde1ef90-c9c9-4e9e-9c99-e94dd884ad14" TCP_HIT:HIER_NONE |
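Request-hit and byte-hit percentages like the ones quoted earlier in the thread can be estimated from log lines of this shape. The following Python sketch assumes exactly the custom log format shown in the post above (request line, status, size, two quoted fields, then `RESULT:HIERARCHY`); other squid logformat settings would need a different pattern. The names `LINE_RE` and `hit_rates` are made up for this example, and treating every result code containing "HIT" as a cache hit is a simplification.

```python
import re

# Pattern for the custom log format quoted in the post (an assumption:
# a default squid access.log looks different and will not match).
LINE_RE = re.compile(
    r'^"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"[^"]*" "[^"]*" '
    r'(?P<result>[A-Z_]+):(?P<hier>[A-Z_]+)$'
)


def hit_rates(lines):
    """Return (request_hit_rate, byte_hit_rate) as fractions from 0 to 1."""
    requests = hits = total_bytes = hit_bytes = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines in an unexpected format
        size = int(m.group("size"))
        requests += 1
        total_bytes += size
        # Simplification: any result code containing "HIT" counts as a hit.
        if "HIT" in m.group("result"):
            hits += 1
            hit_bytes += size
    if requests == 0:
        return 0.0, 0.0
    byte_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return hits / requests, byte_rate


# The two log lines quoted in the post above.
SAMPLE = [
    '"GET http://lhcathomeclassic.cern.ch/sixtrack/download/ATLASM_2017_03_01.vdi.gz HTTP/1.1" '
    '200 709837529 "-" "BOINC client (x86_64-pc-linux-gnu 7.6.31)" TCP_MISS:HIER_DIRECT',
    '"GET http://cvmfs-stratum-one.cern.ch/cvmfs/atlas.cern.ch/data/e2/4f258a36360f3a07323861b6a6dcfd0a1bf7e0C HTTP/1.1" '
    '200 13802975 "-" "cvmfs Fuse 2.2.0 cde1ef90-c9c9-4e9e-9c99-e94dd884ad14" TCP_HIT:HIER_NONE',
]

request_rate, byte_rate = hit_rates(SAMPLE)
print(f"request hits: {request_rate:.0%}, byte hits: {byte_rate:.1%}")
```

With only these two lines, one of the two requests is a hit, but the byte-hit share is small because the single miss is the large .vdi download. Over a full day of traffic the ratios would look very different.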
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
Thanks for the information and sorry for the late response.
"To be honest I'm not really sure if the proxy setting works from inside an ATLAS VM as I redirect all HTTP traffic from the VM to CERN through my proxy using a set of netfilter rules."
Is this the only reason why you use the approach with policy-based routing, or are there other benefits? How do you manage to redirect only the VM traffic to the proxy? Do you think a Banana Pi (similar to a Raspberry Pi) is powerful enough to do the job as a proxy server without losing too much of the benefit because of the hardware? Are you using iptables and iproute2? |
Send message Joined: 15 Jun 08 Posts: 2386 Credit: 222,956,149 RAC: 136,964 |
"Is this the only reason why you use the approach with policy-based routing, or are there other benefits?"
I use policy-based routing mainly for ATLAS and CMS VMs, although ATLAS is (now) able to read BOINC's proxy setting.
"How do you manage to redirect only the VM traffic to the proxy?"
I mark relevant packets (from user: boinc; destination: CERN; to-port: 80, 3125, 3128) and send them via an additional routing table to my proxy instead of the standard gateway. The proxy is configured to handle normal traffic as well as intercepted traffic (extra port).
"Do you think a Banana Pi (similar to a Raspberry Pi) is powerful enough to do the job as a proxy server without losing too much of the benefit because of the hardware?"
CPU: OK
RAM: OK (128 MB cache is more than enough to serve thousands of the small ATLAS or CMS files)
Disk: ?? I suggest reserving at least 10-30 GB for the big files
Network: speed should match your LAN
Configure squid to cache small files only in RAM and big files only on disk.
"Are you using iptables and iproute2?"
iptables, iproute2, conntrack-tools. It's all included in my Linux distribution. |
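The marking-and-routing step described above could look roughly like the commands this Python sketch prints. It is not the poster's actual rule set. The mark value, the routing table number, the proxy address 192.168.1.2 and the destination 192.0.2.0/24 (a documentation placeholder, not a real CERN network) are all assumptions chosen for illustration, and the helper name `pbr_commands` is hypothetical. The script only builds and prints the commands; it does not run them, and it does not cover the interception setup on the proxy itself.

```python
def pbr_commands(user, dest_cidr, ports, proxy_ip, mark=1, table=100):
    """Build iptables/iproute2 commands for policy-based routing.

    All argument values are supplied by the caller; nothing is executed.
    """
    port_list = ",".join(str(p) for p in ports)
    return [
        # Mark locally generated TCP packets owned by `user` that go to
        # `dest_cidr` on one of the listed destination ports.
        ["iptables", "-t", "mangle", "-A", "OUTPUT", "-p", "tcp",
         "-m", "owner", "--uid-owner", user,
         "-d", dest_cidr,
         "-m", "multiport", "--dports", port_list,
         "-j", "MARK", "--set-mark", str(mark)],
        # Send marked packets to a separate routing table ...
        ["ip", "rule", "add", "fwmark", str(mark), "table", str(table)],
        # ... whose default route points at the proxy, not the gateway.
        ["ip", "route", "add", "default", "via", proxy_ip,
         "table", str(table)],
    ]


# Placeholder values (assumptions, see the note above).
COMMANDS = pbr_commands("boinc", "192.0.2.0/24", [80, 3125, 3128],
                        "192.168.1.2")
for argv in COMMANDS:
    print(" ".join(argv))
```

Matching on the owning user works here because the VM traffic leaves the host as ordinary sockets of the process run by the boinc user, which is presumably why the post filters on "from user: boinc".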
Send message Joined: 9 Dec 14 Posts: 202 Credit: 2,533,875 RAC: 0 |
Thanks for your help! |
©2024 CERN