Message boards : ATLAS application : Atlas Native Transient HTTP Errors Uploading Resultfile

Yeti
Volunteer moderator
Message 47004 - Posted: 12 Jul 2022, 6:57:36 UTC

Hi,
after quite some time of abstinence from LHC, I have now tried to run ATLAS native inside a VirtualBox VM with Ubuntu 20.04.

Everything seemed to go fine: I could see the Athena tasks running, the WU finished, and the client tried to upload the results.

All files but one were uploaded successfully:

Manni VL_CI01_31501

Started upload of tLNLDm26CV1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmF51LDm3rzjam_0_r1453389463_ATLAS_result
Started upload of tuXMDmju8U1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmxm1LDmNZfCSo_0_r1401751060_ATLAS_result
Project communication failed: attempting access to reference site
Temporarily failed upload of tLNLDm26CV1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmF51LDm3rzjam_0_r1453389463_ATLAS_result: transient HTTP error
Backing off 03:38:09 on upload of tLNLDm26CV1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmF51LDm3rzjam_0_r1453389463_ATLAS_result
Internet access OK - project servers may be temporarily down.
Temporarily failed upload of tuXMDmju8U1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmxm1LDmNZfCSo_0_r1401751060_ATLAS_result: transient HTTP error
Backing off 03:58:24 on upload of tuXMDmju8U1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmxm1LDmNZfCSo_0_r1401751060_ATLAS_result

So, where is the problem? The upload server, my local Squid, or my firewall not letting something through?


Supporting BOINC, a great concept !

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Message 47005 - Posted: 12 Jul 2022, 7:11:40 UTC - in response to Message 47004.  

Hi Yeti,

> All files ... were uploaded

Your Theory task reported a success.

> ... but one ...

The huge one.

Which Squid did you use?
The one you used in the past or a new one (on the Linux VM)?
In case it's the latter, is it version >4.x?
In that case please downgrade to the most recent 4.x.

Yeti
Message 47006 - Posted: 12 Jul 2022, 7:22:05 UTC - in response to Message 47005.  

> Your Theory task reported a success.
Yes, but it only ran for 3 to 5 minutes and I was not sure whether that is okay.

> Which Squid did you use?
4.14

> The one you used in the past or a new one (on the Linux VM)?
The "old" one, which I have been running since you helped me set it up years ago.


Yeti
Message 47007 - Posted: 12 Jul 2022, 7:57:25 UTC

Got it!

My firewall was blocking the server response. After switching this off, the upload worked immediately.

And in the log file I found:

HITS file was successfully produced:

So it looks to me as if everything works fine now.


computezrmle
Message 47008 - Posted: 12 Jul 2022, 7:58:55 UTC - in response to Message 47006.  

>> Your Theory task reported a success.

> Yes, but it only ran for 3 to 5 minutes and I was not sure whether that is okay.

Theory native is not very verbose, but this line shows that it was a success:
18:00:08 CEST +02:00 2022-07-11: cranky-0.0.32: [INFO] Container 'runc' finished with status code 0.
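
A check for that line could be scripted roughly as below. This is only a sketch: the function name theory_task_succeeded is made up for illustration, and it assumes the task log is piped in on stdin and that the success line always has the shape shown above.

```shell
# Sketch: succeed (exit status 0) if the log read from stdin contains a cranky
# line reporting that the container finished with status code 0.
# The function name and the exact pattern are assumptions based on the one
# log line quoted in this thread.
theory_task_succeeded() {
    grep -Eq "cranky-[0-9.]+: \[INFO\] Container '[^']+' finished with status code 0\."
}
```

It would be used on a saved copy of the task's stderr output, for example `theory_task_succeeded < stderr.txt && echo success`, where stderr.txt stands for wherever that log was saved.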




Regarding ATLAS:
By now there are some valid tasks.
Nonetheless, on your Linux VM please run the following commands and post the output:
cvmfs_config stat atlas
cvmfs_config showconfig -s atlas |grep -E 'CVMFS_(HTTP_PROXY|SERVER_URL|USE_CDN)'

Yeti
Message 47009 - Posted: 12 Jul 2022, 8:05:32 UTC - in response to Message 47008.  

> cvmfs_config stat atlas
root@mannivl:~# cvmfs_config stat atlas
VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE
2.9.4.0 88846 9 42784 106736 2 78 2489500 4096001 1564 130560 0 98473 99.975 16748 4563 http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch DIRECT 1


> cvmfs_config showconfig -s atlas |grep -E 'CVMFS_(HTTP_PROXY|SERVER_URL|USE_CDN)'
root@mannivl:~# cvmfs_config showconfig -s atlas |grep -E 'CVMFS_(HTTP_PROXY|SERVER_URL|USE_CDN)'
CVMFS_HTTP_PROXY='auto;DIRECT' # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/cern.ch.conf
CVMFS_SERVER_URL='http://s1cern-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1ral-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1bnl-cvmfs.openhtc.io/cvmfs/atlas.cern.ch;http://s1fnal-cvmfs.openhtc.io:8080/cvmfs/atlas.cern.ch;http://s1asgc-cvmfs.openhtc.io:8080/cvmfs/atlas.cern.ch;http://s1ihep-cvmfs.openhtc.io:8080/cvmfs/atlas.cern.ch;http://s1swinburne-cvmfs.openhtc.io:8080/cvmfs/atlas.cern.ch' # from /etc/cvmfs/domain.d/cern.ch.conf
CVMFS_USE_CDN=yes # from /etc/cvmfs/default.local


computezrmle
Message 47010 - Posted: 12 Jul 2022, 8:42:22 UTC - in response to Message 47009.  

I suggest replacing
CVMFS_HTTP_PROXY='auto;DIRECT'
with
CVMFS_HTTP_PROXY="http://<name_or_IP_of_your_local_squid>:<port>;DIRECT"

in /etc/cvmfs/default.local


Then run "cvmfs_config reload" and check with "cvmfs_config stat" if "DIRECT" changes to the local proxy's IP.
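
The check on the PROXY value can be scripted roughly as follows. This is a minimal sketch: the function name cvmfs_proxy_column is made up for illustration, and it assumes the stat output has the header line and value line layout shown earlier in this thread, with a column named PROXY.

```shell
# Sketch: print the PROXY column from "cvmfs_config stat <repo>" output on stdin.
# The column is located by its header name instead of a fixed position, so the
# function keeps working if columns are added or reordered.
cvmfs_proxy_column() {
    awk '
        # Header line: remember which field is called PROXY.
        $1 == "VERSION" { for (i = 1; i <= NF; i++) if ($i == "PROXY") col = i; next }
        # Value line(s): print that field once the column is known.
        col && NF >= col { print $col }
    '
}
```

Running `cvmfs_config stat atlas | cvmfs_proxy_column` would be expected to print DIRECT before the change and the address of the local Squid after the reload.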

Yeti
Message 47013 - Posted: 12 Jul 2022, 12:24:50 UTC - in response to Message 47010.  

> and check with "cvmfs_config stat" if "DIRECT" changes to the local proxy's IP.

Yes, it shows the local IP of my Squid.

Thanks a lot!


Saturn911
Message 47285 - Posted: 23 Sep 2022, 21:54:39 UTC - in response to Message 47005.  


> In case it's the latter, is it version >4.x?
> In that case please downgrade to the most recent 4.x.

What's wrong with Squid 5.x?
I gave it a try, but ATLAS uploads did not go through.
I had to switch back to 4.x :-(

maeax
Message 47441 - Posted: 30 Oct 2022, 9:10:26 UTC - in response to Message 47285.  

CMS and Theory-native have no problem with Squid 5.5.

computezrmle
Message 47442 - Posted: 31 Oct 2022, 7:35:55 UTC - in response to Message 47285.  

According to the Squid bug tracker, Squid 5.x suffers from a bug that affects the POST method:
https://bugs.squid-cache.org/show_bug.cgi?id=5214

- v4.x is not affected
- v6.x may include a fix but is a development version
- v5.x does not (yet) include a backport of the v6 fix


Theory does not use the POST method => it works fine with Squid 5.x.

ATLAS returns the result file to the BOINC client which uploads it using POST.

CMS uploads subtask results directly from within the VM using PUT (which is also affected).


Hence, as of now, ATLAS and CMS will not upload their results if
- a Squid 5.x is configured as the proxy for BOINC (ATLAS)
- a complex firewall environment transparently forces HTTP traffic through a Squid 5.x (ATLAS and CMS)

Conclusion:
V4.x is still the recommended Squid version.
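
To see which case applies to a given proxy, the version summary above could be turned into a small check like the one below. It is only a sketch: the function name squid_upload_status is hypothetical, the version line format "Squid Cache: Version X.Y" is an assumption about what `squid -v` prints, and the verdicts simply restate this post rather than the current state of the bug.

```shell
# Sketch: classify a Squid version for BOINC result uploads, following the
# summary in this thread. Reads "squid -v" output on stdin.
# Assumption: the version line looks like "Squid Cache: Version X.Y".
squid_upload_status() {
    major=$(sed -n 's/^Squid Cache: Version \([0-9][0-9]*\)\..*/\1/p' | head -n 1)
    case "$major" in
        4)  echo "v4: not affected" ;;
        5)  echo "v5: affected (POST/PUT uploads may fail)" ;;
        '') echo "unknown: could not parse version" ;;
        *)  echo "v$major: not covered by the thread, check the bug report" ;;
    esac
}
```

On the proxy host it would be called as `squid -v | squid_upload_status`.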

maeax
Message 47443 - Posted: 31 Oct 2022, 10:15:20 UTC - in response to Message 47442.  
Last modified: 31 Oct 2022, 10:35:55 UTC

2022-10-30 10:39:57 (4540): Guest Log: [INFO] Reloading and probing the CVMFS configuration
2022-10-30 10:40:03 (4540): Guest Log: [INFO] Probing /cvmfs/cvmfs-config.cern.ch... OK
2022-10-30 10:40:03 (4540): Guest Log: [INFO] Probing /cvmfs/grid.cern.ch... OK
2022-10-30 10:40:06 (4540): Guest Log: [INFO] Probing /cvmfs/cms.cern.ch... OK
2022-10-30 10:40:06 (4540): Guest Log: [INFO] Probing /cvmfs/singularity.opensciencegrid.org... OK
2022-10-30 10:40:06 (4540): Guest Log: [INFO] Probing /cvmfs/cms-ib.cern.ch... OK
2022-10-30 10:40:06 (4540): Guest Log: [INFO] Probing /cvmfs/oasis.opensciencegrid.org... OK
2022-10-30 10:40:07 (4540): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2022-10-30 10:40:07 (4540): Guest Log: [INFO] 2.7.2.0 http://s1cern-cvmfs.openhtc.io http://10.xxx.yyy.zz:3128
2022-10-30 10:40:07 (4540): Guest Log: [INFO] Reading volunteer information
2022-10-30 10:40:07 (4540): Guest Log: [INFO] Requesting an X509 credential from LHC@home
2022-10-30 10:40:08 (4540): Guest Log: [INFO] CMS application starting. Check log files.
......
2022-10-31 03:20:44 (4540): Status Report: Elapsed Time: '60000.000000'
2022-10-31 03:20:44 (4540): Status Report: CPU Time: '3246.156250'
2022-10-31 04:40:48 (4540): Powering off VM.
2022-10-31 04:40:49 (4540): Successfully stopped VM.
2022-10-31 04:40:49 (4540): Deregistering VM. (boinc_babf107dda7402da, slot#5)
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10548292

computezrmle
Message 47444 - Posted: 31 Oct 2022, 11:20:20 UTC - in response to Message 47443.  

The log snippet (obviously from a CMS task) just shows that the VM's CVMFS client accepts an HTTP proxy.
Any HTTP proxy would be shown like this, not only a Squid.
The CVMFS client also doesn't know whether it's a v3, v4, or v... Squid version.
CVMFS works fine even with a Squid v5 since it only downloads data and never uploads anything.
That's one reason why Theory works fine with a Squid v5.


It's the upload direction that is affected by the v5 bug.
CMS's stderr.txt logs never show such errors.
CMS tries to upload the result a couple of times before it gives up and starts a fresh subtask.

maeax
Message 47445 - Posted: 31 Oct 2022, 11:51:26 UTC - in response to Message 47444.  

This thread would fit better in the Number crunching folder (Squid).
I am not seeing the CMS task upload problems you describe.

computezrmle
Message 47446 - Posted: 31 Oct 2022, 12:25:06 UTC - in response to Message 47445.  

I clearly pointed out under which conditions this can happen.
That information is useful for those who have a more complex LAN/firewall than yours.
For them Squid v4 is still the better choice.

If you are not affected, ignore it.

Evangelos Katikos
Message 47447 - Posted: 1 Nov 2022, 3:25:15 UTC

I've used squid v5 from the start.

You can tell BOINC to bypass Squid for result uploads: BOINC Manager -> Options -> other options -> http proxy -> don't use proxy for -> http://lhcathome-upload.cern.ch/lhcathome_cgi/file_upload_handler
Or put it in cc_config.xml. Squid is useless for result uploads anyway.

The workaround mentioned in the bug report above also works. With client_request_buffer_max_size 512 MB, ATLAS uploads go straight through, or usually succeed on their first retry.
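
Whether a given Squid config already carries that workaround can be checked with something like the sketch below. The function name request_buffer_setting is made up for illustration, and the config file path is whatever the local installation uses; only the directive name client_request_buffer_max_size comes from the post above.

```shell
# Sketch: print the value of client_request_buffer_max_size from a Squid
# config file, or "unset" if no active (uncommented) line sets it.
# If the directive appears more than once, the last occurrence is reported.
request_buffer_setting() {
    conf_file=$1
    value=$(sed -n 's/^[[:space:]]*client_request_buffer_max_size[[:space:]]\{1,\}\(.*\)$/\1/p' "$conf_file" | tail -n 1)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf 'unset\n'
    fi
}
```

For example, `request_buffer_setting /etc/squid/squid.conf` (path assumed) would print "512 MB" once the line from the workaround has been added and Squid has been reconfigured separately.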