Message boards : CMS Application : slow gfal-copy

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1515
Credit: 83,858,872
RAC: 82,664
Message 30148 - Posted: 2 May 2017, 7:21:51 UTC
Last modified: 2 May 2017, 7:47:00 UTC

Result upload via gfal-copy is extremely slow.
No more than 10 % of normal speed, although my internet connection is otherwise idle.

<edit>
Console messages:
"Command timed out after 2400 seconds!" -> 48MB were uploaded

"*newErr is not NULL impossible to overwrite ...
old error wasHTTP 404 : File not found"
</edit>

<edit2>
The following upload was successful.
</edit2>
ID: 30148 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 702
Credit: 5,629,264
RAC: 3,869
Message 30149 - Posted: 2 May 2017, 7:59:00 UTC - in response to Message 30148.  

We've seen the "impossible to overwrite" message before when a job has successfully completed but for some reason Condor thinks it hasn't. It then requeues the job, and when re-run it finds the output already there, throws the overwrite message, and deletes the file. Then the third try usually succeeds. I used to see this when we were running CRAB; with WMAgent I don't have access to the logs, unfortunately.
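
A toy illustration of that sequence, in case it helps to picture it. This is only a sketch under stated assumptions: the storage is modelled as a Python set, the file name output.root and the functions stage_out and simulate are made up for the example, and none of it is actual Condor, CRAB or gfal code.

```python
def stage_out(storage, name):
    """Try to write `name`; on a collision, delete the existing file and fail."""
    if name in storage:
        storage.discard(name)   # the output that was already there is removed
        return False            # stands in for the "impossible to overwrite" error
    storage.add(name)
    return True


def simulate(lost_reports=1, max_attempts=3, name="output.root"):
    """Return the per-attempt upload results until a success is acknowledged."""
    storage = set()   # toy stand-in for the remote storage
    results = []
    for attempt in range(1, max_attempts + 1):
        ok = stage_out(storage, name)
        results.append(ok)
        # Assumption: the success reports of the first `lost_reports`
        # attempts never reach the scheduler, so the job is requeued.
        if ok and attempt > lost_reports:
            break
    return results


print(simulate())  # [True, False, True]
```

The first attempt uploads fine but is requeued, the second trips over the existing file and removes it, and the third goes through.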
ID: 30149 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1515
Credit: 83,858,872
RAC: 82,664
Message 30150 - Posted: 2 May 2017, 8:22:24 UTC

What made me wonder was the slow upload rate.
48 MB had been uploaded within 2400 s before the timeout cancelled the transfer.
That is around 20 kB/s.
I usually see 600 kB/s.
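
The arithmetic behind those numbers, as a minimal sketch. It assumes 1 MB = 1000 kB; the function name average_rate_kb_per_s is just for illustration.

```python
def average_rate_kb_per_s(megabytes, seconds):
    """Average transfer rate in kB/s, taking 1 MB = 1000 kB."""
    return megabytes * 1000 / seconds


slow = average_rate_kb_per_s(48, 2400)   # the timed-out upload
usual = 600                              # kB/s, the rate normally seen
fraction = slow / usual

print(f"{slow:.0f} kB/s, {fraction:.1%} of the usual rate")
# prints: 20 kB/s, 3.3% of the usual rate
```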

The connections to CERN (esp. Condor) stayed established during the whole period, and the 2nd slot finished a job and uploaded at normal speed.

What I did not notice is whether the upload was interrupted by BOINC, e.g. if the client paused the CMS VM for a short period to run another project.
ID: 30150 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 702
Credit: 5,629,264
RAC: 3,869
Message 30151 - Posted: 2 May 2017, 9:52:19 UTC - in response to Message 30150.  

There were some problems with the CMS computing infrastructure yesterday but I don't think that would have affected us -- I could be wrong. Of course yesterday was a holiday, so it was not fixed immediately.
ID: 30151 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1515
Credit: 83,858,872
RAC: 82,664
Message 33806 - Posted: 12 Jan 2018, 15:03:16 UTC

Hi Ivan,

I just noticed some unusual gfal-copy activity in a VM instead of its regular end (it's not yet finished).
In addition, the job overview page shows "something" that could be the beginning of a dip.

Is it a false alarm or do the servers need a kick?
ID: 33806 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1515
Credit: 83,858,872
RAC: 82,664
Message 33808 - Posted: 12 Jan 2018, 15:39:25 UTC - in response to Message 33806.  

... Is it a false alarm or do the servers need a kick?

It finally finished.
Maybe just a hiccup.
ID: 33808 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 702
Credit: 5,629,264
RAC: 3,869
Message 33812 - Posted: 12 Jan 2018, 16:55:46 UTC - in response to Message 33808.  

Yes, I see nothing overtly alarming. You may have also noticed that since last night we have switched to a different proxy. We'll probably be switching back and forth for a little while while we look at optimising start-up times.
ID: 33812 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1515
Credit: 83,858,872
RAC: 82,664
Message 33815 - Posted: 12 Jan 2018, 17:19:16 UTC - in response to Message 33812.  

...You may have also noticed that since last night we have switched to a different proxy.

You mean s1x-cvmfs.openhtc.io?
They have worked like a charm since my hosts have been running v47.80.
Fast and reliable, nothing to complain about, whether used DIRECT or as parents of my local squid.
:-)

Cheers
ID: 33815 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 702
Credit: 5,629,264
RAC: 3,869
Message 33819 - Posted: 12 Jan 2018, 20:41:03 UTC - in response to Message 33815.  
Last modified: 12 Jan 2018, 20:43:57 UTC

...You may have also noticed that since last night we have switched to a different proxy.

You mean s1x-cvmfs.openhtc.io?
They have worked like a charm since my hosts have been running v47.80.
Fast and reliable, nothing to complain about, whether used DIRECT or as parents of my local squid.
:-)
Cheers

Yes, that seems to be the badger. I got this message yesterday:
[Adding Ivan -- we're talking about commercial (although free) caching, see http://openhtc.io, so far for LHC@Home CMS CVMFS usage and for U.S. CMS Opportunistic usage]
followed by
 Ivan, could you please update that configuration?  Change the line
    <proxy url="http://lhchomeproxy.cern.ch:3125"/>
to the two lines
    <proxyconfig url="http://lhchomeproxy.cern.ch/wpad.dat"/>
    <proxyconfig url="http://lhchomeproxy.fnal.gov/wpad.dat"/>
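
For anyone who would rather script that edit, here is a minimal sketch of the substitution quoted above. It assumes the configuration is available as plain text with the old entry on a line of its own; the function name switch_to_proxyconfig is made up for illustration and is not part of any CMS or BOINC tooling.

```python
OLD_LINE = '<proxy url="http://lhchomeproxy.cern.ch:3125"/>'
NEW_LINES = [
    '<proxyconfig url="http://lhchomeproxy.cern.ch/wpad.dat"/>',
    '<proxyconfig url="http://lhchomeproxy.fnal.gov/wpad.dat"/>',
]


def switch_to_proxyconfig(config_text):
    """Replace the single <proxy> line with the two <proxyconfig> lines."""
    out = []
    for line in config_text.splitlines():
        if line.strip() == OLD_LINE:
            # Keep the indentation of the line being replaced.
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(indent + new for new in NEW_LINES)
        else:
            out.append(line)
    return "\n".join(out)
```

Lines that do not match the old entry exactly are passed through unchanged, so running it on an already updated file does nothing.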

As I said earlier, we'll probably be changing between the two a few times to establish an optimum. There was a suggestion that I submit considerably shorter jobs for the study, but Laurence thinks he can extract the start-up times from existing log files.
ID: 33819 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1515
Credit: 83,858,872
RAC: 82,664
Message 33820 - Posted: 12 Jan 2018, 22:25:48 UTC - in response to Message 33819.  

... Laurence thinks he can extract the start-up times from existing log files.

If you look at the statistics the following numbers could be interesting.

Timeframe: 2018-01-12 0:00 until 2018-01-12 22:48

Total requests from CMS VMs to s1x-cvmfs.openhtc.io (all forced to pass through my local squid):
10520 (roughly 1.6 GB)

Requests forwarded externally:
1485 (byte count not calculated by default but available in the logs)

Request efficiency of the local cache:
>85 %

This noticeably shortens the startup time of a VM.
If other volunteers also use local proxies, it could have an impact on the overall timing statistics.
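
How the efficiency figure follows from the two request counts, as a small sketch. It counts requests only, not bytes, and the function name cache_hit_ratio is just for illustration.

```python
def cache_hit_ratio(total_requests, forwarded_requests):
    """Share of requests answered by the local cache without going external."""
    return (total_requests - forwarded_requests) / total_requests


ratio = cache_hit_ratio(10520, 1485)
print(f"{ratio:.1%}")  # prints: 85.9%
```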
ID: 33820 · Report as offensive     Reply Quote
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 702
Credit: 5,629,264
RAC: 3,869
Message 33832 - Posted: 13 Jan 2018, 15:55:35 UTC - in response to Message 33820.  

Thanks, that's good to know. How to get the message across to the rank-and-file is another matter... :-/
ID: 33832 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1515
Credit: 83,858,872
RAC: 82,664
Message 39601 - Posted: 13 Aug 2019, 10:40:45 UTC

CMS uploads are currently very slow - only 10-12 % of normal upload speed.
As ATLAS uploads run at 100 %, it's most likely not caused by anything on my side.
ID: 39601 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1295
Credit: 23,371,729
RAC: 5,033
Message 39602 - Posted: 13 Aug 2019, 11:35:34 UTC - in response to Message 39601.  

CMS uploads are currently very slow - only 10-12 % of normal upload speed.
same here
ID: 39602 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1515
Credit: 83,858,872
RAC: 82,664
Message 39608 - Posted: 14 Aug 2019, 13:16:05 UTC

Today my CMS uploads run at 100 % speed.
:-)
ID: 39608 · Report as offensive     Reply Quote
Erich56

Joined: 18 Dec 15
Posts: 1295
Credit: 23,371,729
RAC: 5,033
Message 39609 - Posted: 14 Aug 2019, 16:16:52 UTC - in response to Message 39608.  

Today my CMS uploads run at 100 % speed.
here too :-)
ID: 39609 · Report as offensive     Reply Quote



©2020 CERN