Message boards : ATLAS application : Download sometime between 20 and 50 kBps
hadron

Joined: 4 Sep 22
Posts: 58
Credit: 8,576,070
RAC: 14,221
Message 49382 - Posted: 3 Feb 2024, 22:48:50 UTC - in response to Message 49371.  

The problem most certainly does not lie outside CERN.

Just a statement, and not valid evidence for blaming CERN either.
The relevant point is that a speed drop can happen anywhere between the connection endpoints.

It doesn't strike you as odd that the problem occurs _only_ with ATLAS tasks?
ID: 49382
maeax

Joined: 2 May 07
Posts: 2121
Credit: 159,926,969
RAC: 60,346
Message 49386 - Posted: 4 Feb 2024, 7:25:54 UTC - in response to Message 49382.  

"Measurement date" ; "Download (Mbit/s)" ; "Upload (Mbit/s)" ; "Latency (ms)" ; "Time"
"04.02.2024" ; "244.36" ; "73.24" ; "36" ; "08:21:08"
ID: 49386
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,615,748
RAC: 127,036
Message 49389 - Posted: 4 Feb 2024, 9:19:46 UTC

Downloaded ATLAS*.vdi.gz (1.7 GB) and CMS*.vdi.gz (1.6 GB) from lhcathome-upload.cern.ch to measure the download speed.
Since lhcathome-upload.cern.ch runs on a couple of boxes (guess how many) I tested each of them.
The FQDN is also used for ATLAS EVNT files.

Results:
min average: 1.15 MB/s
max average: 1.69 MB/s

This is less than my internet connection allows but far more than the 20-50 kB/s mentioned in the title.
The speed distribution shown in the monitoring system suggests the server is loaded but not heavily overloaded.
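
To put these numbers in perspective, here is a rough back-of-the-envelope sketch of how long the 1.7 GB ATLAS image would take at each rate. It assumes decimal units (1 GB = 1,000,000 kB), and the function name is purely illustrative.

```python
def download_minutes(size_gb: float, rate_kb_per_s: float) -> float:
    """Return the transfer time in minutes for size_gb at rate_kb_per_s.

    Assumption: decimal units, i.e. 1 GB = 1,000,000 kB.
    """
    size_kb = size_gb * 1_000_000
    return size_kb / rate_kb_per_s / 60


if __name__ == "__main__":
    size_gb = 1.7  # size of ATLAS*.vdi.gz as stated in the test above
    rates = [
        ("min average, 1.15 MB/s", 1150),
        ("max average, 1.69 MB/s", 1690),
        ("thread title, 50 kB/s", 50),
        ("thread title, 20 kB/s", 20),
    ]
    for label, rate in rates:
        print(f"{label}: {download_minutes(size_gb, rate):.0f} min")
```

Under these assumptions the measured rates correspond to roughly 17 to 25 minutes per image, while 20 to 50 kB/s would mean somewhere between about 9 and 24 hours.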


Now let's look at the method a well-known volunteer used a couple of times to solve the problem:
OK, when I see such a speed, I disconnect the network card, wait a minute and reconnect it.
[...]
I don't know where it comes from, but most downloads start at a speed of 20-50 kB/s.
After disconnecting the network card and activating it again, they mostly run at 9,000 kBit/s.
[...]
When I see this low speed, I disconnect the network card and activate it again.

These network resets cleared locked connections and freed resources in the computer's network stack, and usually also within the same network segment, here between the computer and the local router.

Note that nobody at CERN cleared the server's network stack at the same time.
Whether the server notices the reset immediately or only after a timeout depends on the type of reset.
In the latter case the server still keeps some resources reserved for the lost connection.
Nonetheless, the server obviously has enough resources to accept a new connection.


Why does it happen only for ATLAS?
Let's be more precise:
It is visible here since each ATLAS task downloads a huge file.
Few people would complain about occasional delays while downloading very small files.
Those are mostly not visible.



And the conclusion?
As already said, just claiming "it's not on my side" is not valid evidence.
Even the tests above are not valid evidence, but they may give some hints that widen the view.
ID: 49389
maeax

Joined: 2 May 07
Posts: 2121
Credit: 159,926,969
RAC: 60,346
Message 49390 - Posted: 4 Feb 2024, 9:25:37 UTC - in response to Message 49386.  
Last modified: 4 Feb 2024, 9:39:58 UTC

"Measurement date" ; "Download (Mbit/s)" ; "Upload (Mbit/s)" ; "Latency (ms)" ; "Time"
"04.02.2024" ; "244.36" ; "73.24" ; "36" ; "08:21:08"

Short answer:
When three ATLAS files are downloading, one of them always runs at this speed of 20 to 50 kBps.
There are no download problems with CMS or Theory, and of course none with Einstein@Home or WCG.
https://wlcg.web.cern.ch/using-wlcg/monitoring-visualisation
ID: 49390
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1288
Credit: 8,517,662
RAC: 2,328
Message 49391 - Posted: 4 Feb 2024, 15:24:19 UTC - in response to Message 49390.  

When three files from Atlas seen in download, one is always with this speed of 20 to 50 Kbp/s.
I tried to force a 'bad' connection by downloading 5 pool.root files of about 400 MB each concurrently, with 13 pool.root files (13 tasks) in the transfer queue.
Apart from the normal speed reduction caused by sharing the bandwidth, no 'very' slow speeds occurred.

04 Feb 09:24:42	Scheduler request completed: got 8 new tasks
04 Feb 09:24:44	Started download of Dh3MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmhmDLDmDcFSOm_EVNT.37161661._000005.pool.root.1
04 Feb 09:24:44	Started download of Dh3MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmhmDLDmDcFSOm_input.tar.gz
04 Feb 09:24:46	Finished download of Dh3MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmhmDLDmDcFSOm_input.tar.gz
04 Feb 09:24:46	Started download of boinc_job_script.oY9kNl
04 Feb 09:24:47	Finished download of boinc_job_script.oY9kNl
04 Feb 09:24:47	Started download of p4yMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmfmDLDmPfYNTm_EVNT.37161661._000001.pool.root.1
04 Feb 09:24:54	Scheduler request completed: got 5 new tasks
04 Feb 09:25:18	Finished download of Dh3MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmhmDLDmDcFSOm_EVNT.37161661._000005.pool.root.1
04 Feb 09:25:18	Started download of p4yMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmfmDLDmPfYNTm_input.tar.gz
04 Feb 09:25:22	Finished download of p4yMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmfmDLDmPfYNTm_input.tar.gz
04 Feb 09:25:22	Started download of boinc_job_script.AXEUiA
04 Feb 09:25:25	Finished download of boinc_job_script.AXEUiA
04 Feb 09:25:25	Started download of C67MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmjmDLDmSbjj0n_EVNT.37161661._000006.pool.root.1
04 Feb 09:25:37	Finished download of p4yMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmfmDLDmPfYNTm_EVNT.37161661._000001.pool.root.1
04 Feb 09:25:37	Started download of C67MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmjmDLDmSbjj0n_input.tar.gz
04 Feb 09:25:38	Finished download of C67MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmjmDLDmSbjj0n_input.tar.gz
04 Feb 09:25:38	Started download of boinc_job_script.913J19
04 Feb 09:25:41	Finished download of boinc_job_script.913J19
04 Feb 09:25:41	Started download of hL6MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmimDLDmRAnzmm_EVNT.37161661._000006.pool.root.1
04 Feb 09:26:12	Finished download of hL6MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmimDLDmRAnzmm_EVNT.37161661._000006.pool.root.1
04 Feb 09:26:12	Started download of hL6MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmimDLDmRAnzmm_input.tar.gz
04 Feb 09:26:14	Finished download of hL6MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmimDLDmRAnzmm_input.tar.gz
04 Feb 09:26:14	Started download of boinc_job_script.80pPTP
04 Feb 09:26:18	Finished download of boinc_job_script.80pPTP
04 Feb 09:26:18	Started download of lFCNDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmlmDLDm8d0C8m_EVNT.37161661._000008.pool.root.1
04 Feb 09:26:19	Starting task OMxLDmkKTq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmxPDLDmLq8Jnn_3
04 Feb 09:26:22	Finished download of C67MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmjmDLDmSbjj0n_EVNT.37161661._000006.pool.root.1
04 Feb 09:26:22	Started download of lFCNDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmlmDLDm8d0C8m_input.tar.gz
04 Feb 09:26:24	Finished download of lFCNDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmlmDLDm8d0C8m_input.tar.gz
04 Feb 09:26:24	Started download of boinc_job_script.G9hOvK
04 Feb 09:26:25	Finished download of boinc_job_script.G9hOvK
04 Feb 09:26:25	Started download of xcvMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmdmDLDmCQNlln_EVNT.37161628._000009.pool.root.1
04 Feb 09:26:50	Finished download of lFCNDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmlmDLDm8d0C8m_EVNT.37161661._000008.pool.root.1
04 Feb 09:26:50	Started download of xcvMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmdmDLDmCQNlln_input.tar.gz
04 Feb 09:26:52	Finished download of xcvMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmdmDLDmCQNlln_input.tar.gz
04 Feb 09:26:52	Started download of boinc_job_script.KAd0py
04 Feb 09:26:54	Finished download of boinc_job_script.KAd0py
04 Feb 09:26:54	Started download of bw0MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmgmDLDmfJBdnn_EVNT.37161661._000002.pool.root.1
04 Feb 09:27:15	Finished download of xcvMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmdmDLDmCQNlln_EVNT.37161628._000009.pool.root.1
04 Feb 09:27:15	Started download of bw0MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmgmDLDmfJBdnn_input.tar.gz
04 Feb 09:27:18	Finished download of bw0MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmgmDLDmfJBdnn_input.tar.gz
04 Feb 09:27:18	Started download of boinc_job_script.6mCSjr
04 Feb 09:27:19	Finished download of boinc_job_script.6mCSjr
04 Feb 09:27:19	Started download of 1BpMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmamDLDm4beQKm_EVNT.37161628._000006.pool.root.1
04 Feb 09:27:29	Finished download of bw0MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmgmDLDmfJBdnn_EVNT.37161661._000002.pool.root.1
04 Feb 09:27:29	Started download of 1BpMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmamDLDm4beQKm_input.tar.gz
04 Feb 09:27:31	Finished download of 1BpMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmamDLDm4beQKm_input.tar.gz
04 Feb 09:27:31	Started download of boinc_job_script.AttrPR
04 Feb 09:27:32	Finished download of boinc_job_script.AttrPR
04 Feb 09:27:32	Started download of dJTLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm1IlLDmERSOZm_EVNT.37161789._000004.pool.root.1
04 Feb 09:28:09	Finished download of dJTLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm1IlLDmERSOZm_EVNT.37161789._000004.pool.root.1
04 Feb 09:28:09	Started download of dJTLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm1IlLDmERSOZm_input.tar.gz
04 Feb 09:28:11	Finished download of dJTLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm1IlLDmERSOZm_input.tar.gz
04 Feb 09:28:11	Started download of boinc_job_script.96tO9c
04 Feb 09:28:12	Finished download of boinc_job_script.96tO9c
04 Feb 09:28:12	Started download of 6vNLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDmzIlLDmuC9hVm_EVNT.37161789._000001.pool.root.1
04 Feb 09:28:29	Finished download of 1BpMDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmamDLDm4beQKm_EVNT.37161628._000006.pool.root.1
04 Feb 09:28:29	Started download of 6vNLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDmzIlLDmuC9hVm_input.tar.gz
04 Feb 09:28:30	Finished download of 6vNLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDmzIlLDmuC9hVm_input.tar.gz
04 Feb 09:28:30	Started download of boinc_job_script.pAmXH8
04 Feb 09:28:31	Finished download of boinc_job_script.pAmXH8
04 Feb 09:28:31	Started download of TVaLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm5IlLDmTRoQdn_EVNT.37161789._000009.pool.root.1
04 Feb 09:28:49	Finished download of 6vNLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDmzIlLDmuC9hVm_EVNT.37161789._000001.pool.root.1
04 Feb 09:28:49	Started download of TVaLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm5IlLDmTRoQdn_input.tar.gz
04 Feb 09:28:51	Finished download of TVaLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm5IlLDmTRoQdn_input.tar.gz
04 Feb 09:28:51	Started download of boinc_job_script.EoY6ej
04 Feb 09:28:52	Finished download of boinc_job_script.EoY6ej
04 Feb 09:28:52	Started download of EaXLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm3IlLDmqngKLo_EVNT.37161789._000007.pool.root.1
04 Feb 09:29:05	Finished download of TVaLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm5IlLDmTRoQdn_EVNT.37161789._000009.pool.root.1
04 Feb 09:29:05	Started download of EaXLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm3IlLDmqngKLo_input.tar.gz
04 Feb 09:29:06	Finished download of EaXLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm3IlLDmqngKLo_input.tar.gz
04 Feb 09:29:06	Started download of boinc_job_script.kEIELA
04 Feb 09:29:07	Finished download of boinc_job_script.kEIELA
04 Feb 09:29:07	Started download of L3bLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm6IlLDm0muJ2n_EVNT.37161789._000010.pool.root.1
04 Feb 09:29:39	Finished download of EaXLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm3IlLDmqngKLo_EVNT.37161789._000007.pool.root.1
04 Feb 09:29:39	Started download of L3bLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm6IlLDm0muJ2n_input.tar.gz
04 Feb 09:29:40	Finished download of L3bLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm6IlLDm0muJ2n_input.tar.gz
04 Feb 09:29:40	Started download of boinc_job_script.vtYVwu
04 Feb 09:29:41	Finished download of boinc_job_script.vtYVwu
04 Feb 09:29:46	Finished download of L3bLDmrymq4nsSi4apGgGQJmABFKDmABFKDm8QvSDm6IlLDm0muJ2n_EVNT.37161789._000010.pool.root.1
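
The log above can be turned into per-file download durations with a short script. This is only a sketch: it assumes the exact line layout shown above (timestamp, whitespace, then "Started/Finished download of <name>"), it assumes the year 2024 from the post date because the log itself carries none, and the function name `download_durations` is made up for illustration.

```python
import re
from datetime import datetime

# Matches e.g. "04 Feb 09:24:44<TAB>Started download of <file name>"
LINE_RE = re.compile(
    r"^(\d{2} \w{3} \d{2}:\d{2}:\d{2})\s+(Started|Finished) download of (\S+)$"
)


def download_durations(log_text: str, year: int = 2024) -> dict[str, float]:
    """Map each file name to its download duration in seconds.

    Only files with both a 'Started' and a 'Finished' line are reported.
    The year is an assumption; the event log does not contain it.
    """
    started: dict[str, datetime] = {}
    durations: dict[str, float] = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # scheduler requests, task starts and similar messages
        stamp, event, name = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %d %b %H:%M:%S")
        if event == "Started":
            started[name] = when
        elif name in started:
            durations[name] = (when - started[name]).total_seconds()
    return durations


if __name__ == "__main__":
    evnt = (
        "Dh3MDmTymq4n9Rq4apoT9bVoABFKDmABFKDm4AnaDmhmDLDmDcFSOm"
        "_EVNT.37161661._000005.pool.root.1"
    )
    sample = "\n".join(
        [
            f"04 Feb 09:24:44\tStarted download of {evnt}",
            "04 Feb 09:24:46\tStarted download of boinc_job_script.oY9kNl",
            "04 Feb 09:24:47\tFinished download of boinc_job_script.oY9kNl",
            "04 Feb 09:24:54\tScheduler request completed: got 5 new tasks",
            f"04 Feb 09:25:18\tFinished download of {evnt}",
        ]
    )
    for name, seconds in download_durations(sample).items():
        print(f"{seconds:5.0f} s  {name}")
```

For the first EVNT file this gives 34 seconds (09:24:44 to 09:25:18). Dividing a file's size by its duration would give an average rate, but the sizes are not in the log and would have to be supplied separately.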
ID: 49391
maeax

Joined: 2 May 07
Posts: 2121
Credit: 159,926,969
RAC: 60,346
Message 49392 - Posted: 4 Feb 2024, 16:07:11 UTC - in response to Message 49391.  

Thank you, Crystal.
Today I installed a router update dated from the end of last month.
I am still hoping to find the reason somehow.
Right now there are no more ATLAS tasks to test it with.
ID: 49392
hadron

Joined: 4 Sep 22
Posts: 58
Credit: 8,576,070
RAC: 14,221
Message 49396 - Posted: 5 Feb 2024, 2:30:53 UTC - in response to Message 49389.  

@computezrmle

I have enough download bandwidth for 35MB/s. When I see an ATLAS file coming in at 100KB/s at the same time as some non-BOINC download (say from Usenet or whatever) is coming in at 20-25MB/s, I know that there is nothing wrong with my end of the connections.
When 2 ATLAS tasks come in at the same time, and I watch one of them struggle to reach 100KB/s while the other comes in at 10MB/s, I know there is nothing wrong with my end of the connections.

This is not a problem at the user's end.
ID: 49396
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2425
Credit: 227,615,748
RAC: 127,036
Message 49399 - Posted: 5 Feb 2024, 13:27:39 UTC - in response to Message 49396.  

Take your statement and exchange sender/receiver to get:
"When CERN sends thousands of large files at full speed and only 2 recipients report occasional problems, then it's not a problem at the CERN end."

Do you see the problem?
Without deeper investigation, neither is valid.

OK, I don't expect you to accept this point of view.
Instead I expect complaints about the "only 2".
Well, replace it with a few more, but compare that with the up to 7.32 k jobs ATLAS was recently running concurrently via non-grid BOINC, as shown here.
ID: 49399
maeax

Joined: 2 May 07
Posts: 2121
Credit: 159,926,969
RAC: 60,346
Message 49400 - Posted: 5 Feb 2024, 13:40:01 UTC

I don't want to answer for @hadron,
but Steve, why is it so hard to accept the network faults shown in BOINC Manager?
We know that Windows tasks for ATLAS do not work the same way as Linux tasks.
Once again, Steve: build a Win11 Pro PC and see how it behaves.
By the way, I have ATLAS running as native with no problem.
ID: 49400
hadron

Joined: 4 Sep 22
Posts: 58
Credit: 8,576,070
RAC: 14,221
Message 49401 - Posted: 5 Feb 2024, 21:40:49 UTC - in response to Message 49399.  

Take your statement and exchange sender/receiver to get:
"When CERN sends thousands of large files at full speed and only 2 recipients report occasional problems, then it's not a problem at the CERN end."

Do you see the problem?
Without deeper investigation, neither is valid.

Perhaps only 2 of us a) have noticed a problem, b) are interested in trying to resolve it, or c) have even noticed it.
In fact, I only became interested in this thread when I noticed that the finger was always being pointed at the home user's system or router, or at his/her ISP. My own experience shows that not to be the case, but you seem quite intent on ignoring or dismissing my real-life experience in favour of your explanation.

OK, I don't expect you to accept this point of view.
Instead I expect complaints about the "only 2".
Well, replace it with a few more, but compare that with the up to 7.32 k jobs ATLAS was recently running concurrently via non-grid BOINC, as shown here.

I will start looking for a problem on my end, once you adequately address my observation that, at the same time as a large ATLAS task is coming down in BOINC at 100 KB/s or less, I can simultaneously grab a file of the same size, or larger, at 20 to 30 MB/s.

Note that I really don't care much about what happens after all the files for one task are downloaded and the task is ready to be started. The problem I've seen does not lie there; rather, the problem is with the initial download of the files necessary to run the task -- and it occurs with ATLAS tasks alone, even when all other network operations on my system are functioning normally.
ID: 49401
maeax

Joined: 2 May 07
Posts: 2121
Credit: 159,926,969
RAC: 60,346
Message 49402 - Posted: 6 Feb 2024, 6:01:11 UTC - in response to Message 49401.  

I found these two entries for cc_config.xml:
<max_file_xfers>N</max_file_xfers>
Maximum number of simultaneous file transfers (default 8).
<max_file_xfers_per_project>N</max_file_xfers_per_project>
Maximum number of simultaneous file transfers per project (default 2).
I have changed the per-project limit from two to three.
Now waiting for ATLAS tasks.
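
For reference, a minimal sketch of what such a file could look like, generated with Python's standard library. It assumes the usual BOINC client layout in which these settings sit inside an `<options>` element under `<cc_config>`; the helper name `build_cc_config` is invented for illustration, and the script only prints the XML rather than touching a real BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_xfers: int, max_xfers_per_project: int) -> str:
    """Return cc_config.xml content with the two transfer limits set.

    Assumption: both options belong inside <cc_config><options>.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_file_xfers").text = str(max_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(
        max_xfers_per_project
    )
    ET.indent(root)  # pretty-print the tree; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 8 is the documented default; 3 is the value tried in this post.
    print(build_cc_config(max_xfers=8, max_xfers_per_project=3))
```

Whether the client actually honours the changed values is a separate question; the sketch only shows the shape of the file.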
ID: 49402
maeax

Joined: 2 May 07
Posts: 2121
Credit: 159,926,969
RAC: 60,346
Message 49404 - Posted: 6 Feb 2024, 9:16:31 UTC - in response to Message 49402.  

Those two parameters cannot be overridden by the user, only by CERN IT.
I just had one ATLAS download with this low speed.
ID: 49404


©2024 CERN