Message boards : LHCb Application : LHCb app transfers data while WUs are running?

Pavel Hanak

Joined: 5 Mar 06
Posts: 13
Credit: 30,870,563
RAC: 10
Message 29636 - Posted: 25 Mar 2017, 17:31:30 UTC

Hi all, I've recently started crunching WUs for the LHCb app, and I've noticed that some of them generate a lot of Internet traffic while they're running. I mostly notice it when they're uploading, because one of my machines has a rather slow upload link. The address 128.142.142.167 appears most often, but at least a dozen other addresses show up in my logs too. Some WUs transfer dozens of megabytes this way; others need much less. Is that normal? If so, I'm curious what exactly the WUs are downloading or uploading. Is it some auxiliary data needed for the simulations?
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1274
Credit: 8,480,242
RAC: 2,028
Message 29638 - Posted: 25 Mar 2017, 20:58:51 UTC - in response to Message 29636.  

One LHCb task runs several jobs within its lifetime, and the task lifetime is between 12 and 36 hours.
I think one job currently takes about 2 hours. For each job, the VM downloads and uploads data.
You may even have multicore tasks enabled, but I can't tell because your machines are hidden.
Pavel Hanak

Joined: 5 Mar 06
Posts: 13
Credit: 30,870,563
RAC: 10
Message 29639 - Posted: 25 Mar 2017, 21:58:43 UTC

I see. What happens when several LHCb apps try to download or upload data, but the transfers take very long because of ISP speed limits? I have one machine on ADSL with a lousy 50 KB/s upload, yet it routinely runs 8 LHCb apps at once (it has 24 GB RAM). Sometimes they upload for 30 minutes or more. I suspect the slow transfer speeds may be responsible for some WUs failing. Are there timeouts in the app that could cause such failures?
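
For a rough sense of scale, here is a back-of-the-envelope sketch of how long uploads could take on such a link. The 50 KB/s speed and the 8 concurrent tasks come from the post above; the 20 MB per task is purely an illustrative assumption (the thread only says "dozens of megabytes"), and the helper name `upload_minutes` is made up for this example.

```python
def upload_minutes(total_mb: float, link_kb_per_s: float) -> float:
    """Minutes needed to push total_mb megabytes through a link of link_kb_per_s KB/s."""
    total_kb = total_mb * 1024  # treat 1 MB as 1024 KB
    return total_kb / link_kb_per_s / 60


LINK_KB_PER_S = 50  # upload speed quoted in the post
TASKS = 8           # concurrent LHCb tasks quoted in the post
MB_PER_TASK = 20    # ASSUMPTION: illustrative size, not a measured value

# One task uploading alone versus all tasks sharing the same link.
single = upload_minutes(MB_PER_TASK, LINK_KB_PER_S)
all_at_once = upload_minutes(MB_PER_TASK * TASKS, LINK_KB_PER_S)

print(f"one task alone: {single:.1f} min")
print(f"{TASKS} tasks sharing the link: {all_at_once:.1f} min")
```

Under these assumed numbers, a single task would need about 7 minutes, while 8 tasks uploading at the same time would keep the link busy for close to an hour, which is in the same range as the 30-minute uploads described above.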

BTW, I haven't seen any multicore LHCb WUs on my machines yet; I didn't even know they existed. But my machines have already completed a few multicore ATLAS ones (I used to crunch ATLAS even before the merger with LHC).
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1274
Credit: 8,480,242
RAC: 2,028
Message 29640 - Posted: 25 Mar 2017, 22:22:44 UTC - in response to Message 29639.  

That's a lot of questions about job behavior when the network is not stable.
I can't answer them; maybe the LHCb project scientists can. I cannot imagine that low network speed alone makes a job fail.
From the CMS sub-project, I know that jobs can survive a connection interruption of at least 1 hour by re-establishing the connection.



©2024 CERN