LHCb app transfers data while WUs are running?
Author | Message |
---|---|
Joined: 5 Mar 06, Posts: 13, Credit: 30,959,975, RAC: 2,389 | Hi all, I recently started crunching WUs for the LHCb app, and I've noticed that some of them generate a lot of Internet traffic while they're running. I notice it mostly when they're uploading, because one of my machines has a rather slow upload link. The address 128.142.142.167 seems to be used most often, but at least a dozen other addresses appear in my logs too. Some WUs transfer dozens of megabytes this way; others need much less. Is that normal? If so, I'm curious what exactly the WUs are downloading or uploading. Is it auxiliary data needed for the simulations? |
Joined: 14 Jan 10, Posts: 1422, Credit: 9,484,585, RAC: 1,266 | One LHCb task runs several jobs during its lifetime, which is between 12 and 36 hours. I think a single job currently takes about 2 hours, and for each job the VM downloads and uploads data. Maybe you even have multicore tasks enabled, but your machines are hidden, so I can't check. |
Joined: 5 Mar 06, Posts: 13, Credit: 30,959,975, RAC: 2,389 | I see. What happens when several LHCb tasks try to download or upload data but the transfers take very long because of ISP speed limits? One of my machines is on ADSL with a lousy 50 KB/s upload, yet it routinely runs 8 LHCb tasks at once (it has 24 GB of RAM). Sometimes they upload for 30 minutes or more. I suspect the slow transfers may be causing some WUs to fail. Are there timeouts in the app that could cause such failures? BTW, I haven't seen any multicore LHCb WUs on my machines yet; I didn't even know they existed. My machines have already completed a few multicore ATLAS ones, though (I crunched ATLAS even before it merged with LHC@home). |
Joined: 14 Jan 10, Posts: 1422, Credit: 9,484,585, RAC: 1,266 | That's a lot of questions about job behavior on unstable networks, and I can't answer them; maybe the LHCb project scientists can. I can't imagine that low network speed alone makes a job fail. From the CMS subproject, I know that jobs can survive a connection interruption of at least 1 hour by re-establishing the connection. |
©2024 CERN