LHCb VMs have longer runtimes
Author | Message |
---|---|
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,873,267 RAC: 38,830 |
Since yesterday evening the average runtimes of LHCb VMs are much longer than the weeks before. Are we crunching a different type of job, or is this a result of the server work? |
Send message Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0 |
"Since yesterday evening the average runtimes of LHCb VMs are much longer than the weeks before." I switched one machine to LHCb, and 50 of the 75 WUs typically ran for 2000-4000 seconds and paid about 3 credits each. A lot of bandwidth used for such short runs. They did two CONDOR jobs in those 2000 seconds. For example, Condor JobID: 24211.257 and Condor JobID: 24211.462 were completed. The 25 of 75 WUs that survived over 15,000 seconds (usually ~43,000 seconds) completed between 20 and 60 CONDOR jobs. (This was after I got the issues with the WiFi router resolved.) Are your extremely long-runtime WUs performing 80, 120, 150 CONDOR jobs? Is the WU's algorithm for deciding the number of CONDOR jobs before shutting down the VM not working correctly in all environments? |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,873,267 RAC: 38,830 |
The majority of LHCb WUs run only 2 short subjobs and then shut down (runtimes: 950-1300 s). This results in a CPU efficiency of less than 25% but lots of network traffic to set up the VM, which in my eyes is a waste of resources. As this situation has not changed for months and nobody from the project team seems interested in explaining it, I run LHCb only occasionally to see if there are any changes. The longer runtimes I mentioned in my post below occurred only during a short period some weeks ago. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"Are your extremely long-runtime WUs performing 80, 120, 150 CONDOR jobs?" I don't know about the number, but on 10 February I got one LHCb task that ran for 4 hours with 89% CPU usage, and another one that ran for 2 hours with 84% usage. There was also one on that date that ran for 1 hour, but with only 25% CPU usage. All the others are even shorter, around 18 to 25 minutes, with low CPU usage in the range of 25 to 30% on my machine (Ryzen 1700 on Lubuntu 17.10.1). So it seems that the CPU usage is poor until the runtime exceeds some minimum of more than one hour, whatever it is. |
Send message Joined: 15 Jun 08 Posts: 2534 Credit: 253,873,267 RAC: 38,830 |
"Are your extremely long-runtime WUs performing 80, 120, 150 CONDOR jobs?" Count them: grep -c 'Job finished in slot' stderr.txt ;-) |
Send message Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,107 RAC: 30,817 |
"A lot of bandwidth used for such short runs." While CMS was down over last weekend, I switched to LHCb for several days. On one of my machines, 8 tasks were running concurrently, resulting in a bandwidth usage of roughly 60 GB per 24 hours. For me that is no problem, as I have a fast flat-rate connection. For some others, though, this heavy usage may not be so nice. |
Send message Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"Count them:" Whatever slot it was in appears to be long gone. |
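
The `grep -c` count suggested in the thread can also be done in Python, which may be handy for checking several saved copies of `stderr.txt` in one go. This is only a minimal sketch: it assumes, as the grep command does, that each completed subjob leaves one log line containing `Job finished in slot`. The function names `count_finished_jobs` and `count_in_file` are made up for illustration and are not part of BOINC or the LHCb application.

```python
from pathlib import Path

# Marker string taken from the grep command in the thread.
MARKER = "Job finished in slot"


def count_finished_jobs(log_text: str, marker: str = MARKER) -> int:
    """Count the lines that contain the marker, like `grep -c` does."""
    return sum(1 for line in log_text.splitlines() if marker in line)


def count_in_file(path: str) -> int:
    """Read one stderr.txt (hypothetical path) and count finished subjobs."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return count_finished_jobs(text)
```

Like `grep -c`, this counts matching lines rather than individual occurrences, so a line that happens to contain the marker twice is still counted once. As noted above, the slot directory may already be gone once the task has finished, so the file has to be read or copied while the task is still running.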
©2024 CERN