Zero CPU usage
Author | Message |
---|---|
Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,543 RAC: 30,793 |
During the past few hours, I have noticed that all LHCb tasks running on 2 of my PCs show zero CPU usage in the Windows Task Manager. The task "Properties" in the BOINC Manager show a huge difference between total runtime and CPU time. What's going on at LHC? |
Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,543 RAC: 30,793 |
Does anyone have any idea what's going on? Some of my LHCb tasks have now been running for more than 12 hours, with a CPU time of about 7 hours :-( |
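For readers comparing the two figures in the BOINC Manager, a minimal sketch of the ratio being discussed: CPU time divided by elapsed runtime. The function name `cpu_efficiency` is made up for illustration and is not part of BOINC.

```python
def cpu_efficiency(cpu_hours: float, elapsed_hours: float) -> float:
    """Return CPU time as a fraction of elapsed (wall-clock) time."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_hours / elapsed_hours


# Figures from the post above: about 7 h of CPU time over 12 h of runtime.
print(f"{cpu_efficiency(7, 12):.0%}")  # prints 58%
```

By the same measure, a task reported at 12% CPU usage would have spent roughly 88% of its runtime idle.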
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
I just had to abort an LHCb task after 21 hours. It was showing only 12% CPU usage and not much progress. I don't recall that happening before, at least not for a long while. |
Joined: 15 Jun 08 Posts: 2534 Credit: 253,873,875 RAC: 38,811 |
Did you check if that VM was busy with any uploads? See: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4675&postid=34919 |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"Did you check if that VM was busy with any uploads?" No. Such low CPU usage made me very suspicious that it was all bad. Here is all that is left that I know of: https://lhcathome.cern.ch/lhcathome/result.php?resultid=187209230 |
Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,543 RAC: 30,793 |
Since this morning, I have been seeing exactly the problem described above. All my running LHCb tasks show very little CPU time compared to the total runtime (seen when clicking on the "Properties" tab on the left side in the BOINC Manager). What's going on? Should I abort all these tasks? Are they faulty? |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
I just had two LHCb and two Theory tasks error out by themselves after using very little CPU (6 to 12 percent). They ran for 3 to 5 hours. It seems to be a communication failure with CERN: VM Completion Message: Could not connect to cern.ch on port 80 https://lhcathome.cern.ch/lhcathome/results.php?hostid=10544450&offset=0&show_names=0&state=6&appid= |
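A quick way to check from the host whether cern.ch is reachable on port 80 is a plain TCP connection test. This is only a sketch: the helper name `can_connect` is made up here, and it runs on the host rather than inside the VM, so a success does not prove that the VM's own network path is working.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens the TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Calling `can_connect("cern.ch", 80)` while a task is stalling would show whether the failure is also visible from the host side.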
Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
"What's going on? Should I abort all these tasks? Are they faulty?" Not running LHCb normally, but tested and found no problems on my side; however, the CPU load is 44% over the elapsed time, with 100% allowed.
14:26:45 +0200 2018-06-08 [INFO] New Job Starting in slot1
14:26:45 +0200 2018-06-08 [INFO] Condor JobID: 257355.882 in slot1
14:26:50 +0200 2018-06-08 [INFO] Starting pilot in slot1
14:47:02 +0200 2018-06-08 [INFO] Job finished in slot1 with .
14:47:05 +0200 2018-06-08 [INFO] New Job Starting in slot1
14:47:05 +0200 2018-06-08 [INFO] Condor JobID: 257355.1089 in slot1
14:47:10 +0200 2018-06-08 [INFO] Starting pilot in slot1
15:07:00 +0200 2018-06-08 [INFO] Job finished in slot1 with .
15:07:27 +0200 2018-06-08 [INFO] New Job Starting in slot1
15:07:27 +0200 2018-06-08 [INFO] Condor JobID: 257355.1336 in slot1
15:07:32 +0200 2018-06-08 [INFO] Starting pilot in slot1
15:27:25 +0200 2018-06-08 [INFO] Job finished in slot1 with .
15:27:27 +0200 2018-06-08 [INFO] New Job Starting in slot1
15:27:27 +0200 2018-06-08 [INFO] Condor JobID: 257359.74 in slot1
15:27:32 +0200 2018-06-08 [INFO] Starting pilot in slot1
16:25:33 +0200 2018-06-08 [INFO] Job finished in slot1 with .
16:25:36 +0200 2018-06-08 [INFO] New Job Starting in slot1
16:25:36 +0200 2018-06-08 [INFO] Condor JobID: 257359.830 in slot1
16:25:41 +0200 2018-06-08 [INFO] Starting pilot in slot1
16:45:57 +0200 2018-06-08 [INFO] Job finished in slot1 with .
16:46:07 +0200 2018-06-08 [INFO] New Job Starting in slot1
16:46:07 +0200 2018-06-08 [INFO] Condor JobID: 257355.1448 in slot1
16:46:13 +0200 2018-06-08 [INFO] Starting pilot in slot1
17:06:18 +0200 2018-06-08 [INFO] Job finished in slot1 with .
17:06:26 +0200 2018-06-08 [INFO] New Job Starting in slot1
17:06:26 +0200 2018-06-08 [INFO] Condor JobID: 257367.274 in slot1
17:06:31 +0200 2018-06-08 [INFO] Starting pilot in slot1
17:26:33 +0200 2018-06-08 [INFO] Job finished in slot1 with .
17:26:36 +0200 2018-06-08 [INFO] New Job Starting in slot1
17:26:36 +0200 2018-06-08 [INFO] Condor JobID: 257367.670 in slot1
17:26:41 +0200 2018-06-08 [INFO] Starting pilot in slot1
17:46:33 +0200 2018-06-08 [INFO] Job finished in slot1 with .
17:46:35 +0200 2018-06-08 [INFO] New Job Starting in slot1
17:46:35 +0200 2018-06-08 [INFO] Condor JobID: 257367.1101 in slot1
17:46:40 +0200 2018-06-08 [INFO] Starting pilot in slot1
18:06:39 +0200 2018-06-08 [INFO] Job finished in slot1 with .
18:06:41 +0200 2018-06-08 [INFO] New Job Starting in slot1
18:06:41 +0200 2018-06-08 [INFO] Condor JobID: 257372.171 in slot1
18:06:47 +0200 2018-06-08 [INFO] Starting pilot in slot1 |
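The log in the post above can be summarised with a short script. What follows is a minimal sketch, assuming the line layout seen in that log (time, UTC offset, date, `[INFO]`, message) and a single slot; the names `parse` and `job_stats` are made up for illustration and are not part of any LHC@home tool. It measures how long each job ran and how long the slot waited between one job finishing and the next one starting.

```python
import re
from datetime import datetime

# Layout assumed from the log above: "HH:MM:SS +ZZZZ YYYY-MM-DD [INFO] message"
LINE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}) ([+-]\d{4}) (\d{4}-\d{2}-\d{2}) \[INFO\] (.+)"
)


def parse(line):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    clock, tz, date, msg = m.groups()
    ts = datetime.strptime(f"{date} {clock} {tz}", "%Y-%m-%d %H:%M:%S %z")
    return ts, msg


def job_stats(lines):
    """Return (job durations, gaps between jobs), both in seconds."""
    durations, gaps = [], []
    started = finished = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, msg = parsed
        if msg.startswith("New Job Starting"):
            if finished is not None:
                gaps.append((ts - finished).total_seconds())
            started = ts
        elif msg.startswith("Job finished") and started is not None:
            durations.append((ts - started).total_seconds())
            finished, started = ts, None
    return durations, gaps


sample = [
    "14:26:45 +0200 2018-06-08 [INFO] New Job Starting in slot1",
    "14:26:45 +0200 2018-06-08 [INFO] Condor JobID: 257355.882 in slot1",
    "14:26:50 +0200 2018-06-08 [INFO] Starting pilot in slot1",
    "14:47:02 +0200 2018-06-08 [INFO] Job finished in slot1 with .",
    "14:47:05 +0200 2018-06-08 [INFO] New Job Starting in slot1",
]
durations, gaps = job_stats(sample)
print(durations, gaps)  # prints [1217.0] [3.0]
```

On the first job in the log this gives a run of about 20 minutes followed by a 3-second gap, so any idle time would have to sit inside the job itself rather than between jobs.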
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"Not running LHCb normally, but tested and found no problems on my side," It may depend on their length. The short ones (15 minutes) on my i7-4790 running Ubuntu 16.04 take about 13% CPU, whereas the long ones (13 hours) take about 70 to 90%. |
Joined: 14 Jan 10 Posts: 1418 Credit: 9,470,586 RAC: 3,147 |
"Not running LHCb normally, but tested and found no problems on my side," Yeah, the overall CPU usage went down and down over time, probably because it was running shorter ones. This was my test task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=198450846 |
Joined: 15 Jun 08 Posts: 2534 Credit: 253,873,875 RAC: 38,811 |
Sorry to disappoint you. LHCb tasks are designed to run 12h+ (+ = time to finish the currently running job + 10 minutes shutdown delay). Efficient tasks look like this (except for the credit reward, which is very poor): https://lhcathome.cern.ch/lhcathome/result.php?resultid=197953503 https://lhcathome.cern.ch/lhcathome/result.php?resultid=197831195 https://lhcathome.cern.ch/lhcathome/result.php?resultid=197893787 Tasks like the following ones point to a server-side problem: https://lhcathome.cern.ch/lhcathome/result.php?resultid=198396085 https://lhcathome.cern.ch/lhcathome/result.php?resultid=198376375 Unfortunately, it can't be seen in the logs, as the real problem (the idle time) occurs either before "Job finished in slot1 with ..." or after "New Job Starting in slot...". Tasks that run 15-20 minutes seem to run only 2 dummy jobs. Well, they reward the volunteer's effort, but I doubt they return useful scientific output. |
Joined: 24 Oct 04 Posts: 1173 Credit: 54,831,657 RAC: 16,126 |
Yes I had some of these last week so I stopped running them and just went back to Theory tasks since it was even worse with CMS last run. https://lhcathome.cern.ch/lhcathome/results.php?userid=5472&offset=0&show_names=0&state=0&appid=12 Volunteer Mad Scientist For Life |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
They seem to come in waves. I haven't had a short one for the past couple of days, and it is not a large loss of time in any case. It is not at the CMS level at any rate (nothing is). |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
Just for information: a plot over a sliding one-week period showing the average CPU efficiency for all BOINC volunteers with LHCb work units: [plot not shown] Apparently, it's not possible to break the displayed results down by task length. |
Joined: 30 Aug 14 Posts: 145 Credit: 10,847,070 RAC: 0 |
Hi! I have two (low-power) machines running LHCb tasks. One of these hosts (ID: 10552164) shows very low CPU usage compared to the runtime of a task: CPU time is only between 25 and 40% of the runtime. The other host has a "normal" CPU/runtime ratio. What could be the reason for that? Greetings, djoser. Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us |
Joined: 18 Dec 15 Posts: 1814 Credit: 118,498,543 RAC: 30,793 |
"It only shows between 25 - 40 % cpu time." My experience with LHCb is that these tasks are extremely volatile as far as CPU usage is concerned. |