Message boards : LHCb Application : Zero CPU usage

Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,491,159
RAC: 104,616
Message 34800 - Posted: 30 Mar 2018, 7:22:20 UTC

During the past few hours, I have noticed that all LHCb tasks running on 2 of my PCs show zero CPU usage in the Windows Task Manager.
The task properties in the BOINC Manager show a huge difference between total runtime and CPU time.
What's going on at LHC?
ID: 34800
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,491,159
RAC: 104,616
Message 34801 - Posted: 30 Mar 2018, 8:47:23 UTC

Does anyone have any idea what's going on?

Some of my LHCb tasks have now been running for more than 12 hours, with a CPU time of only about 7 hours :-(
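
To put a number on the gap between runtime and CPU time, here is a minimal sketch. The function name `cpu_efficiency` is made up for illustration; the inputs are the total runtime and CPU time values from the task properties, converted to seconds.

```python
def cpu_efficiency(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Return CPU time as a fraction of elapsed (wall-clock) time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_seconds / elapsed_seconds


# Figures from the post above: about 7 h of CPU time over about 12 h of runtime.
ratio = cpu_efficiency(7 * 3600, 12 * 3600)
print(f"CPU efficiency: {ratio:.0%}")  # prints "CPU efficiency: 58%"
```

A value near 100% means the task kept its core busy; values far below that mean the VM spent much of the runtime waiting.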
ID: 34801
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 35002 - Posted: 13 Apr 2018, 20:58:29 UTC - in response to Message 34801.  

I just had to abort an LHCb task after 21 hours. It was showing only 12% CPU usage, and not much progress. I don't recall that happening before, at least not for a long while.
ID: 35002
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,046,382
RAC: 136,764
Message 35003 - Posted: 13 Apr 2018, 21:42:02 UTC - in response to Message 35002.  

Did you check if that VM was busy with any uploads?
See:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4675&postid=34919
ID: 35003
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 35004 - Posted: 13 Apr 2018, 22:30:24 UTC - in response to Message 35003.  
Last modified: 13 Apr 2018, 22:31:43 UTC

Did you check if that VM was busy with any uploads?

No. Such a low CPU usage made me very suspicious that it was all bad.
Here is all that is left that I know of:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=187209230
ID: 35004
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,491,159
RAC: 104,616
Message 35450 - Posted: 8 Jun 2018, 10:46:34 UTC

Since this morning, I have been seeing exactly the problem described above.

All my running LHCb tasks show very little CPU time compared to the total runtime (seen when clicking on the "Properties" tab on the left side in the BOINC Manager).

What's going on? Should I abort all these tasks? Are they faulty?
ID: 35450
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 35452 - Posted: 8 Jun 2018, 12:24:14 UTC - in response to Message 35450.  

I just had two LHCb and two Theory tasks error out by themselves after using very little CPU (6 to 12 percent).
They ran from 3 to 5 hours.

It seems to be a communication failure with CERN:
VM Completion Message: Could not connect to cern.ch on port 80

https://lhcathome.cern.ch/lhcathome/results.php?hostid=10544450&offset=0&show_names=0&state=6&appid=
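
To check from the host whether a plain TCP connection can be opened at all, something like the following could be used. This is only a sketch: the helper name `can_connect` is hypothetical, and it tests reachability from the host machine, which is not necessarily what the VM itself sees.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling `can_connect("cern.ch", 80)` uses the host and port from the completion message above. If it returns False on the host as well, the problem is more likely in the local network or firewall than in the task.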
ID: 35452
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 35453 - Posted: 8 Jun 2018, 17:13:58 UTC - in response to Message 35450.  

What's going on? Should I abort all these tasks? Are they faulty?

Not running LHCb normally, but tested and found no problems on my side,
however the CPU-load is 44% during the elapsed time with 100% allowed.

14:26:45 +0200 2018-06-08 [INFO] New Job Starting in slot1
14:26:45 +0200 2018-06-08 [INFO] Condor JobID:  257355.882 in slot1
14:26:50 +0200 2018-06-08 [INFO] Starting pilot in slot1
14:47:02 +0200 2018-06-08 [INFO] Job finished in slot1 with .
14:47:05 +0200 2018-06-08 [INFO] New Job Starting in slot1
14:47:05 +0200 2018-06-08 [INFO] Condor JobID:  257355.1089 in slot1
14:47:10 +0200 2018-06-08 [INFO] Starting pilot in slot1
15:07:00 +0200 2018-06-08 [INFO] Job finished in slot1 with .
15:07:27 +0200 2018-06-08 [INFO] New Job Starting in slot1
15:07:27 +0200 2018-06-08 [INFO] Condor JobID:  257355.1336 in slot1
15:07:32 +0200 2018-06-08 [INFO] Starting pilot in slot1
15:27:25 +0200 2018-06-08 [INFO] Job finished in slot1 with .
15:27:27 +0200 2018-06-08 [INFO] New Job Starting in slot1
15:27:27 +0200 2018-06-08 [INFO] Condor JobID:  257359.74 in slot1
15:27:32 +0200 2018-06-08 [INFO] Starting pilot in slot1
16:25:33 +0200 2018-06-08 [INFO] Job finished in slot1 with .
16:25:36 +0200 2018-06-08 [INFO] New Job Starting in slot1
16:25:36 +0200 2018-06-08 [INFO] Condor JobID:  257359.830 in slot1
16:25:41 +0200 2018-06-08 [INFO] Starting pilot in slot1
16:45:57 +0200 2018-06-08 [INFO] Job finished in slot1 with .
16:46:07 +0200 2018-06-08 [INFO] New Job Starting in slot1
16:46:07 +0200 2018-06-08 [INFO] Condor JobID:  257355.1448 in slot1
16:46:13 +0200 2018-06-08 [INFO] Starting pilot in slot1
17:06:18 +0200 2018-06-08 [INFO] Job finished in slot1 with .
17:06:26 +0200 2018-06-08 [INFO] New Job Starting in slot1
17:06:26 +0200 2018-06-08 [INFO] Condor JobID:  257367.274 in slot1
17:06:31 +0200 2018-06-08 [INFO] Starting pilot in slot1
17:26:33 +0200 2018-06-08 [INFO] Job finished in slot1 with .
17:26:36 +0200 2018-06-08 [INFO] New Job Starting in slot1
17:26:36 +0200 2018-06-08 [INFO] Condor JobID:  257367.670 in slot1
17:26:41 +0200 2018-06-08 [INFO] Starting pilot in slot1
17:46:33 +0200 2018-06-08 [INFO] Job finished in slot1 with .
17:46:35 +0200 2018-06-08 [INFO] New Job Starting in slot1
17:46:35 +0200 2018-06-08 [INFO] Condor JobID:  257367.1101 in slot1
17:46:40 +0200 2018-06-08 [INFO] Starting pilot in slot1
18:06:39 +0200 2018-06-08 [INFO] Job finished in slot1 with .
18:06:41 +0200 2018-06-08 [INFO] New Job Starting in slot1
18:06:41 +0200 2018-06-08 [INFO] Condor JobID:  257372.171 in slot1
18:06:47 +0200 2018-06-08 [INFO] Starting pilot in slot1
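
The length of each job can be read off a log like this one with a few lines of Python. This is a rough sketch, assuming the line format shown above stays the same; the function name `job_durations` is made up, and the sample holds only the first two jobs. Note that the result is wall-clock time per job, since the log carries no CPU-time figures.

```python
from datetime import datetime

# First two jobs copied from the log above.
LOG = """\
14:26:45 +0200 2018-06-08 [INFO] New Job Starting in slot1
14:26:45 +0200 2018-06-08 [INFO] Condor JobID:  257355.882 in slot1
14:26:50 +0200 2018-06-08 [INFO] Starting pilot in slot1
14:47:02 +0200 2018-06-08 [INFO] Job finished in slot1 with .
14:47:05 +0200 2018-06-08 [INFO] New Job Starting in slot1
14:47:05 +0200 2018-06-08 [INFO] Condor JobID:  257355.1089 in slot1
14:47:10 +0200 2018-06-08 [INFO] Starting pilot in slot1
15:07:00 +0200 2018-06-08 [INFO] Job finished in slot1 with .
"""


def job_durations(log_text: str) -> list:
    """Return (Condor JobID, wall-clock seconds) for each finished job."""
    results = []
    start = job_id = None
    for line in log_text.splitlines():
        parts = line.split(" [INFO] ", 1)
        if len(parts) != 2:
            continue  # not a line in the expected format
        stamp = datetime.strptime(parts[0], "%H:%M:%S %z %Y-%m-%d")
        message = parts[1]
        if message.startswith("New Job Starting"):
            start, job_id = stamp, None
        elif message.startswith("Condor JobID:"):
            job_id = message.split()[2]
        elif message.startswith("Job finished") and start is not None:
            results.append((job_id, (stamp - start).total_seconds()))
            start = None
    return results


for job, seconds in job_durations(LOG):
    print(f"{job}: {seconds / 60:.1f} min")
```

For the two sample jobs this gives 1217 and 1195 seconds, i.e. roughly 20 minutes each.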
ID: 35453
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 35454 - Posted: 8 Jun 2018, 19:14:57 UTC - in response to Message 35453.  

Not running LHCb normally, but tested and found no problems on my side,
however the CPU-load is 44% during the elapsed time with 100% allowed.

It may depend on their length. The short ones (15 minutes) on my i7-4790 running Ubuntu 16.04 take about 13% CPU, whereas the long ones (13 hours) take about 70 to 90%.
ID: 35454
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,433,416
RAC: 3,056
Message 35455 - Posted: 8 Jun 2018, 19:36:20 UTC - in response to Message 35454.  

Not running LHCb normally, but tested and found no problems on my side,
however the CPU-load is 44% during the elapsed time with 100% allowed.

It may depend on their length. The short ones (15 minutes) on my i7-4790 running Ubuntu 16.04 take about 13% CPU, whereas the long ones (13 hours) take about 70 to 90%.

Yeah, the overall CPU usage went down and down over time, probably because it was running shorter ones.
This was my test task: https://lhcathome.cern.ch/lhcathome/result.php?resultid=198450846
ID: 35455
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 223,046,382
RAC: 136,764
Message 35456 - Posted: 8 Jun 2018, 19:56:28 UTC

Sorry to disappoint you.

LHCb tasks are designed to run 12h+ (+ = time to finish the currently running job + 10 minutes shutdown delay).
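
Read literally, that rule adds three parts together. A rough sketch under that reading follows; the function name and the 8-minute example are illustrative only and not taken from the project.

```python
from datetime import timedelta


def expected_task_runtime(job_overrun: timedelta) -> timedelta:
    """12 h base + time to finish the running job + 10 min shutdown delay."""
    base = timedelta(hours=12)
    shutdown_delay = timedelta(minutes=10)
    return base + job_overrun + shutdown_delay


# Example: the job running at the 12 h mark needs another 8 minutes.
print(expected_task_runtime(timedelta(minutes=8)))  # prints "12:18:00"
```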

Efficient tasks look like this (except the credit reward, which is very poor):
https://lhcathome.cern.ch/lhcathome/result.php?resultid=197953503
https://lhcathome.cern.ch/lhcathome/result.php?resultid=197831195
https://lhcathome.cern.ch/lhcathome/result.php?resultid=197893787

Tasks like the following ones point to a server-side problem:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=198396085
https://lhcathome.cern.ch/lhcathome/result.php?resultid=198376375
Unfortunately it can't be seen in the logs as the real problem (the idle time) occurs either before "Job finished in slot1 with ..." or after "New Job Starting in slot...".

Tasks that run 15-20 minutes seem to run only 2 dummy jobs.
Well, they reward the volunteer's effort, but I doubt they return useful scientific output.
ID: 35456
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,504,665
RAC: 3,862
Message 35458 - Posted: 8 Jun 2018, 20:56:11 UTC

Yes, I had some of these last week, so I stopped running them and went back to Theory tasks, since it was even worse with CMS on the last run.

https://lhcathome.cern.ch/lhcathome/results.php?userid=5472&offset=0&show_names=0&state=0&appid=12
Volunteer Mad Scientist For Life
ID: 35458
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 35459 - Posted: 8 Jun 2018, 21:49:36 UTC - in response to Message 35458.  
Last modified: 8 Jun 2018, 22:15:15 UTC

They seem to come in waves. I haven't had a short one for the past couple of days, and it is not a large loss of time in any case. It is not at the CMS level at any rate (nothing is).
ID: 35459
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 35464 - Posted: 9 Jun 2018, 17:28:50 UTC - in response to Message 35459.  
Last modified: 9 Jun 2018, 17:38:16 UTC

Just for information:

[Plot: average CPU efficiency over a sliding one-week period for all BOINC volunteers with LHCb work units]

Apparently, it's not possible to break the displayed results down by task length.
ID: 35464
djoser
Joined: 30 Aug 14
Posts: 145
Credit: 10,847,070
RAC: 0
Message 36461 - Posted: 16 Aug 2018, 14:15:20 UTC - in response to Message 35464.  

Hi!

I have two low-power machines running LHCb tasks.
One of these hosts (ID: 10552164) shows very low CPU usage compared to the runtime of a task.
It only shows between 25 - 40 % cpu time.

The other host has a "normal" cpu/runtime proportion.

What could be the reason for that?

Greetings, djoser.
Why mine when you can research? - GRIDCOIN - Real cryptocurrency without wasting hashes! https://gridcoin.us
ID: 36461
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,491,159
RAC: 104,616
Message 36493 - Posted: 18 Aug 2018, 18:40:51 UTC - in response to Message 36461.  

It only shows between 25 - 40 % cpu time.
The other host has a "normal" cpu/runtime proportion.
What could be the reason for that?

My experience with LHCb is that these tasks are extremely volatile as far as CPU usage is concerned.
ID: 36493

©2024 CERN