Message boards : LHCb Application : Low CPU usage

PHILIPPE
Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 29595 - Posted: 24 Mar 2017, 17:14:59 UTC - in response to Message 29593.  
Last modified: 24 Mar 2017, 17:57:05 UTC

Well, I have taken a look at your last log, but I don't see the authentication failure. It's curious.
If one day you want to try again, there may be another way to understand what is happening while your work units are running.
I didn't see anyone in the forum advising you to look inside the VM with the monitoring console.
-------------------------------------------------------------------------------
When the work unit starts in BOINC Manager, click "Show VM Console" in the left panel. This opens the remote desktop, and a large window appears.
There you can watch the initialisation of the WU.
At the beginning, press Alt+F1 and check whether any warnings are displayed.
After a while nothing moves any more, so press Alt+F3 to access the top command.
There you can see all the processes running inside the VM.
When the job really starts, a process owned by user "nobody" appears.
Quickly (within 15 to 25 min) its CPU activity grows to about 97% and its memory consumption increases to nearly 50%.
If the busiest process stays between 1 and 3%, that is abnormal. You can press Alt+F5 to see any error messages...
Then the best thing is to reboot the computer, to start over and force the server to communicate with your VM. Right after the reboot, check in VirtualBox Manager and delete any faulty VMs marked with a red circle, then... wait and watch the VM console once more.
--------------------------------------------------------------------------------
This recipe is how I manage my host, but I'm a beginner in LHCb, so there are perhaps other, more academic solutions.
Maybe it would work for you, too.
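A minimal Python sketch of the check in the recipe above. It assumes you have copied a few (user, %CPU) pairs by hand from the top screen (Alt+F3). The function name `job_status` and the 90% "busy" threshold are illustrative choices, not anything defined by the project; only the 1 to 3% "abnormal" range and the roughly 97% target come from the recipe.

```python
def job_status(processes, busy_threshold=90.0, idle_threshold=3.0):
    """Classify the VM state from (user, cpu_percent) pairs read off top.

    Returns "running" if a process owned by "nobody" is busy,
    "idle" if even the busiest process is at or below idle_threshold,
    and "starting" for anything in between.
    """
    if not processes:
        return "idle"
    # CPU usage of the processes owned by "nobody" (the actual job).
    nobody_cpu = [cpu for user, cpu in processes if user == "nobody"]
    if nobody_cpu and max(nobody_cpu) >= busy_threshold:
        return "running"
    if max(cpu for _, cpu in processes) <= idle_threshold:
        return "idle"
    return "starting"
```

If the result is still "idle" well after the 15 to 25 min mentioned above, that is the situation where the recipe suggests checking Alt+F5 and rebooting.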
ID: 29595

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29598 - Posted: 24 Mar 2017, 18:04:34 UTC - in response to Message 29595.  

All LHC programs, save SixTrack, use at most 9% of CPU. This means that they are doing nothing. Some, like CMS, fail as BOINC projects after a short time. Others go on for hours doing nothing. I look at the VM console both on the Windows PC and the Linux boxes, and it does not signal anything. Then I go to the machine logs and see many error messages which I am unable to understand. The two native Atlas tasks on my E-450 Linux box work perfectly. Atlas tasks started in LHC do not report a HITS file, but they give me credits. But why do native Atlas tasks work perfectly while LHC Atlas tasks do not?
Tullio
ID: 29598

PHILIPPE
Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 29600 - Posted: 24 Mar 2017, 18:29:01 UTC - in response to Message 29598.  
Last modified: 24 Mar 2017, 18:29:25 UTC

In fact, your last Atlas tasks were 2-core work units with only 3600 MB allocated:
2017-03-16 17:58:22 (6468): Setting Memory Size for VM. (3600MB)
2017-03-16 17:58:22 (6468): Setting CPU Count for VM. (2)

The crunchers found that the 2-core WU needs at least 4400 MB, so David Cameron changed the settings to allocate more RAM to the VM.
Now it may work.
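To check this in your own logs, here is a small Python sketch that pulls the memory size and CPU count out of the two log lines quoted above and compares the memory against the 4400 MB figure. The function names and the `required_mb` parameter are illustrative assumptions; the regular expressions only match the exact message wording shown in this post.

```python
import re

# Patterns for the two log messages quoted in the post.
MEMORY_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")
CPU_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")


def vm_settings(log_text):
    """Return (memory_mb, cpu_count); a value is None if its line is missing."""
    mem = MEMORY_RE.search(log_text)
    cpu = CPU_RE.search(log_text)
    return (int(mem.group(1)) if mem else None,
            int(cpu.group(1)) if cpu else None)


def enough_memory(log_text, required_mb=4400):
    """True if the logged VM memory reaches required_mb, None if not logged."""
    memory_mb, _ = vm_settings(log_text)
    if memory_mb is None:
        return None
    return memory_mb >= required_mb


LOG = """\
2017-03-16 17:58:22 (6468): Setting Memory Size for VM. (3600MB)
2017-03-16 17:58:22 (6468): Setting CPU Count for VM. (2)
"""
print(vm_settings(LOG))    # (3600, 2)
print(enough_memory(LOG))  # False
```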
---------------------------------------------------------------------------------
Only a few people understand the error messages in the log.
It's not easy.
That is why this forum exists.
But sometimes no one knows the reason for the failure...
This is science through experimentation.
Always go further and increase our knowledge.
ID: 29600

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 846
Credit: 691,113,241
RAC: 111,679
Message 29602 - Posted: 24 Mar 2017, 19:04:36 UTC
Last modified: 24 Mar 2017, 19:05:14 UTC

My LHCb was back to normal, good CPU load. I got credit even though my PC was idle.

I assume it was idle because the project had no work on the CERN side, or is it a bug?

I have no issues with other scientific applications.

I did have the same problem with ATLAS, so I went back to single core. As discussed in other threads, you can use app_config to force 4400 MB if you would like to run with two cores.
ID: 29602

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29611 - Posted: 25 Mar 2017, 8:39:06 UTC - in response to Message 29600.  
Last modified: 25 Mar 2017, 9:00:28 UTC

That was an LHC@home Atlas task. I am currently running a native Atlas task on a Linux box with 4100 MB RAM and am at 86% after 23:45 hours. It runs on the E-450 AMD CPU.
Tullio
All LHCb tasks continue failing on the Windows 10 PC with its 24 GB RAM and its A10-6700 AMD CPU, 4 cores, at 3.7 GHz and up to 4.14 GHz.
ID: 29611

PHILIPPE
Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 29622 - Posted: 25 Mar 2017, 14:23:45 UTC - in response to Message 29611.  
Last modified: 25 Mar 2017, 14:41:52 UTC

I have been looking at the LHCb tasks on your Windows host.
You did indeed open the VM console:
2017-03-24 21:13:15 (800): Detected: Web Application Enabled (http://localhost:64018)
2017-03-24 21:13:15 (800): Detected: Remote Desktop Enabled (localhost:64019)
but you didn't reboot to force communication with the server.
Normally, when you do, the number identifying your session changes:
2017-03-24 21:12:40 (800): vboxwrapper (7.7.26196): starting
2017-03-25 09:28:28 (800): VM Completion Message: Condor exited after 44040s without running a job.
In my logs, you can see it:
2017-03-24 13:14:29 (5964): Guest Log: [DEBUG] 0
2017-03-24 13:25:09 (3748): vboxwrapper (7.7.26196): starting

The aim is to kick off the CPU activity inside the VM when it stays too low.
I noticed that if I start BOINC together with Windows, the telemetry activity (a background service) seems to interfere with the launch of VBoxHeadless (too many resources in use at the same time). So waiting 10 min after startup and rebooting once this telemetry service has finished allows the VM to start correctly. But I'm not yet sure this is the real cause.
I have to look into this more deeply.
Try rebooting 10 min after the start of the WU and report whether the change is successful or not.
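To compare logs like the ones quoted above, a short Python sketch can list the numbers in parentheses (the "session" numbers, presumably the vboxwrapper process ID) and pick up the "Condor exited ... without running a job" message. The function name `summarize` and the line pattern are assumptions based only on the log excerpts in this post.

```python
import re

# "<date> <time> (<number>): <message>", as in the excerpts above.
LINE_RE = re.compile(r"^\S+ \S+ \((\d+)\): (.*)$")
IDLE_RE = re.compile(r"Condor exited after (\d+)s without running a job")


def summarize(log_text):
    """Return (session_numbers, idle_seconds) for a vboxwrapper log.

    session_numbers: distinct numbers in parentheses, in order of appearance.
    idle_seconds: seconds from the "Condor exited ..." message, else None.
    """
    sessions = []
    idle_seconds = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like log entries
        number, message = int(match.group(1)), match.group(2)
        if number not in sessions:
            sessions.append(number)
        idle = IDLE_RE.search(message)
        if idle:
            idle_seconds = int(idle.group(1))
    return sessions, idle_seconds
```

A single session number together with a non-empty `idle_seconds` would match the first case above (no reboot, no job); two different session numbers would match the second (the wrapper was restarted).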
ID: 29622

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29624 - Posted: 25 Mar 2017, 14:43:18 UTC - in response to Message 29622.  

I have LHCb tasks failing on all PCs, two Linux boxen and a Windows 10 PC. Only SixTrack and native Atlas work, not LHC Atlas. I am using a Windows 10 PC to crunch Einstein@home GPU tasks on its GTX 1050 GPU board and a Linux box to crunch SETI@home beta GPU tasks on its GTX 750 GPU board, soon to be changed to a GTX 750 Ti since it needs more RAM to crunch heavy Einstein@home GPU tasks. Only an HP laptop with no GPU board is used to crunch native Atlas@home tasks, climateprediction.net, Einstein@home and SETI@home CPU tasks. Sorry for LHC, but it does not work. I was an Alpha tester in Test4Theory@home, on invitation by Dr. Ben Segal.
Tullio
ID: 29624

PHILIPPE
Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 29627 - Posted: 25 Mar 2017, 15:13:24 UTC - in response to Message 29624.  
Last modified: 25 Mar 2017, 15:14:03 UTC

I understand: you can't reboot because your Windows 10 PC is used for other projects.
I don't have this disadvantage, even though I crunch WCG tasks too, because they need neither VirtualBox nor a GPU.
In that case, I don't think I can help you.
When a project doesn't work, the best approach is to test it alone, with as few parameters as possible, and only afterwards check whether running everything at the same time is possible.
I think you would do better to run easier projects, with fewer constraints.
Crunching is not trivial; there are 2 levels of difficulty, like heating water and milk:
water boils without any problem (SixTrack); with milk you always have to keep an eye on it before it spills over the pot (LHCb). It's physics, we can't change it...
ID: 29627

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29629 - Posted: 25 Mar 2017, 15:25:31 UTC - in response to Message 29627.  
Last modified: 25 Mar 2017, 15:26:32 UTC

I have been running two LHCb tasks alone on the Windows 10 PC with no other task running. They all failed. I don't have the habit of rebooting PCs to run programs. If they are good ones they must run. This is what I learned running UNIX since 1981 on all kinds of systems, including RISC minicomputers at Trieste Area Science Park.
Tullio
ID: 29629

PHILIPPE
Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 29630 - Posted: 25 Mar 2017, 15:40:33 UTC - in response to Message 29629.  

So you know that not everything is as easy as we might expect.
I agree this is a difficult project, with failures sometimes caused by the server side too. It's not your fault. There are temporary issues.
Science advances through tests, sometimes with good results, sometimes with bad ones.
Sorry if you are disappointed.
ID: 29630

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29631 - Posted: 25 Mar 2017, 15:57:53 UTC - in response to Message 29630.  

Well, my native Atlas task is at 90% on the HP laptop. So at least something works. Cheers.
Tullio
ID: 29631

maeax
Joined: 2 May 07
Posts: 2242
Credit: 173,899,385
RAC: 2,828
Message 29632 - Posted: 25 Mar 2017, 16:02:20 UTC

Tullio,

you have successful CMS tasks on your Windows 10 PC.

Is it possible for you to change your LHC preferences from LHCb to CMS and run a test with CMS?

After changing your preferences, reset the LHC@home project in BOINC and download the master file again.

Good luck.
ID: 29632

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29634 - Posted: 25 Mar 2017, 17:22:42 UTC

I see only a SixTrack validated task.
Tullio
ID: 29634

maeax
Joined: 2 May 07
Posts: 2242
Credit: 173,899,385
RAC: 2,828
Message 29637 - Posted: 25 Mar 2017, 18:01:42 UTC
Last modified: 25 Mar 2017, 18:06:43 UTC

These finished CMS tasks are shown under your computer list in LHC@home.
They are also shown when you click on your username.

https://lhcathome.cern.ch/lhcathome/host_app_versions.php?hostid=10407309
ID: 29637

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29641 - Posted: 26 Mar 2017, 0:23:47 UTC - in response to Message 29637.  
Last modified: 26 Mar 2017, 0:47:27 UTC

Yes, but they were completed before the consolidation of LHC. Now I have completed a native Atlas@home task on the Linux box with E-450 and it shows the HITS file. A second multicore task has just started running.
Tullio
I shall go on crunching Atlas@home tasks as long as they are available, then I shall leave the project
ID: 29641

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29693 - Posted: 28 Mar 2017, 9:10:00 UTC

All LHCb tasks fail on my 3 PCs, but Linux tasks fail much sooner than Windows 10 tasks, thus leaving room for other projects.
Tullio
ID: 29693

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 29829 - Posted: 5 Apr 2017, 13:55:32 UTC
Last modified: 5 Apr 2017, 13:57:22 UTC

I have abandoned all LHC tasks after being an Alpha tester in Test4Theory@home because they all fail on my 3 PCs, two Linux 64-bit and one Windows 10. I am running SETI@Home both CPU and GPU tasks, also Einstein@home. My Einstein RAC has reached 73k since one Linux box with a GTX 750 Ti board is running only it. The Windows PC has a GTX 1050 Ti board. I am running also climateprediction.net on a CPU.
Tullio
ID: 29829

[AF>Amis des Lapins] Phil1966
Joined: 23 Apr 10
Posts: 5
Credit: 1,349,240
RAC: 0
Message 33641 - Posted: 3 Jan 2018, 12:39:12 UTC

Hello,
Can someone explain to me why the CPU time of most of the LHCb WUs is only about 10-15% of the run time?
Except for "long tasks", I see a lot of wasted time:

All tasks below: LHCb Simulation v1.03 (vbox64), windows_x86_64, status "Completed and validated".

Task       Work unit  Sent                     Reported                  Run time (s)  CPU time (s)  Credit
172067635  83393496   3 Jan 2018, 7:37:53 UTC  3 Jan 2018, 10:47:31 UTC      1,472.49        231.11   56.06
172067636  83393497   3 Jan 2018, 7:37:53 UTC  3 Jan 2018, 10:22:23 UTC      1,434.63        231.69   11.52
172068470  83394180   3 Jan 2018, 7:37:53 UTC  3 Jan 2018, 12:37:11 UTC     11,385.51      8,222.75  247.15
172068471  83394181   3 Jan 2018, 7:37:53 UTC  3 Jan 2018, 9:34:20 UTC       1,292.77        199.75   10.34
172068249  83394070   3 Jan 2018, 7:37:53 UTC  3 Jan 2018, 12:28:17 UTC      1,468.25        179.03   30.43
172065958  83392240   3 Jan 2018, 7:37:53 UTC  3 Jan 2018, 9:57:43 UTC       1,358.97        232.19   10.91
172067703  83393558   3 Jan 2018, 7:37:40 UTC  3 Jan 2018, 8:45:12 UTC       2,647.19        235.30   21.02
172068259  83394080   3 Jan 2018, 7:37:40 UTC  3 Jan 2018, 11:43:16 UTC     10,663.11      8,825.06  412.13
172068524  83394233   3 Jan 2018, 7:37:40 UTC  3 Jan 2018, 9:09:18 UTC       1,372.11        209.70   10.93
172066241  83392444   3 Jan 2018, 7:37:40 UTC  3 Jan 2018, 9:12:10 UTC       1,438.58        203.09   11.46
172068586  83394294   3 Jan 2018, 7:37:40 UTC  3 Jan 2018, 9:26:50 UTC       2,372.66        273.09   18.97
172068587  83394295   3 Jan 2018, 7:37:40 UTC  3 Jan 2018, 8:46:50 UTC       2,678.85        239.17   21.30
172068588  83394296   3 Jan 2018, 7:37:40 UTC  3 Jan 2018, 8:45:22 UTC       2,728.56        241.19   21.69
172068350  83394134   3 Jan 2018, 7:37:40 UTC  3 Jan 2018, 12:03:08 UTC     14,657.90     11,284.63  412.88
172068351  83394135   3 Jan 2018, 7:37:40 UTC  3 Jan 2018, 8:46:09 UTC       2,810.61        225.19   22.35
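For reference, the ratio in question can be computed with a few lines of Python. This is only a sketch: it assumes the two numbers before the credit in each entry are run time and CPU time in seconds (the usual order on BOINC result pages), and `cpu_efficiency` is a name chosen here for illustration. Three entries from the list above are used as sample data.

```python
def cpu_efficiency(run_time_s, cpu_time_s):
    """Return CPU time as a percentage of run time."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return 100.0 * cpu_time_s / run_time_s


# (task ID, run time in s, CPU time in s), copied from three entries above.
TASKS = [
    (172067635, 1472.49, 231.11),
    (172068470, 11385.51, 8222.75),
    (172067703, 2647.19, 235.30),
]

for task_id, run_time, cpu_time in TASKS:
    print(f"{task_id}: {cpu_efficiency(run_time, cpu_time):.1f}%")
```

For these three entries the result is roughly 16%, 72% and 9%: the long task spends most of its run time computing, while the short ones do not.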


Thanks

Phil1966
ID: 33641

[AF>Amis des Lapins] Phil1966
Joined: 23 Apr 10
Posts: 5
Credit: 1,349,240
RAC: 0
Message 33655 - Posted: 4 Jan 2018, 18:47:04 UTC

Is there anyone at CERN who sometimes reads the messages posted here?
ID: 33655

PHILIPPE
Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 33662 - Posted: 4 Jan 2018, 21:44:37 UTC - in response to Message 33655.  

Trying to understand what is happening, I found the DIRAC site, which lets users see LHCb statistics.

Here is the CPU usage over one week, by job processing type:


Here is the CPU efficiency over the same period for the same job processing types:


It seems that the different types of job do not behave the same way.

Maybe a CERN member could comment and give us some explanation? (A short summary of these job types, and whether these plots reflect more or less accurately the issues encountered by crunchers.) (The names of the WUs do not make it possible to identify the type of job executed.)
ID: 33662