LHCb Application: Low CPU usage
Author | Message |
---|---|
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
Well, I have taken a look at your last log, but I don't see the authentication failure. It's curious. If one day you want to try again, there may be another way to understand what is happening while your work units run. I didn't see anyone in the forum advise you to look inside the VM with the monitoring console. ------------------------------------------------------------------------------- When the work unit starts in BOINC Manager, click "Show VM Console" in the left panel. It opens the remote desktop, and a large window appears in which you can watch the initialisation of the WU. At the beginning, press Alt+F1 to check whether any warnings are displayed. After a while, when nothing moves any more, press Alt+F3 to reach the top command, which shows all the processes running inside the VM. When the job really starts, a process owned by user "nobody" appears. Quickly (within 15 to 25 min) its CPU activity rises to about 97% and its memory consumption to nearly 50%. If the busiest process stays between 1 and 3%, something is wrong; you can press Alt+F5 to look for error messages. In that case, the best is to reboot the computer and start again, to force the server to communicate with your VM. Check at once in VirtualBox Manager and delete any faulty VMs marked with a red circle, then wait and watch the VM console again. -------------------------------------------------------------------------------- This recipe is the way I manage my host, but I'm a beginner in LHCb, so there may be other, more academic solutions. Maybe it will work for you too. |
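The check described above (find the process owned by "nobody" in top and see whether its CPU share climbs towards about 97% or stays at 1 to 3%) can be sketched in Python. This is a minimal, hypothetical helper that assumes you have the text of top's output from inside the VM (for example from `top -b -n 1`, procps batch mode, default column layout). The function names and the 5% threshold are illustrative assumptions, not part of the LHCb tooling.

```python
from typing import Optional


def busiest_nobody_cpu(top_output: str) -> Optional[float]:
    """Return the highest %CPU among processes owned by 'nobody', or None."""
    header = None
    best = None
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # The process table starts at the line whose first column is PID.
            if fields[0] == "PID" and "USER" in fields and "%CPU" in fields:
                header = fields
            continue
        if len(fields) < len(header):
            continue  # not a complete process row
        if fields[header.index("USER")] != "nobody":
            continue
        try:
            cpu = float(fields[header.index("%CPU")])
        except ValueError:
            continue  # unexpected cell format, skip this row
        best = cpu if best is None else max(best, cpu)
    return best


def looks_stalled(cpu: Optional[float], threshold: float = 5.0) -> bool:
    """Hypothetical rule of thumb: no 'nobody' job at all, or one idling below threshold."""
    return cpu is None or cpu < threshold
```

A healthy job should report something close to 97 after 15 to 25 minutes; a value that stays around 1 to 3 matches the abnormal case described in the post.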
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
All LHC programs except SixTrack use at most 9% of the CPU. This means that they are doing nothing. Some, like CMS, fail as BOINC projects after a short time. Others go on for hours doing nothing. I look at the VM console on both the Windows PC and the Linux boxes, and it does not signal anything. Then I look at the machine logs and see many error messages which I am unable to understand. The two native Atlas tasks on my E-450 Linux box work perfectly. Atlas tasks started in LHC do not report a HITS file, but they give me credits. But why do native Atlas tasks work perfectly while LHC Atlas tasks do not? Tullio |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
In fact, your last Atlas tasks were 2-core work units with only 3600 MB allocated: 2017-03-16 17:58:22 (6468): Setting Memory Size for VM. (3600MB) The crunchers found that a 2-core WU needs at least 4400 MB, so David Cameron changed the settings to allocate more RAM to the VM. Now it may work. --------------------------------------------------------------------------------- Only a few people understand the error messages in the log; it's not easy. That is why this forum exists. But sometimes no one knows the reason for a failure... This is experimental science: always going further and increasing our knowledge. |
Joined: 27 Sep 08 Posts: 850 Credit: 692,713,859 RAC: 95,524 |
My LHCb is back to normal, with good CPU load. I got credit even though my PC was idle. I assume it was idle because the project had no work on the CERN side, or was it a bug? I have no issues with other scientific applications. I had the same problem with ATLAS, so I went back to single-core; as discussed in other threads, you can use app_config to force 4400 MB if you would like to run multicore. |
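For the app_config approach mentioned above, the following Python sketch writes an `app_config.xml` that asks for 2 CPUs and 4400 MB for the VM. It assumes the ATLAS application is named `ATLAS` with plan class `vbox64_mt_mcore_atlas`, and that vboxwrapper honours the `--nthreads` and `--memory_size_mb` command-line options; these names are assumptions, so check them against your own client_state.xml before use. The file would typically go in the LHC@home project directory of the BOINC data folder, followed by Options > Read config files in BOINC Manager.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, ncpus: int, memory_mb: int) -> str:
    """Build an app_config.xml body that pins CPU count and VM memory (sketch)."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name      # assumed app name
    ET.SubElement(version, "plan_class").text = plan_class  # assumed plan class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # Extra arguments passed to vboxwrapper for this app version.
    ET.SubElement(version, "cmdline").text = f"--nthreads {ncpus} --memory_size_mb {memory_mb}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    with open("app_config.xml", "w", encoding="utf-8") as f:
        f.write(build_app_config("ATLAS", "vbox64_mt_mcore_atlas", 2, 4400))
```

Generating the file rather than hand-editing it simply avoids XML typos; writing the same elements by hand works just as well.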
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
That was an LHC@home Atlas task. I am currently running a native Atlas task on a Linux box with 4100 MB RAM; it is at 86% after 23:45 hours on the AMD E-450 CPU. Tullio All LHCb tasks continue to fail on the Windows 10 PC with its 24 GB RAM and its AMD A10-6700 CPU, 4 cores, at 3.7 GHz and up to 4.14 GHz. |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
I'm looking at your LHCb tasks on your Windows PC. You did open the VM console (2017-03-24 21:13:15 (800): Detected: Web Application Enabled (http://localhost:64018)), but you didn't reboot to force communication with the server. Normally, when you do, the number identifying your session changes. In your log it stays the same: "2017-03-24 21:12:40 (800): vboxwrapper (7.7.26196): starting" and "2017-03-25 09:28:28 (800): VM Completion Message: Condor exited after 44040s without running a job." In my logs you can see it change: "2017-03-24 13:14:29 (5964): Guest Log: [DEBUG] 0" and "2017-03-24 13:25:09 (3748): vboxwrapper (7.7.26196): starting". The aim is to get the CPU activity inside the VM going when it stays too low. I noticed that if I start BOINC with Windows, the telemetry activity (a background service) seems to interfere with the launch of VBoxHeadless (too many resources used at the same time). So waiting 10 min after startup and rebooting once the telemetry service has finished lets the VM start correctly. But I'm not yet sure this is the real cause; I have to look into it more deeply. Try rebooting 10 min after the WU starts and report whether it helps or not. |
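The log check in the post above (whether the number in parentheses changes after a restart, and whether Condor gave up without running a job) can be automated with a small, hypothetical Python helper. The regular expressions are assumptions based only on the log lines quoted in this thread, so other vboxwrapper versions may format them differently.

```python
import re
from typing import List, Optional

# Patterns modelled on the log lines quoted in this thread (assumption).
START_RE = re.compile(r"\((\d+)\): vboxwrapper \([^)]*\): starting")
IDLE_RE = re.compile(r"Condor exited after (\d+)s without running a job")


def session_ids(log: str) -> List[str]:
    """Numbers in parentheses on each 'vboxwrapper ... starting' line, in order."""
    return START_RE.findall(log)


def idle_condor_seconds(log: str) -> Optional[int]:
    """Seconds Condor ran before exiting without a job, or None if not found."""
    match = IDLE_RE.search(log)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "2017-03-24 21:12:40 (800): vboxwrapper (7.7.26196): starting\n"
        "2017-03-25 09:28:28 (800): VM Completion Message: "
        "Condor exited after 44040s without running a job.\n"
    )
    print(session_ids(sample))          # ['800']
    idle = idle_condor_seconds(sample)
    print(f"{idle / 3600:.1f} h idle")  # 12.2 h idle
```

In the quoted example, 44040 s means the VM sat for roughly 12 hours without ever receiving a job.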
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have LHCb tasks failing on all PCs: two Linux boxen and a Windows 10 PC. Only SixTrack and native Atlas work, not LHC Atlas. I am using the Windows 10 PC to crunch Einstein@home GPU tasks on its GTX 1050 GPU board and a Linux box to crunch SETI@home beta GPU tasks on its GTX 750 GPU board, soon to be changed to a GTX 750 Ti since it needs more RAM to crunch heavy Einstein@home GPU tasks. Only an HP laptop with no GPU board is used to crunch native Atlas@home, climateprediction.net, Einstein@home and SETI@home CPU tasks. Sorry for LHC, but it does not work. I was an alpha tester in Test4Theory@home, on invitation by Dr. Ben Segal. Tullio |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
I understand; you can't reboot because your Windows 10 PC is used for other projects. I don't have this disadvantage, even though I crunch WCG tasks too, because they need neither VirtualBox nor a GPU. So in this case I don't think I can help you. When a project doesn't work, the best approach is to test it alone, with as few parameters as possible, and only afterwards check whether it can run alongside others. I think you would rather run easier projects with fewer constraints. Crunching is not obvious; there are 2 levels of difficulty, like heating water and milk: water boils without problem (SixTrack), while milk needs constant watching before it boils over the pot (LHCb). It's physics, we can't change it... |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have been running two LHCb tasks alone on the Windows 10 PC with no other task running. Both failed. I am not in the habit of rebooting PCs to run programs. If they are good ones, they must run. This is what I learned running UNIX since 1981 on all kinds of systems, including RISC minicomputers at Trieste Area Science Park. Tullio |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
So you know that not everything is as easy as we might expect. I agree this is a difficult project, with failures sometimes caused by the server side too. It's not your fault; there are temporary issues. Science advances through tests, sometimes with good results, sometimes with bad ones. Sorry if you are disappointed. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Well, my native Atlas task is at 90% on the HP laptop. So at least something works. Cheers. Tullio |
Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 677 |
Tullio, you have successful CMS tasks on your Windows 10 PC. Could you change your LHC preferences from LHCb to CMS and run a test with CMS? After changing your preferences, reset the LHC@home project in BOINC so that the master file is downloaded again. Hope you have luck. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I see only a validated SixTrack task. Tullio |
Joined: 2 May 07 Posts: 2244 Credit: 173,902,375 RAC: 677 |
These finished CMS tasks are shown in your computer list on LHC@home. They are also shown when you click on your username: https://lhcathome.cern.ch/lhcathome/host_app_versions.php?hostid=10407309 |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Yes, but they were completed before the consolidation of LHC. Now I have completed a native Atlas@home task on the Linux box with the E-450, and it shows the HITS file. A second multicore task has just started running. Tullio I shall go on crunching Atlas@home tasks as long as they are available, then I shall leave the project. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
All LHCb tasks fail on my 3 PCs, but Linux tasks fail much sooner than Windows 10 tasks, thus leaving room for other projects. Tullio |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have given up on all LHC tasks, after having been an alpha tester in Test4Theory@home, because they all fail on my 3 PCs: two 64-bit Linux boxes and one Windows 10 PC. I am running SETI@home CPU and GPU tasks, and also Einstein@home. My Einstein RAC has reached 73k since one Linux box with a GTX 750 Ti board runs only that project. The Windows PC has a GTX 1050 Ti board. I am also running climateprediction.net on a CPU. Tullio |
Joined: 23 Apr 10 Posts: 5 Credit: 1,349,240 RAC: 0 |
Hello, can someone explain to me why the CPU time of most LHCb WUs is only about 10-15% of the run time? Except for "long tasks", I see a lot of wasted time, for example: task 172067635, work unit 83393496, sent 3 Jan 2018 7:37:53 UTC, reported 3 Jan 2018 10:47:31 UTC, Completed and validated, run time 1,472.49 s, CPU time 231.11 s, credit 56.06, LHCb Simulation v1.03 (vbox64). Thanks, Phil1966 |
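The ratio the question refers to is CPU time divided by run time. For the task quoted above (1,472.49 s run time, 231.11 s CPU time) this gives about 15.7%. The helper below is an illustrative sketch; assuming a single-core task, a value near 100% would mean the core was busy almost the whole time, while multicore tasks can legitimately exceed 100%.

```python
def cpu_efficiency(cpu_time_s: float, run_time_s: float) -> float:
    """Fraction of wall-clock run time actually spent on the CPU."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return cpu_time_s / run_time_s


if __name__ == "__main__":
    # Run time and CPU time (seconds) from the task quoted in the post.
    print(f"{cpu_efficiency(231.11, 1472.49):.1%}")  # 15.7%
```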
Joined: 23 Apr 10 Posts: 5 Credit: 1,349,240 RAC: 0 |
Is there anyone at CERN who sometimes reads the messages posted here? |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
Trying to understand what is happening, I found the DIRAC site, which lets users see LHCb statistics. Here is the CPU usage over one week, broken down by job processing type: [figure: CPU usage by job processing type, one week]. Here is the CPU efficiency over the same period for the same job processing types: [figure: CPU efficiency by job processing type, same week]. It seems that the different job types behave differently. Maybe a CERN member could give us some explanations (a short summary of these job types, and whether these plots more or less reflect the issues encountered by crunchers)? (The names of the WUs don't allow us to identify the type of job executed.) |
©2024 CERN