Had ~100 failures on CMS 50
Author | Message |
---|---|
Send message Joined: 27 May 11 Posts: 5 Credit: 9,747,819 RAC: 0 |
Hi Team, I noticed that a machine wasn't using all of its CPU power and tracked it back to the CMS tasks. They have been failing for a while, and they also prevent other tasks from using the PC's resources properly: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10516481 I've disabled CMS so other projects can ramp the PC up to 100% CPU again, but it would be great if you could spot anything wrong with it so I can re-enable it. The machine had lots of resources free, but for some reason this project was preventing them from being used. E.g. it has 32 threads, but BOINC put other projects in a "Waiting for memory" state when there was heaps free, and I was only seeing ~32% of the CPU being used. If there are logs or any other assistance I can provide, please let me know. Thanks, Far |
Send message Joined: 15 Jun 08 Posts: 2488 Credit: 247,481,369 RAC: 120,966 |
Each CMS VM allocates 1 CPU core and 2 GB RAM. Your computer has 32 cores and 32 GB RAM. This would allow you to run up to 16 *) CMS VMs concurrently and would leave 16 cores idle. In addition, each CMS task makes heavy use of disk I/O and network, neither of which needs much CPU. *) Fewer in reality, even if the BOINC client is configured to use 100% of RAM, since the OS and other processes also require RAM. |
Send message Joined: 27 May 11 Posts: 5 Credit: 9,747,819 RAC: 0 |
Thanks, that helps me understand the restricted usage of the available resources. I wish I could have afforded 64 GB of RAM. However, all the CMS 50 tasks were failing anyway :-( If there are logs or anything else needed to check why, please let me know. In case it's a factor, the version of VirtualBox is more recent than the one distributed with BOINC: 6.1.12r139181 (Qt5.6.2) |
Send message Joined: 23 Nov 15 Posts: 4 Credit: 1,391,488 RAC: 0 |
Hello, same problem here. In theory the computer has the resources, but it waits for memory with CMS and blocks other projects from loading. The task properties show this: Application: CMS Simulation 50.00 (vbox64); Name: CMS_1538738_1607610862.401591; State: Waiting for memory; Received: Thursday, December 10, 2020, 5:31:55 PM; Report deadline: Saturday, January 9, 2021, 5:31:54 PM; Estimated computation size: 1,000,000 GFLOPs; CPU time: 10:13:26; CPU time since checkpoint: ---; Elapsed time: 10:24:46; Estimated time remaining: 1d 04:47:11; Fraction done: 58.399%; Virtual memory size: 281.63 MB; Working set size: 2.79 GB; Directory: slots/1; Progress rate: 5.760% per hour; Executable: vboxwrapper_26196_x86_64-pc-linux-gnu. I'm out of ideas. I think I will try disabling CMS to see what happens. I'll let you know. |
Send message Joined: 23 Nov 15 Posts: 4 Credit: 1,391,488 RAC: 0 |
Hello again. After aborting the CMS task, BOINC started accepting and running new work. |
Send message Joined: 23 Nov 15 Posts: 4 Credit: 1,391,488 RAC: 0 |
Hi again. Once I aborted the CMS task, BOINC began to accept and execute new work. |
Send message Joined: 9 Jan 15 Posts: 151 Credit: 431,596,822 RAC: 0 |
With 5.8 GB of memory on an 8-core system, both computers are at the bare minimum to handle this plus a few SixTrack tasks. You had a CMS task running that validated, but you aborted the last one. It probably put the other tasks into waiting for RAM. Please uncheck the boxes for native tasks and test applications. Your client had many tasks fail because CVMFS is not installed. If you want to run VirtualBox tasks, you will only get them by unchecking native, but I would suggest running SixTrack and maybe Theory until you have added more memory. |
Send message Joined: 23 Nov 15 Posts: 4 Credit: 1,391,488 RAC: 0 |
Hello Gunde. I have disabled CMS, ATLAS and native tasks, as you told me, and BOINC has started running Theory. I'm also running the GPUGRID project in BOINC; maybe it's too much? With Kubuntu 18.04 I had no such issues. They appeared after I switched to Kubuntu 20.04, although the change is well worth it. I plan to upgrade the RAM to 24 GB, but it won't be this year. I'll go to the Linux section to ask about CVMFS installation issues. Thanks. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
All CMS tasks fail on my Windows 10 PC, while Atlas tasks on 2 cores and Theory tasks work perfectly, not to speak of SixTrack tasks. Condor fails after about 10000 s. Tullio |
Send message Joined: 2 May 07 Posts: 2184 Credit: 172,751,088 RAC: 42,089 |
Tullio, you have 12 GB of RAM on your Windows machine. That may be too small to run ATLAS or Theory and CMS together. What happens if you start only one CMS task? |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
Tullio, Condor reaches 15268 s, it was my last running task since QuChemPedIA@home is apparently dead. It also uses VirtualBox, but most of the time it completes a task before its Linux wingman, even if Linux runs on a much more powerful CPU. But the CMS was a problem also on my other Windows 10 PC with 24 GB RAM. Tullio |
Send message Joined: 15 Jun 08 Posts: 2488 Credit: 247,481,369 RAC: 120,966 |
What you describe looks like an individual problem of your computer. The time range indicates that it happens at the end of a subtask calculation. Hence, to find out what happens, you would need to watch the console output around that phase. Console at ALT-F2: shows how many records have been processed. You will see something like "Begin processing the 6144th record ..." A subtask is complete after 10000 records. Console at ALT-F3: shows the output of the top utility. In the list, look at the runtime of the command cmsRun. It's in minutes, and together with the output from ALT-F2 it allows you to estimate when your subtasks will be complete. ALT-F4 and ALT-F5 show diagnostic messages as well as errors. If CMS can identify a problem, it prints corresponding messages on those consoles. Unfortunately they disappear when other messages are printed. Hence, you would have to switch quickly between ALT-F4 and ALT-F5 during a subtask change. If the task is alive long enough, you may have a chance to copy the logfiles from inside the VM. Select the task in your BOINC manager and click on "show graphics". A browser window opens where you can follow a link to the logs. This can be prepared before the calculation is done. At the critical moment, just refresh the browser window. Check the last messages in the logs for errors or warnings, typically network connection errors and/or retries. It makes no sense to compare CMS with other VBox apps, not even with ATLAS or Theory from LHC@home, since all of them use different communication channels. In the case of CMS it's HTCondor and WMAgent. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
My computer is as standard as it can be. No other project has any problem, either on CPU or GPU. There was a time when Einstein@home gravitational wave tasks on GPU would not run because the GTX 1060 had only 3 GB of video RAM, so I installed a new board with 4 GB of video RAM. Now I am running World Community Grid and Rosetta@home tasks on another PC with 3 GB of video RAM and they all run perfectly. On QuChemPedIA@home, now unfortunately stopped, I was 37th in the RAC ranking list, using VirtualBox against CPUs running Linux on 124 or more processors such as Ryzen 9. And this is only an Intel i5 9400F with 6 processors. Tullio |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I can see the show graphics logs and they signal some connection errors. But the task goes on all the same. Tullio |
Send message Joined: 2 May 07 Posts: 2184 Credit: 172,751,088 RAC: 42,089 |
Do you use a LAN cable or WiFi for this PC? Is the network connection to the router shown correctly in Windows? I don't know why only CMS with HTCondor has this problem and not ATLAS or Theory. |
Send message Joined: 15 Jun 08 Posts: 2488 Credit: 247,481,369 RAC: 120,966 |
"I can see the show graphics logs and they signal some connection errors." Those error messages might be helpful. Can you post them here? |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
"Have you a LAN-Cable or WiFi for this PC?" I have a WiFi connection for this PC, another PC and an HP printer. They all work. Tullio |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
"I can see the show graphics logs and they signal some connection errors." I shall do it next time I get a CMS task. Thanks. Tullio |
Send message Joined: 2 May 07 Posts: 2184 Credit: 172,751,088 RAC: 42,089 |
"Have you a LAN-Cable or WiFi for this PC?" If you have the chance to test one running CMS task over a LAN cable, it may show the cause of the error. |
Send message Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
I have launched LHC@home on a HP Laptop which is connected to a LAN Cable. It got two Atlas tasks, one of which completed and validated. When I get a CMS task on it, I shall look at the error messages and post them here. Thanks. Tullio |
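
The resource arithmetic from the reply about VM sizing (1 CPU core and 2 GB RAM per CMS VM, on a host with 32 cores and 32 GB RAM) can be sketched in Python as follows. This is only an illustration of that calculation, not how BOINC actually schedules work. The function name and the `reserved_ram_gb` parameter are made up here, and the 4 GB reserve in the second call is an arbitrary assumption standing in for "the OS and other processes".

```python
def max_concurrent_vms(cores: int, ram_gb: float,
                       cores_per_vm: int = 1, ram_per_vm_gb: float = 2.0,
                       reserved_ram_gb: float = 0.0) -> int:
    """Upper bound on VMs that fit into the given cores and RAM.

    Defaults follow the figures quoted in the thread (1 core, 2 GB per VM).
    reserved_ram_gb is a hypothetical allowance for the OS and other processes.
    """
    if cores_per_vm <= 0 or ram_per_vm_gb <= 0:
        raise ValueError("per-VM requirements must be positive")
    usable_ram_gb = max(ram_gb - reserved_ram_gb, 0.0)
    limit_by_cpu = cores // cores_per_vm
    limit_by_ram = int(usable_ram_gb // ram_per_vm_gb)
    # Whichever resource runs out first sets the limit.
    return min(limit_by_cpu, limit_by_ram)


if __name__ == "__main__":
    vms = max_concurrent_vms(cores=32, ram_gb=32)
    print(vms, "VMs,", 32 - vms, "cores idle")   # 16 VMs, 16 cores idle
    # Assumed 4 GB kept back for the OS: fewer VMs fit.
    print(max_concurrent_vms(cores=32, ram_gb=32, reserved_ram_gb=4))  # 14
```

On this host RAM, not the core count, is the limiting resource, which is why half of the cores stay idle when only CMS tasks are eligible to run.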
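
The estimate described in the console walkthrough (record count from ALT-F2, cmsRun runtime in minutes from ALT-F3, 10000 records per subtask) can be sketched like this. It assumes a constant processing rate and that the cmsRun runtime covers only the current subtask. Neither assumption is confirmed in the thread, and the function names and the 96-minute figure are invented for the example.

```python
import re

RECORDS_PER_SUBTASK = 10_000  # figure quoted in the thread

# Matches console lines such as "Begin processing the 6144th record ..."
_RECORD_RE = re.compile(r"Begin processing the (\d+)(?:st|nd|rd|th) record")


def parse_record_number(console_line: str) -> int:
    """Extract the record number from a line shown on the ALT-F2 console."""
    match = _RECORD_RE.search(console_line)
    if match is None:
        raise ValueError("no record number found in line")
    return int(match.group(1))


def minutes_remaining(records_done: int, cmsrun_minutes: float,
                      total_records: int = RECORDS_PER_SUBTASK) -> float:
    """Linear estimate of the minutes left in the current subtask."""
    if records_done <= 0 or cmsrun_minutes <= 0:
        raise ValueError("need a positive record count and runtime")
    rate = records_done / cmsrun_minutes          # records per minute
    return max(total_records - records_done, 0) / rate


if __name__ == "__main__":
    done = parse_record_number("Begin processing the 6144th record ...")
    # 96 minutes is a made-up cmsRun runtime, as it might be read from top.
    print(round(minutes_remaining(done, cmsrun_minutes=96), 2))  # 60.25
```

Knowing roughly when the subtask will finish makes it easier to be at the ALT-F4 and ALT-F5 consoles, or to refresh the log page, at the moment the failure is expected.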
©2024 CERN