Message boards : Theory Application : New Version 263.70
Author | Message |
---|---|
Joined: 3 Nov 12 Posts: 75 Credit: 171,116,904 RAC: 82,720 |
"The new app is a multicore app that uses a 2-core setup on your hosts. This is not recommended for the hosts you mentioned as they have only 2 cores. You may navigate to the project's preferences page and set 'max # of CPUs' to 1." First, I will give this a try. About memory: two of the failing computers have 6 GB of RAM, which I think should be enough for a 2-core Theory task. One of them has only 4 GB, but the result is the same. Is it possible that some other tasks of the OS (Manjaro Linux here) block one of the CPUs and prevent VirtualBox from working correctly? |
Joined: 24 Oct 04 Posts: 1236 Credit: 79,814,314 RAC: 74,738 |
"... and the LHCb's are working there right now too. (also multi-core)" "LHCb can be run multicore?" Not here... over at the LHC-dev test site. |
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"Is it possible, that some other tasks of the OS (Manjaro Linux here) ..." Possibly your desktop environment. Full-featured desktops like GNOME or KDE use considerable RAM and CPU time. Lightweight desktops like LXDE and XFCE use considerably less RAM and CPU. Which desktop do you use? |
Joined: 6 Sep 08 Posts: 119 Credit: 14,257,272 RAC: 5,923 |
VMs are using the local squid again: `(3909): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE` `2018-07-11 02:53:10 (3909): Guest Log: 2.4.4.0 3540 1 25728 6631 3 1 183741 10240000 2 65024 0 15 100 0 0 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch http://192.168.100.137:3128 1` `2018-07-11 02:53:11 (3909): Guest Log: [INFO] Reading volunteer information` Working OK so far. Thanks, Laurence. |
Joined: 9 Dec 14 Posts: 202 Credit: 2,539,793 RAC: 175 |
Same here, openhtc.io and the local squid are used now. Thanks, Laurence! Some questions and a suggestion: - Is it correct that if the VM is configured to use 4 cores, it will run concurrently 4 separate Condor Jobs? - What is displayed on the ALT+F2 screen? Is it a randomly chosen output from one of the currently running Condor Jobs? - Would it be possible to associate one ALT+Fx screen with one particular core (or slot directory) to display the output of the Condor Job that is currently running on that core (e.g. screen ALT+F2 displays the output of the Condor Job running on Core 1, ALT+F3 displays the output of the Condor Job running on Core 2, and so on)? |
Joined: 15 Jun 08 Posts: 2685 Credit: 286,933,615 RAC: 56,694 |
"... Is it correct that if the VM is configured to use 4 cores, it will run concurrently 4 separate Condor Jobs?" Yes. The VM runs as many subtasks (or Condor Jobs) concurrently as cores are configured. |
Joined: 29 Sep 04 Posts: 281 Credit: 11,888,115 RAC: 831 |
"- What is displayed on the ALT+F2 screen? Is it a randomly chosen output from one of the currently running Condor Jobs?" It will show the job running on the first of the cores. You can see the output from all the allocated cores under Show Graphics, then Logs, then down to running-slotX.log, although that will only update if you refresh. "- Would it be possible to associate one ALT+Fx screen with one particular core (or slot directory) to display the output of the Condor Job that is currently running on that core (e.g. screen ALT+F2 displays the output of the Condor Job running on Core 1, ALT+F3 displays the output of the Condor Job running on Core 2, and so on)?" That would be easier for us users than the method above, but I don't know how easy it is to implement, or even whether it is possible, as ALT+F1 to ALT+F6 are already in use for various outputs. |
Joined: 3 Nov 12 Posts: 75 Credit: 171,116,904 RAC: 82,720 |
"Possibly your desktop environment. Full featured desktops like Gnome or KDE use considerable RAM and CPU time. Lightweight desktops like LXDE and XFCE use considerably less RAM and CPU. Which desktop do you use?" BOINC is installed as a system service here, so I don't need to log in to a desktop to run LHC@home. By the way, it's XFCE on the 6 GB machines, while the 4 GB host runs without a desktop environment. "You may navigate to the project's preferences page and set 'max # of CPUs' to 1." Since I did this, I have had no more "VM Heartbeat file specified, but missing heartbeat." errors. To me it looks like we need a processor with more than two threads to run the 2-processor mt tasks. Now I run two single-core mt tasks at once and it's OK. Thanks a lot for your help! |
Joined: 6 Sep 08 Posts: 119 Credit: 14,257,272 RAC: 5,923 |
"VMs are using the local squid again" Maybe I wrote too soon; the VM still sometimes fails to use the local squid. `2018-07-14 03:46:43 (2620): Guest Log: [DEBUG] Detected squid proxy http://192.168.100.137:3128` `2018-07-14 03:47:59 (2620): Guest Log: VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE` `2018-07-14 03:47:59 (2620): Guest Log: 2.4.4.0 3533 1 25768 6661 3 1 183731 10240000 2 65024 0 15 93.3333 13 21 http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 0` |
Joined: 9 Dec 14 Posts: 202 Credit: 2,539,793 RAC: 175 |
I had the same problem with the previous version (before multicore). I don't know exact numbers, but maybe 1 out of 6 or so did not set up the proxy correctly. Do you guys know why my Theory VMs do NOT shut down correctly? This happens to every single task, e.g. this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=200422861 The log says: `2018-07-20 12:51:44 (4364): Guest Log: [INFO] Job finished in slot1 with 0.` `2018-07-20 13:02:07 (4364): Guest Log: [INFO] Condor exited with return value N/A.` `2018-07-20 13:02:07 (4364): Guest Log: [INFO] Shutting Down.` `2018-07-20 13:02:07 (4364): VM Completion File Detected.` `2018-07-20 13:02:07 (4364): VM Completion Message: Condor exited with return value N/A..` `2018-07-20 13:02:07 (4364): Powering off VM.` `2018-07-20 13:07:11 (4364): VM did not power off when requested.` `2018-07-20 13:07:11 (4364): VM was successfully terminated.` `2018-07-20 13:07:11 (4364): Deregistering VM. (boinc_95d70c61e78dce9e, slot#0)` `2018-07-20 13:07:11 (4364): Removing network bandwidth throttle group from VM.` `2018-07-20 13:07:11 (4364): Removing VM from VirtualBox.` `13:07:17 (4364): called boinc_finish(0)` |
Joined: 1 Sep 04 Posts: 141 Credit: 2,579 RAC: 0 |
"I had the same problem with the previous version (before multicore). I don't know exact numbers, but maybe 1 out of 6 or so did not set up the proxy correctly." It doesn't seem very serious, as the VM was terminated even though the power-off failed. Perhaps Laurence can comment, but for now don't worry... Ben |
Joined: 15 Jun 08 Posts: 2685 Credit: 286,933,615 RAC: 56,694 |
"VM did not power off when requested." indicates that the shutdown needs more time than expected and hits a watchdog timeout. Don't worry if it happens only occasionally. If it occurs very often and together with other indicators, such as lots of blank lines in the logs, lines with lots of garbage, or the "postponed..." error mentioned in other threads, then the computer or one of its resources is most likely permanently too busy. That situation should be investigated. |
Joined: 15 Jun 08 Posts: 2685 Credit: 286,933,615 RAC: 56,694 |
"Maybe I wrote too soon, the VM still sometimes fails to use the local squid." Sorry for the late response. I checked a couple of your logs and all of them had a correct proxy configuration. Do you still have some VMs with errors? Do you notice a relevant number of requests in your proxy log? |
Joined: 18 Dec 15 Posts: 1908 Credit: 144,962,497 RAC: 80,319 |
"VM did not power off when requested." I see this with ALL tasks on the machine that uses Theory Simulation v263.70 (vbox64_mt_mcore) windows_x86_64, but NOT on the two other PCs, which use Theory Simulation v263.50 (vbox32) windows_intelx86, whatever this observation is worth... |
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
I checked the last 19 Theory tasks I returned and none show "Failed to shutdown the VM". I am currently running Theory on only 1 host, and it's using Theory Simulation v263.70 (vbox64_mt_mcore) x86_64-pc-linux-gnu. |
Joined: 6 Sep 08 Posts: 119 Credit: 14,257,272 RAC: 5,923 |
"the VM still sometimes fails to use the local squid." Yes, in total I have details of three tasks. In all cases the proxy was reported as detected, but the VM was not reported as set up to use it. I take entries in the proxy's access log as showing whether the proxy was actually used. https://lhcathome.cern.ch/lhcathome/result.php?resultid=200124140 (proxy used) https://lhcathome.cern.ch/lhcathome/result.php?resultid=200145344 (proxy not used) https://lhcathome.cern.ch/lhcathome/result.php?resultid=200221912 (proxy used) The last two are from the same host. The inconsistency is puzzling. I'll have to wait for some more. If anyone else sees these failures, it would be interesting to see their results. Hopefully I haven't misread something somewhere... |
Joined: 15 Jun 08 Posts: 2685 Credit: 286,933,615 RAC: 56,694 |
The CVMFS client inside the VM configures a proxy via this directive: `CVMFS_HTTP_PROXY='http://<proxy_name_or IP>:3128;DIRECT'` If the proxy does not respond, DIRECT is used as a fallback. Besides that, the configuration directive `CVMFS_PROXY_RESET_AFTER=300` makes the client try to switch back to the first proxy after 300 s. Thus the LHC VMs behave as expected, and it should be investigated why your local proxy occasionally doesn't respond. |
Joined: 18 Dec 15 Posts: 1908 Credit: 144,962,497 RAC: 80,319 |
I have now looked up the application details for one of my PCs, on which I have been crunching Theory tasks (among others) for a long time. Interesting results: Theory Simulation 263.20 windows_x86_64 (vbox64): average processing rate 24.38 GFLOPS; Theory Simulation 263.60 windows_x86_64 (vbox64_mt_mcore): average processing rate 30.63 GFLOPS; Theory Simulation 263.70 windows_x86_64 (vbox64_mt_mcore): average processing rate 21.53 GFLOPS. So version 263.70 is even slower, at least on my PC, than 263.20 was, and also slower than 263.60. How come? |
Joined: 15 Jun 08 Posts: 2685 Credit: 286,933,615 RAC: 56,694 |
Your VMs run a mix of different scientific apps like pythia, sherpa, agile-runmc ... Some of them, e.g. sherpa, occasionally cause longer idle periods that influence the average efficiency index. |
Joined: 18 Dec 15 Posts: 1908 Credit: 144,962,497 RAC: 80,319 |
As I did some time ago when v263.70 came out, I have now made another comparison between 3 of my PCs. The result: with 2 old machines running 32-bit Windows on processors about 10 years old (AMD Turion(tm) Neo X2 Dual Core Processor L625 and AMD Turion Dual-Core ZM-80), using Theory Simulation v263.50 (vbox32) windows_intelx86, I get about 3 times as many credit points as with a newer PC running 64-bit Windows on an Intel(R) Core(TM) i5 CPU M 480 @ 2.67GHz, using Theory Simulation v263.70 (vbox64_mt_mcore) windows_x86_64. Can anyone explain this? |
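
The two Guest Log status rows posted in the thread (11 July and 14 July) differ mainly in their last two columns, PROXY and ONLINE. For anyone who wants to check many task logs at once, here is a rough sketch that maps such a row onto its column names. It assumes the column layout is exactly the header row shown in those posts; the function names `parse_stat_row` and `uses_local_proxy` are invented for this illustration and are not part of BOINC or CVMFS. What ONLINE means precisely is not stated in the thread, so the sketch only reports it.

```python
# Column names copied from the Guest Log header row quoted in the thread.
HEADER = (
    "VERSION PID UPTIME(M) MEM(K) REVISION EXPIRES(M) NOCATALOGS "
    "CACHEUSE(K) CACHEMAX(K) NOFDUSE NOFDMAX NOIOERR NOOPEN "
    "HITRATE(%) RX(K) SPEED(K/S) HOST PROXY ONLINE"
).split()

# The two value rows quoted in the thread.
ROW_PROXY = (
    "2018-07-11 02:53:10 (3909): Guest Log: 2.4.4.0 3540 1 25728 6631 3 1 "
    "183741 10240000 2 65024 0 15 100 0 0 "
    "http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch "
    "http://192.168.100.137:3128 1"
)
ROW_DIRECT = (
    "2018-07-14 03:47:59 (2620): Guest Log: 2.4.4.0 3533 1 25768 6661 3 1 "
    "183731 10240000 2 65024 0 15 93.3333 13 21 "
    "http://s1cern-cvmfs.openhtc.io/cvmfs/grid.cern.ch DIRECT 0"
)


def parse_stat_row(line):
    """Map one whitespace-separated value row onto the header columns."""
    marker = "Guest Log:"
    if marker in line:
        # Drop the BOINC timestamp and process ID in front of the values.
        line = line.split(marker, 1)[1]
    values = line.split()
    if len(values) != len(HEADER):
        raise ValueError(
            f"expected {len(HEADER)} columns, got {len(values)}"
        )
    return dict(zip(HEADER, values))


def uses_local_proxy(row):
    """True if the PROXY column names a proxy rather than DIRECT."""
    return row["PROXY"] != "DIRECT"


if __name__ == "__main__":
    for raw in (ROW_PROXY, ROW_DIRECT):
        row = parse_stat_row(raw)
        print(row["PROXY"], row["ONLINE"], uses_local_proxy(row))
```

Matching rows by column name rather than by position keeps the check readable, but it will raise an error if a different client version prints a different set of columns.
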
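
The fallback behaviour described for `CVMFS_HTTP_PROXY` (try the local proxy first, use DIRECT if it does not respond) can be pictured with a small model. This is only a simplified sketch, not the CVMFS client's actual code: it assumes that `;` separates groups tried in order and `|` separates alternatives within a group, it ignores timeouts and the `CVMFS_PROXY_RESET_AFTER` timer, and the names `parse_proxy_chain`, `pick_route` and `is_responsive` are hypothetical.

```python
def parse_proxy_chain(directive):
    """Split a CVMFS_HTTP_PROXY value into ordered failover groups."""
    groups = []
    for group in directive.split(";"):
        members = [p.strip() for p in group.split("|") if p.strip()]
        if members:
            groups.append(members)
    return groups


def pick_route(directive, is_responsive):
    """Return the first usable entry, walking the groups in order.

    is_responsive is a caller-supplied function (hypothetical here)
    that reports whether a given proxy URL currently answers.
    DIRECT means "no proxy", so it is always treated as usable.
    """
    for group in parse_proxy_chain(directive):
        for proxy in group:
            if proxy == "DIRECT" or is_responsive(proxy):
                return proxy
    return None


if __name__ == "__main__":
    chain = "http://192.168.100.137:3128;DIRECT"
    print(pick_route(chain, lambda proxy: True))
    print(pick_route(chain, lambda proxy: False))
```

In this model, a VM that reports DIRECT in its PROXY column is simply one where the local squid did not answer at the moment it was tried, which matches the advice to look at the proxy itself.
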
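
To judge whether a host regularly runs into the shutdown watchdog mentioned above, it can help to measure how long the power-off took. The sketch below does this for the three relevant lines of the log posted on 20 July, where roughly five minutes pass between the two messages. The helper name `seconds_between` is made up for this example, and the sketch assumes every matched line starts with a `YYYY-MM-DD HH:MM:SS` timestamp.

```python
from datetime import datetime

# Three lines taken from the task log quoted in the thread.
LOG = """\
2018-07-20 13:02:07 (4364): Powering off VM.
2018-07-20 13:07:11 (4364): VM did not power off when requested.
2018-07-20 13:07:11 (4364): VM was successfully terminated.
"""


def seconds_between(log_text, start_msg, end_msg):
    """Seconds from the first line containing start_msg to the first
    line containing end_msg."""

    def stamp(msg):
        for line in log_text.splitlines():
            if msg in line:
                # The first 19 characters hold the date and time.
                return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        raise ValueError(f"message not found: {msg!r}")

    return (stamp(end_msg) - stamp(start_msg)).total_seconds()


if __name__ == "__main__":
    waited = seconds_between(
        LOG, "Powering off VM.", "VM did not power off when requested."
    )
    print(f"power-off wait: {waited:.0f} s")
```
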
©2025 CERN