Message boards : Theory Application : high incidence of "missing heartbeat" lately
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,535,684
RAC: 119,123
Message 38047 - Posted: 22 Feb 2019, 6:22:52 UTC

I am crunching Theory tasks on 5 machines.
On one of them (Windows 7 32-bit), I have noticed that tasks fail more often since the recent Microsoft updates, always due to "missing heartbeat". The bad thing is that this always happens after 10-13 hours, so a lot of CPU time is wasted :-(
An example can be seen here:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=217623492

I have run into this problem before, once in a while. Back then I read somewhere that one could try raising the CPU priority of the VBoxWrapper (via the Task Manager); I did this, and it helped.
So this was the first thing I checked now, but the priority is still set to "above normal", as I had set it before.
So the problem must now be caused by something else.

Does anyone have any advice on what else I could try to overcome this problem?
ID: 38047
Jonathan

Joined: 25 Sep 17
Posts: 99
Credit: 3,261,384
RAC: 5,902
Message 38058 - Posted: 23 Feb 2019, 10:50:40 UTC - in response to Message 38047.  

Have you checked for disk errors lately? You might also want to update VirtualBox to 5.2.x, since earlier versions are no longer supported.
Version 5.2 will remain supported until July 2020, according to www.virtualbox.org.
Do you have antivirus exclusions set for the directories, as in Yeti's checklist?
ID: 38058
Erich56

Joined: 18 Dec 15
Posts: 1688
Credit: 103,535,684
RAC: 119,123
Message 38061 - Posted: 23 Feb 2019, 14:11:55 UTC - in response to Message 38058.  

Jonathan, thanks for the hints. I'll look into all this.
ID: 38061
Guiri-One[Andalucia]

Joined: 1 Feb 06
Posts: 66
Credit: 9,723
RAC: 0
Message 38105 - Posted: 5 Mar 2019, 11:33:06 UTC - in response to Message 38061.  
Last modified: 5 Mar 2019, 11:35:47 UTC

And once again, tons of wasted CPU time:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=216849465

Why is this failing now? No credits? No science?
ID: 38105


