Message boards : Number crunching : Big problem: work units running with negative time.

tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25781 - Posted: 6 Sep 2013, 16:07:02 UTC - in response to Message 25780.  
Last modified: 6 Sep 2013, 16:25:01 UTC

I am still running BOINC client 6.10.58 on my Linux box and have no problem.
Tullio
ID: 25781
Tex1954
Joined: 24 Apr 11
Posts: 37
Credit: 1,295,012
RAC: 0
Message 25789 - Posted: 7 Sep 2013, 6:56:46 UTC - in response to Message 25744.  

I am getting a completely new experience with many of the new work units. They run to about 0.030% completion in 14 seconds on my computer, and then go back to zero %. The elapsed time in all those cases, as displayed in BOINC manager, also jumps back from around 14 to 4 seconds. Something I did not know was possible. In other words, elapsed time is jumping backwards.

I'm aborting those jobs now, because they just seem to be completely stuck without progress. Because it's my bed time I will suspend LHC for now. Please let me know if and when it is safe to resume LHC again.


I tried again on a 2600K Linux system, and all 8 tasks jumped around and then errored out again. They seem to run okay on my Windoz-7 systems, but not on Linux at all.

http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=10299948

:)
ID: 25789
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25790 - Posted: 7 Sep 2013, 7:16:36 UTC - in response to Message 25789.  

This is probably because you use a BOINC 7.0.65 client, as Igor hints. I am still using 6.10.58 and have no problem.
Tullio
ID: 25790
pvh
Joined: 17 Jun 13
Posts: 8
Credit: 6,548,286
RAC: 0
Message 25791 - Posted: 7 Sep 2013, 7:39:42 UTC - in response to Message 25771.  

Thanks for all the (detailed) feedback. Looks like a Linux problem on
"some" Linux systems. We tested on basically RedHat 6.


I assume you mean RHEL 6 here? They tend to make very conservative software choices in their stack (meaning very old software). Most users will not be running that. May I suggest putting some more cutting-edge Linux distros in your testbed (at least part of it)? Fedora, Ubuntu, or openSUSE could be good choices... They would be closer to what your users are using.
ID: 25791
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25792 - Posted: 7 Sep 2013, 8:35:50 UTC - in response to Message 25791.  

Good point; we actually have CERN SLC5 and SLC6, Scientific Linux CERN.
I thought conservative was good! :-) Sadly I have NO budget, but I guess I
could have a virtual machine on my MacBook Pro with something more
up to date... It is just that I have so much to do with SixTrack itself and the
numerics that I was hoping to avoid setting up my own computer centre! Eric.
ID: 25792
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25793 - Posted: 7 Sep 2013, 8:43:37 UTC - in response to Message 25792.  
Last modified: 7 Sep 2013, 8:44:41 UTC

I suggest you install Ubuntu, which is the Linux distribution the BOINC client is built on. I installed a virtual machine with SL 5.3 on my Linux box, which runs openSUSE 12.3, in the hope that it would run the Berkeley 7.0.65 client, which my Linux does not accept, but it does not. So I am still using 6.10.58, with no problem here, along with the old wrapper at Test4Theory@home, which can use both cores of my Opteron 1210, while the new wrapper uses only one.
Tullio
ID: 25793
pvh
Joined: 17 Jun 13
Posts: 8
Credit: 6,548,286
RAC: 0
Message 25794 - Posted: 7 Sep 2013, 9:39:18 UTC - in response to Message 25793.  

I run BOINC 7.0.65 on openSUSE 12.3. It runs just fine. Did you install the correct libraries? You need these:

libwx_baseu-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
libwx_baseu_net-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
libwx_gtk2u_adv-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
libwx_gtk2u_core-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
libwx_gtk2u_html-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
wxWidgets-wxcontainer-compat-lib-config-2.8.12-17.1.1.x86_64
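
A quick way to check whether the packages listed above are present is to ask `rpm` about each one. The following is a minimal Python sketch, assuming an RPM-based system such as openSUSE; the helper names `is_installed` and `missing_packages` are made up for illustration. It relies on `rpm -q` exiting with a non-zero status when a package is not installed.

```python
import shutil
import subprocess

# Package names copied from the list above.
REQUIRED = [
    "libwx_baseu-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64",
    "libwx_baseu_net-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64",
    "libwx_gtk2u_adv-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64",
    "libwx_gtk2u_core-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64",
    "libwx_gtk2u_html-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64",
    "wxWidgets-wxcontainer-compat-lib-config-2.8.12-17.1.1.x86_64",
]


def is_installed(package):
    """Return True if `rpm -q` reports the package as installed."""
    result = subprocess.run(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_packages(packages, check=is_installed):
    """Return the packages for which the check fails, in the original order."""
    return [name for name in packages if not check(name)]


if __name__ == "__main__":
    if shutil.which("rpm") is None:
        print("rpm not found; this check only applies to RPM-based systems")
    else:
        for name in missing_packages(REQUIRED):
            print("missing:", name)
```

The `check` parameter exists only so the logic can be tried without `rpm`; on a real system the default is enough.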
ID: 25794
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25796 - Posted: 7 Sep 2013, 10:00:26 UTC - in response to Message 25794.  

Thanks, but I am satisfied with 6.10.58. It runs fine with all the BOINC projects I am engaged in besides LHC@home (SETI@home, Albert@home, Test4Theory@home, climateprediction.net, Einstein@home, SETI Astropulse). I am running the new LHC version with no problem and Test4Theory@home on 2 cores. My Linux is 32-bit PAE and I am using 8 GB RAM on both PCs. I have also started using SSD disks on both the HP laptop and the SUN WS; they are much faster and produce less heat. Yes, they cost more, but prices are decreasing.
Tullio
ID: 25796
pvh
Joined: 17 Jun 13
Posts: 8
Credit: 6,548,286
RAC: 0
Message 25797 - Posted: 7 Sep 2013, 10:04:25 UTC

You get Test4Theory@home to work? I very quickly gave up on that when I noticed that around 50% of WUs were failing. This VirtualBox system seems to be very flaky.
ID: 25797
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25798 - Posted: 7 Sep 2013, 10:36:14 UTC - in response to Message 25797.  
Last modified: 7 Sep 2013, 10:37:54 UTC

I have been running it since November 2010, still in the Alpha phase. I was invited to join by Dr. Ben Segal on the basis of my experience with VirtualBox. I had installed a virtual machine with OpenSolaris on my Linux box and used it to run a BOINC client and a SETI@home app by a developer called Dotsch. Now T4T is running standalone on one core of the AMD APU E-450 in my HP laptop and as a BOINC client on my SUN WS. Unfortunately, going from SuSE Linux 12.2 to 12.3, my floating point benchmark fell by half, for unknown reasons, and I am getting half the credits I got before. This is on the SUN; the laptop is not giving me any credit, but I don't care. MCPLOTS shows its work.
Tullio
ID: 25798
Tex1954
Joined: 24 Apr 11
Posts: 37
Credit: 1,295,012
RAC: 0
Message 25800 - Posted: 7 Sep 2013, 14:32:31 UTC - in response to Message 25781.  

I am still running BOINC client 6.10.58 on my Linux box and have no problem.
Tullio


Possibly a bug or library change in the newer BOINC clients? They all error out the same way with that heartbeat thing...

:)
ID: 25800
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25801 - Posted: 7 Sep 2013, 15:01:06 UTC - in response to Message 25800.  

Probably the BOINC client is becoming too complicated and does not work on all Linux distros, being based on Ubuntu. This is not right. There was talk of a Standard Linux, but I never heard any more of it.
Tullio
ID: 25801
Tex1954
Joined: 24 Apr 11
Posts: 37
Credit: 1,295,012
RAC: 0
Message 25802 - Posted: 7 Sep 2013, 17:52:54 UTC
Last modified: 7 Sep 2013, 17:53:11 UTC

Well, if one runs OpenCL with Nvidia (like me) under Linux, then newer 7.x.x BOINC clients are required...

Obviously there is some difference in how the Linux tasks are built... the Windoz tasks run fine.

I would guess we may need a version switch in the tasks to account for 6.x.x vs. 7.x.x or something like that...

:D
ID: 25802
jelle
Joined: 26 Sep 11
Posts: 37
Credit: 7,704,455
RAC: 259
Message 25803 - Posted: 7 Sep 2013, 20:17:39 UTC - in response to Message 25794.  

I run BOINC 7.0.65 on openSUSE 12.3. It runs just fine. Did you install the correct libraries? You need these:

libwx_baseu-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
libwx_baseu_net-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
libwx_gtk2u_adv-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
libwx_gtk2u_core-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
libwx_gtk2u_html-2_8-0-wxcontainer-2.8.12-17.1.1.x86_64
wxWidgets-wxcontainer-compat-lib-config-2.8.12-17.1.1.x86_64


I tried again after installing some of the missing libraries listed above. It did not make a difference.

I then realised those libraries relate to the graphics widgets. This is presumably relevant for getting BOINC manager to display properly, but should not be relevant, I expect, for the BOINC tasks. I should also note that BOINC is fine on my machines, and continues to happily crunch away on Einstein, Rosetta, eON, and Test4Theory. It's only the recent LHC tasks, from the latest version, that crash out with these "lack of heartbeat" errors.
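
For readers unfamiliar with the term, a heartbeat is a periodic "still alive" signal, and a task that hears none for too long gives up. The sketch below is a generic illustration of that idea in Python, not BOINC's actual code; the class name `HeartbeatWatchdog` and the 30-second timeout are assumptions chosen for the example. It reads a monotonic clock, so a wall-clock adjustment cannot make the measured gap shrink or go negative.

```python
import time


class HeartbeatWatchdog:
    """Remembers when the last heartbeat arrived and reports a timeout."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be tried without waiting
        self.last_beat = clock()

    def beat(self):
        """Record that a heartbeat was just received."""
        self.last_beat = self.clock()

    def timed_out(self):
        """Return True if no heartbeat arrived within the timeout."""
        return self.clock() - self.last_beat > self.timeout_seconds


if __name__ == "__main__":
    watchdog = HeartbeatWatchdog(timeout_seconds=30)  # 30 s is an assumed value
    watchdog.beat()
    print("timed out:", watchdog.timed_out())
```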

As I'm typing this I wonder if it has anything to do with DNS assumptions. I remember this being a problem when some settings were changed in Ubuntu in one of the recent versions. It caused trouble for T4T as well, until a VirtualBox update solved the issue. Again, pure speculation on my part.
ID: 25803
Tex1954
Joined: 24 Apr 11
Posts: 37
Credit: 1,295,012
RAC: 0
Message 25804 - Posted: 7 Sep 2013, 22:36:08 UTC - in response to Message 25803.  

I have all the correct and updated libraries... I think... pretty sure, because all my other 32-bit/64-bit applications run fine. ONLY LHC is giving me problems.

As I think back, it seems this isn't the first time I've run across this problem... seems to me another Beta project had the same problem... if I could just remember..

8-)
ID: 25804
jelle
Joined: 26 Sep 11
Posts: 37
Credit: 7,704,455
RAC: 259
Message 25805 - Posted: 8 Sep 2013, 0:09:58 UTC - in response to Message 25803.  

As I'm typing this I wonder if it has anything to do with DNS assumptions. I remember this being a problem when some settings were changed in Ubuntu in one of the recent versions. It caused trouble for T4T as well, until a VirtualBox update solved the issue. Again, pure speculation on my part.


Thinking further about it, I remember the issues that cropped up with T4T were the result of changes in how Ubuntu deals with resolv.conf.

Info on that here:
http://www.stgraber.org/2012/02/24/dns-in-ubuntu-12-04/

Could that be the source of problems for Ubuntu users?
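
To see whether a given machine resolves DNS through a local resolver, one can look at the `nameserver` entries in `/etc/resolv.conf`. The Python sketch below does only that; the function names are made up for illustration, and whether a loopback nameserver has anything to do with the LHC failures remains speculation.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the addresses given on `nameserver` lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":  # blank line or comment
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def loopback_nameservers(resolv_conf_text):
    """Return only the nameservers that are loopback addresses (127.x.x.x, ::1)."""
    found = []
    for server in nameservers(resolv_conf_text):
        try:
            if ipaddress.ip_address(server).is_loopback:
                found.append(server)
        except ValueError:
            pass  # not a plain IP address; ignore it
    return found


if __name__ == "__main__":
    try:
        with open("/etc/resolv.conf") as handle:
            text = handle.read()
    except OSError:
        text = ""
    print("nameservers:", nameservers(text))
    print("loopback:   ", loopback_nameservers(text))
```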
ID: 25805
Tex1954
Joined: 24 Apr 11
Posts: 37
Credit: 1,295,012
RAC: 0
Message 25806 - Posted: 8 Sep 2013, 6:14:05 UTC - in response to Message 25781.  
Last modified: 8 Sep 2013, 6:16:57 UTC

I am still running BOINC client 6.10.58 on my Linux box and have no problem.
Tullio


It would be interesting to know whether it is a Linux kernel/build problem, a specific BOINC client problem, or a combination of both...

I run Linux Mint Cinnamon (latest) with client 7.0.65...

8-)
ID: 25806
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25807 - Posted: 8 Sep 2013, 8:35:33 UTC - in response to Message 25802.  
Last modified: 8 Sep 2013, 8:35:58 UTC

There is such a switch at T4T. With BOINC 6.10.58 I get the old wrapper, which allows the use of 2 cores if enabled in BIOS. With BOINC 7.0.x people get the new wrapper, which makes a snapshot every ten minutes.
Tullio
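
The kind of version switch described above can be sketched in a few lines of Python. This is purely illustrative: the function names and the labels `old_wrapper` and `new_wrapper` are hypothetical, and the sketch assumes the choice depends only on the client's major version number, which the thread does not confirm.

```python
def parse_version(text):
    """Turn a version string such as '6.10.58' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def choose_wrapper(client_version):
    """Pick a wrapper label from the client's major version (assumed rule)."""
    major = parse_version(client_version)[0]
    return "old_wrapper" if major < 7 else "new_wrapper"


if __name__ == "__main__":
    for version in ("6.10.58", "7.0.65"):
        print(version, "->", choose_wrapper(version))
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "6.10" as older than "6.9".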
ID: 25807
Theadalus
Joined: 1 Aug 12
Posts: 1
Credit: 1,368,434
RAC: 0
Message 25812 - Posted: 9 Sep 2013, 2:05:24 UTC

Hi,

I have the same problem with "endless-loop WUs"; the machine is running Ubuntu 12.04(.2) x64 and BOINC client 7.0.65. Other machines with older Ubuntu and BOINC versions give no problems.
I have run almost a dozen other BOINC-based projects on this machine (same OS and BOINC version) without this kind of problem, so to me it seems to be an LHC issue...
ID: 25812
tullio
Joined: 19 Feb 08
Posts: 708
Credit: 4,336,250
RAC: 0
Message 25813 - Posted: 9 Sep 2013, 7:42:07 UTC

New software releases always have some problems. At Test4Theory@home the latest VirtualBox release (4.2.18) is giving problems and has been put in quarantine.
Tullio
ID: 25813