Tasks v530.09 crashing
Joined: 21 Aug 05 Posts: 14 Credit: 119,137 RAC: 0
Hi, I have two computers that can no longer run LHC because of ERR 168 with the new version. Of course, you may choose to run the application only on new quantum computers, or to use a larger share of the world's computers. The applications page says "Microsoft Windows (98 or later) running on an Intel x86-compatible CPU", so you have chosen to support Windows 98 at least, meaning you want old computers too. So I think you just have to recompile without this optimization set to solve the problem, or use computer checks for task distribution. Regards
Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0
We don't have many architectural choices when specifying which app version to run. I have now retracted (deleted) 530.9 for all generic x86 Windows and Linux platforms, leaving 530.9 only for platforms that report AMD_x86_64 or Intel EM64T processors back to the server. Please check if that works for you. skype id: igor-zacharov
Joined: 3 Oct 06 Posts: 101 Credit: 8,994,586 RAC: 0
Yes, 530.9 disappeared and now I see v0.00 instead of it on the tasks pages. :-) That is all.
Joined: 21 Aug 05 Posts: 14 Credit: 119,137 RAC: 0
Well, I will check as soon as I have some work (the project has no jobs available). I just hope you didn't stop distribution to all x86 computers, because my third computer didn't have the same problem as those two older ones.
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0
My Opteron 1210 is SSE3-capable and has completed a v0.00 task that is now waiting for validation: CPU time 42k s, run time 91k s. But I am running 5 other BOINC projects, including Test4Theory@home. More exactly, BOINC_VM is running one CERN job after another 24/7. Tullio
Joined: 2 Sep 04 Posts: 209 Credit: 1,482,496 RAC: 0
"We don't have many architectural choices when specifying which app version to run." I guess it is working. My x64 hosts have no work, but the last ones done were 530.09; my x32 hosts are showing v0.00. I also see "Database Error" appearing on a lot of the website task pages when viewing results. Some pages show it twice.
Joined: 6 Sep 08 Posts: 118 Credit: 12,552,221 RAC: 4,726
It's not working for me, I'm afraid. The original host has just been sent a 530.09 task, which crashed... John.
Joined: 2 Sep 04 Posts: 209 Credit: 1,482,496 RAC: 0
Yeah, something is not working. I checked one of the x32 hosts: BOINC Manager shows 530.09 as running, but the website shows the task as v0.00. I also looked in the LHC project folder and the task's slot, and there is only a 530.9 application. On the website, tasks for the x32 hosts show v530.08 and then v0.00, with no v530.09 in their task lists, but the x64 hosts show only v530.09 as run. Very odd?
Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0
After consultation with Eric McIntosh, we decided to retract the 530.9 version completely. I have reinstalled the 530.8 version, now called 530.10. We will come back to it after a better investigation. I have to admit a mistake. skype id: igor-zacharov
Joined: 6 Sep 08 Posts: 118 Credit: 12,552,221 RAC: 4,726
All running well again. Thanks. John.
Joined: 21 Aug 05 Posts: 14 Credit: 119,137 RAC: 0
Version 530.10 seems to be fine (no crash at startup). Thanks for the update.
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0
MEA CULPA. Having concentrated so much on the floating-point model options for the ifort compiler, I rather forgot the basic PC architecture. I was anxious to get a two to four times faster version into production so as to maximise the use of your systems. I just removed the arch IA32 flags to allow the use of SSE2 (which is floating-point compatible for me) and thus generated an executable for the very modern Linux and Windows PCs in my office. NOT a good idea, as we see from your messages. So Igor has put us back to my Version 4308, i.e. BOINC 530.8/530.10. We shall get this sorted out as soon as possible and have different executables for different platforms so as to optimise resource utilisation. (We have also increased, probably by too much, the fpops, disk space, and elapsed time estimates.) This will hopefully give us some breathing space, as it is now absolutely vital that I check the physics results of all these recent studies. The problem is basically that I had to switch to the Intel IFORT compiler with the new BOINC version. With the appropriate fp-model flags this worked really well, until I found a small number of 1 ULP (unit in the last place) differences in the formatted input of the accelerator description. This difference apparently appears between the Linux and Windows executables even on the same hardware. The problem of formatted input is well understood and was largely solved by David M. Gay some twenty years ago, and is, I believe, handled correctly by C99. As a (temporary) solution I now read the data as single precision. The recent studies, including those very short runs of not even one turn, will allow me to evaluate the physics impact of this change. If the effect is too large I shall have to replace the IFORT runtime routine for Fortran formatted input with a correct C strtod........sigh. This would be useful in the longer term, as it would hopefully allow the use of other compilers while still producing identical results.
Again, all this should be a non-issue once compilers supporting the new Fortran 2003 formatted I/O ROUND options become available. Thanks for your understanding. Eric.
Joined: 3 Oct 06 Posts: 101 Credit: 8,994,586 RAC: 0
"Version 530.10 seems to be fine..." I do not agree: new, never-before-seen problems have started. ;-) The new problem is Error 148: <message> couldn't start CreateProcess() failed - : -148 </message> Recommendations are very welcome...
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0
Thanks, I'll get right onto it. Probably the resource limits are TOO big now. Eric.
Joined: 18 Sep 04 Posts: 143 Credit: 27,645 RAC: 0
530.09: <rsc_fpops_est>30000000000000.000000</rsc_fpops_est>; 530.10: <rsc_fpops_est>120000000000000.000000</rsc_fpops_est>. Is there a reason why you increased the estimated run time to 4 times the original value? These tasks run for approximately 8 hours on my i3-530, but they're estimated to run a whole lot longer, which makes them run in panic mode (high priority). Just because you went back to a lower instruction set does not mean everything will run that much slower. ;-) (With all those tricks, the LHC Classic DCF (duration correction factor) on my system is now completely haywire. The other projects I run have their DCF around 1.0, while LHC has it at 2.9; I may consider resetting the project so my DCF resets to 1.0.) Jord BOINC FAQ Service
Joined: 17 Feb 09 Posts: 22 Credit: 311,184 RAC: 0
"After consultation with Eric McIntosh, we decided to retract completely the..." So, we're back to horrible run times and credits for Linux? I'll wait for the return of an improved .09. How about using .09 (.11) for x86_64 and .10 for i686?
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0
I need to discuss this with Igor. We have been seeing complaints about resource limits being exceeded, but we are not sure which ones. It seems to me I should put the fpops back to the previous value. However, the current ifort version is actually rather slow until I get round to optimising it. Sorry about that. In the past we could test changes "in house", but not at the moment. Thanks for the feedback. Eric.
Joined: 30 Sep 04 Posts: 21 Credit: 1,442,034 RAC: 0
530.10 is much less efficient than 530.09 on my Core 2 Quad. It's 15-33% slower, and lower core temperatures also indicate that the processor is drawing less power. I had no problems with 530.09 on Windows XP on my two overclocked computers (so far only one invalid result, and that was my fault, due to some manual intervention in BOINC Manager). Perhaps other people overclock too much or have some other hardware problems with their computers.
Joined: 18 Sep 04 Posts: 143 Credit: 27,645 RAC: 0
"However the current ifort version is actually rather slow until I get round to optimising it." I'd like to disagree. :-) Two of my last 530.09s: CPU time 23271.08 seconds; CPU time 27747.53 seconds. Compare two 530.10s that have finished: CPU time 11122.58 seconds; CPU time 10511.39 seconds. If you manage to cut my run time to between a half and a third without optimizations, then please don't optimize the applications any further. ;-) (I fully understand that not all tasks here have the same run time.) "In the past we could test changes 'in house' but not at the moment." Well, you have us. You can send that work either to hosts you trust, or have a small group of us do some alpha testing with feedback. Jord BOINC FAQ Service
Joined: 17 Feb 09 Posts: 22 Credit: 311,184 RAC: 0
"Two of my last 530.09s: ..." So perhaps 530.09 slowed Windows down rather than speeding Linux up? Well, either way, if I'm validating against a Windows computer running 530.10, I'm at a great disadvantage.
©2024 CERN