Message boards : Number crunching : Invalid tasks
Author | Message |
---|---|
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Thanks Matthias; if I don't get this fixed by Monday I'll try to get the temporary fix in place using the homogeneous redundancy feature. Eric. |
Send message Joined: 6 Jul 06 Posts: 108 Credit: 663,175 RAC: 0 |
"Thanks Matthias; if I don't get this fixed by Monday I'll" I hope that works, as I have now had 5 fail due to the Windows/Linux validation issue, and 4 of them were the 30,000+ second jobs, so many hours and over 1,000 points lost. Conan |
Send message Joined: 29 Sep 04 Posts: 187 Credit: 705,487 RAC: 0 |
The lost time has resulted in my setting no new tasks. If they know the issue is there, why are they still sending jobs to Windows machines? My machines could have been doing useful work for someone else. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Send message Joined: 26 Sep 11 Posts: 37 Credit: 7,807,848 RAC: 2 |
I don't have statistics to back this up, but my impression is that it is longer-running tasks on Linux that fail to validate against Windows. Short runs may be OK, but any tasks that take more than a few hours on my i7-3770 seem to end up invalid if they go up against wingmen running Windows. They validate OK if my wingman is also running Linux. I am particularly bummed out because I have an Intel Atom-powered netbook that was crunching away for more than 2 days on a task, and the wingman ran Windows, so it was marked invalid. It would be good if this could be rectified. |
Send message Joined: 6 Sep 08 Posts: 118 Credit: 12,588,679 RAC: 157 |
I haven't noticed this problem; this task took ca. 4 days on a Linux box and it validated OK against a much faster Windows host. John. |
Send message Joined: 26 Sep 11 Posts: 37 Credit: 7,807,848 RAC: 2 |
I suspect it's more of a trend or tendency than a fixed pattern or rule. I didn't analyze all my results, and we can't go back far enough in time anyway. Like you, I suspect I may have had long jobs with Windows wingmen that validated correctly. However, I do notice that all my recent invalid results seem to be long jobs with Windows wingmen. That's why it's only an impression or hunch that I wanted to share for now. P.S. Hats off for keeping your Pentium III crunching. |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
The problem is now identified as being with the Windows executables on many, but not all, tasks of a specific study involving tests of power supply ripple. The fix, using a different ifort compiler, is available, and if it can't be installed pronto, I am hoping to use homogeneous redundancy as a temporary fix to avoid invalid results (but which will not fix the physics). I am really, really sorry about the long delay, but the nasty problem is creating Windows executables which don't give "cannot create task" or syntax problems with the PC description. More news soonest. Eric. |
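As a rough illustration of what homogeneous redundancy means here: replicas of one workunit are only sent to hosts of the same class, so a Linux result is never compared against a Windows one. The sketch below is a simplified Python model, not BOINC's scheduler code. The `Host` and `Workunit` types, the `hr_class` and `try_send` functions, and the use of the OS family alone as the class are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    host_id: int
    os_family: str  # hypothetical label, e.g. "windows" or "linux"


@dataclass
class Workunit:
    wu_id: int
    hr_class: Optional[str] = None  # unset until the first replica is sent


def hr_class(host: Host) -> str:
    """Map a host to its redundancy class (assumed here: OS family only)."""
    return host.os_family.lower()


def try_send(wu: Workunit, host: Host) -> bool:
    """Return True if a replica of wu may be sent to host.

    The first host to receive the workunit fixes its class; after that,
    only hosts of the same class are eligible.
    """
    cls = hr_class(host)
    if wu.hr_class is None:
        wu.hr_class = cls
        return True
    return wu.hr_class == cls
```

In this model a workunit first sent to a Linux host would be refused to any Windows host, which avoids cross-platform comparisons but, as Eric says, does nothing about the underlying numerical difference.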
Send message Joined: 17 Jul 05 Posts: 102 Credit: 542,016 RAC: 0 |
Here we have quite a strange invalid result: wuid=16556468. One of the first two tasks sent out was returned after the deadline, so the server-side scheduler decided to send out some more. The second one (in time, Linux) didn't validate against the third and fourth (both in time, Win x64), but when the delayed first result (Win x64) came back, it validated against the Linux result. |
Send message Joined: 17 Jul 05 Posts: 102 Credit: 542,016 RAC: 0 |
2 inconclusive ones: with SixTrack v451.07, wtest_newnuebb0105__5__s__64.31_59.32__6_8__5__30_1_sixvf_boinc610 (waiting for a third result); with SixTrack v451.07, w14_eric_job_tracking_bb_np_nt_fset_240214__13__s__62.31_60.32__10_12__6__82.5_1_sixvf_boinc4540 (invalid). In both cases the runtime on my box has been extremely low. Unusual: all boxes ran Windows and for some reason mine always picked SSE3, where the others picked PNI ... but OTOH, I patched my clients to report SSE3 (5.10.28 didn't know that extension yet). |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
"Unusual : All boxes ran windows and for some reason mine always picked SSE3, where the others picked PNI ... but otoh., I patched my clients to report SSE3 (5.10.28 didn't know that extension yet)" Usual. 'SSE3' and 'Prescott New Instructions' are synonyms, and the applications are identical. |
Send message Joined: 17 Jul 05 Posts: 102 Credit: 542,016 RAC: 0 |
"Unusual : All boxes ran windows and for some reason mine always picked SSE3, where the others picked PNI ... but otoh., I patched my clients to report SSE3 (5.10.28 didn't know that extension yet)" Yes, I already learned that here. It was just surprising that mine always picked the SSE3 one, whereas others picked PNI ... until I remembered my core client patch. |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
"2 inconclusive ones : with SixTrack v451.07 wtest_newnuebb0105__5__s__64.31_59.32__6_8__5__30_1_sixvf_boinc610 waiting for a third result" Looks OK now, I think. "with SixTrack v451.07 w14_eric_job_tracking_bb_np_nt_fset_240214__13__s__62.31_60.32__10_12__6__82.5_1_sixvf_boinc4540 invalid" This was a glitch on our side giving "No such file or directory". Thanks. Eric. |
Send message Joined: 28 Aug 12 Posts: 15 Credit: 500,336 RAC: 0 |
Hi, I have a wingman with 888 WUs: http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=10137504 All of them finish within a second, while my machine has been working for 15 hours on two of them. As of now there are ~490 marked validation pending and ~400 marked validation inconclusive. None has validated. I think this machine does not like SixTrack v451.07 (sse2) at all. Or something else is wrong. |
Send message Joined: 22 Nov 10 Posts: 5 Credit: 778,394 RAC: 0 |
I'm getting some errors too: http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=17279643 http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=17279642 http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=17279637 |
Send message Joined: 22 Nov 10 Posts: 5 Credit: 778,394 RAC: 0 |
I don't like this... 36856057 17279644 18 May 2014, 1:33:48 UTC 18 May 2014, 8:15:38 UTC Error while computing 10,104.09 7,861.61 --- SixTrack v451.07 (pni); 36856055 17279643 18 May 2014, 1:33:48 UTC 18 May 2014, 4:44:45 UTC Error while computing 10,104.57 7,076.31 --- SixTrack v451.07 (pni); 36856054 17279642 18 May 2014, 1:33:48 UTC 18 May 2014, 4:29:28 UTC Error while computing 10,105.04 8,444.38 --- SixTrack v451.07 (pni); 36856049 17279640 18 May 2014, 1:33:48 UTC 18 May 2014, 8:15:38 UTC Error while computing 10,104.29 5,325.66 --- SixTrack v451.07 (pni); 36856043 17279637 18 May 2014, 1:33:48 UTC 18 May 2014, 5:22:37 UTC Error while computing 10,104.04 8,278.92 --- SixTrack v451.07 (pni); 36856035 17279633 18 May 2014, 1:33:48 UTC 18 May 2014, 4:29:28 UTC Error while computing 10,104.18 8,430.53 --- SixTrack v451.07 (pni); 36856011 17279621 18 May 2014, 1:33:48 UTC 18 May 2014, 4:32:30 UTC Error while computing 10,103.88 7,055.89 --- SixTrack v451.07 (pni) |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
This is the 'EXIT_TIME_LIMIT_EXCEEDED' error we were discussing yesterday in the thread of the same name. Host 10308609 has an APR of 178.17200653414 for the 64-bit PNI app, version 451.07 (production). Qax, this is a problem on the server, not on your computer. Eric is aware of it. Don't adjust your settings, but you might prefer to concentrate on another project for a few hours. Eric, we probably need that high rsc_fpops_bound multiplier on the production tasks for a while at least. Do we have outlier detection in the current validator? If not, we need it, or this will keep happening. Once the server has been 'vaccinated' against outliers, you could try running 'reset credit statistics for this application' from Estimating job resource requirements. |
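To make the mechanism concrete, here is a minimal Python sketch of how an inflated APR can turn into a short kill time. It assumes the client's elapsed-time limit is the task's `rsc_fpops_bound` divided by the speed estimate the server sends for the app version, and that the APR shown above is in GFLOPS. The function name `elapsed_time_limit` and the bound value used in the example are hypothetical, not taken from the project's configuration.

```python
def elapsed_time_limit(rsc_fpops_bound: float, est_flops: float) -> float:
    """Seconds a task may run before being aborted for exceeding its time limit.

    Assumed model: limit = rsc_fpops_bound / estimated speed (FLOPS).
    """
    if est_flops <= 0:
        raise ValueError("estimated speed must be positive")
    return rsc_fpops_bound / est_flops


# Hypothetical bound, chosen only for illustration.
HYPOTHETICAL_BOUND = 1.8e15

# APR quoted in the post, read as GFLOPS (assumption).
inflated_apr_flops = 178.172e9

# A more plausible per-core speed, also an assumption.
realistic_flops = 5e9

limit_inflated = elapsed_time_limit(HYPOTHETICAL_BOUND, inflated_apr_flops)
limit_realistic = elapsed_time_limit(HYPOTHETICAL_BOUND, realistic_flops)
```

Under these assumed numbers the inflated estimate gives a limit of under three hours, while the realistic one would allow about 100 hours. A larger `rsc_fpops_bound` multiplier raises the limit in direct proportion, which is why it works as a stopgap until the speed estimate is corrected.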
Send message Joined: 22 Nov 10 Posts: 5 Credit: 778,394 RAC: 0 |
Should I cancel all the WUs I downloaded, or.....just suspend the project? |
Send message Joined: 27 Oct 07 Posts: 186 Credit: 3,297,640 RAC: 0 |
"Should I cancel all the WUs I downloaded, or.....just suspend the project?" Up to you. As things stand, any task which runs longer than three hours on your machine will be killed. You can cure that (gradually) by running shorter tasks, but catch-22 says that you can't know which tasks are going to be short (that's what the project is here to discover), except perhaps by waiting until your 'wingmate' has completed and reported their copy of the task. But that's hard work. I see you've run many BOINC projects, for many years. How much have you learned about how it works under the hood? Does the phrase "edit client_state.xml" fill you with dread? There are ways of solving this problem locally, but they require knowledge and care. If you know enough about editing client_state to be worried by it, then you're probably in the right place to learn some more; but if you've never come across it before, then I think I'd advise against it. |
Send message Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Thanks again; I am treating this as top priority now. Trying to get to grips with it all. The error rate is reaching 3%, which is too high. Eric. |
Send message Joined: 22 Nov 10 Posts: 5 Credit: 778,394 RAC: 0 |
I started running SETI in 2000. For some reason, I can't connect my old classic account to my new one. I'm thinking I must have changed the e-mail at some point towards the end, and now I can't remember it. I try not to get too involved in "hacks" for these things. For the longest time I wouldn't run them on an overclocked machine, because I was always afraid that might compromise the data. And to me, the integrity of the results is the most important thing. Right now I am running WUs on my server. Last I checked, they were working fine. But I'm having a very low success rate on my home PC, so I suspended the project for now. However, I have at least a dozen in the chamber ready to go, and 4 or 5 of them have already run for over 3 hours. So I guess I should just kill those, then? BTW, it's not that I'm not curious about how things work. I run the program on a Linux server via the command line, more than 2000 miles away from me. But tweaking things to run in ways the programmers didn't intend always makes me worry about changing the results in some way. |
©2025 CERN