Message boards : Number crunching : Tasks v530.09 crashing

m
Joined: 6 Sep 08
Posts: 117
Credit: 12,457,843
RAC: 3,387
Message 23383 - Posted: 6 Oct 2011, 20:35:36 UTC

My trusty Win2K laptop, which has been running v530.08 tasks without problems, has just started its first .09 task, which crashed, as did the next one, as here. The stderr shows exit code 168, which means nothing to me. I don't really want to move from BOINC v5 to v6 unless I have to; disk space is at a premium on this host and newer versions of things often seem to take more space. There are no errors reported in the BOINC Manager messages log. Any suggestions gratefully received.

John.
ID: 23383
Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23384 - Posted: 6 Oct 2011, 22:09:22 UTC

As I see your other computers are returning 530.09 OK, let's assume it is the laptop only.

I also see the wingmen on those tasks completed OK and are awaiting validation, so that more or less eliminates bad work units.

Also, to find out what error codes mean, you can look at the result detail once it is returned to the project. Then you can go to the unofficial BOINC wiki and search for the error code; there is a list of error codes that sometimes gives more detail on a certain error.

-168 is ERR_FTOK

I don't know what that means; searching the net, I found:

ERR_FTOK -168

BOINC cannot get file token (key) for semaphores.

I still don't know this one.

Have you restarted BOINC or Windows since the errors?

Always try the simple things first.

If so, and those don't help, then try a project reset on the laptop; quite possibly the download of the new app went wrong. A reset will download it again, which may fix the problem.
ID: 23384
Profile Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 23385 - Posted: 6 Oct 2011, 22:44:13 UTC - in response to Message 23384.  

Hold on, "Exit code 168" and "Exit code -168" are two different beasts.
BOINC errors are negative errors. Positive errors are science application errors.

The error here is "exit code 168 (0xa8)", thus a science app error.
The whole error is:
<stderr_txt>
forrtl: severe (168): Program Exception - illegal instruction
Image              PC        Routine  Line     Source
sixtrack_530.9_wi  00406C83  Unknown  Unknown  Unknown
sixtrack_530.9_wi  0040101F  Unknown  Unknown  Unknown
sixtrack_530.9_wi  00657D33  Unknown  Unknown  Unknown
sixtrack_530.9_wi  006344EA  Unknown  Unknown  Unknown
KERNEL32.dll       7C5989D5  Unknown  Unknown  Unknown

</stderr_txt>

I then looked up the "forrtl: severe (168)" part of the error and came to this thread on the Intel boards, where it says: "Turns out that the problem was caused by an older-generation processor not understanding newer instructions. The application had been compiled with the "Generate most optimized code" (/fast) setting, which implies /arch:host."

Probably something similar has happened here.
The "Genuine Intel x86 Family 6 Model 8 Stepping 3 746MHz" is a what? A Pentium III, a P2/3 model Celeron or a Pentium II Xeon? The 530.09 science application is probably compiled with an instruction set that these old CPUs do not understand. Hence the errors.
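
For anyone who wants to automate the triage described above, here is a minimal sketch in Python. It assumes only the sign convention stated in this post (negative exit codes come from BOINC, positive ones from the science application) and the `forrtl:` line format shown in the stderr output. The function names and the regular expression are hypothetical and are not part of BOINC or SixTrack.

```python
import re

# Hypothetical pattern for lines such as:
#   forrtl: severe (168): Program Exception - illegal instruction
FORRTL_RE = re.compile(r"forrtl:\s*(\w+)\s*\((\d+)\):\s*(.*)")


def classify_exit_code(code):
    """Apply the sign convention from the post: negative = BOINC, positive = science app."""
    if code < 0:
        return "BOINC error"
    if code > 0:
        return "science application error"
    return "success"


def parse_forrtl(stderr_text):
    """Return (severity, code, message) from the first forrtl line, or None if absent."""
    for line in stderr_text.splitlines():
        match = FORRTL_RE.search(line)
        if match:
            severity, code, message = match.groups()
            return severity, int(code), message.strip()
    return None


# The first line is copied from the stderr output quoted in this thread.
stderr_text = """forrtl: severe (168): Program Exception - illegal instruction
KERNEL32.dll 7C5989D5 Unknown Unknown Unknown"""

print(classify_exit_code(168))
print(classify_exit_code(-168))
print(parse_forrtl(stderr_text))
```

The point of keeping the two checks separate is that the exit code alone only says who reported the failure; the `forrtl` line is what carries the actual reason (here, an illegal instruction).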
Jord

BOINC FAQ Service
ID: 23385
m
Joined: 6 Sep 08
Posts: 117
Credit: 12,457,843
RAC: 3,387
Message 23386 - Posted: 6 Oct 2011, 23:03:47 UTC - in response to Message 23385.  

Ageless.

You got it in one, as they say. I'll bow out gracefully at this point. Thanks both. Goodbye.

John.
ID: 23386
Profile Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 23387 - Posted: 6 Oct 2011, 23:43:40 UTC - in response to Message 23385.  

Apparently, we should have kept the version 530.8 for older processors.
It is still possible, I have not removed them.

What would be the architecture designation for distinguishing the old and the new?
skype id: igor-zacharov
ID: 23387
Profile jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23389 - Posted: 7 Oct 2011, 7:04:26 UTC - in response to Message 23387.  

Will results from the 530.8 app verify with results from 530.9? There aren't many CPUs that old. Are there enough to justify maintaining 2 versions of the app? You've proposed shortening the deadline, will CPUs that old be able to meet the new deadline?

ID: 23389
Profile Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 23391 - Posted: 7 Oct 2011, 7:46:51 UTC - in response to Message 23389.  

Yes, the 530.9 and 530.8 deliver identical results (within the model, where
we look for last-bit differences). The 530.9 can be a factor of 2 faster, but
not always: it does not optimize away calculations; it (the compiler, in fact)
just organizes them better by using pipelining and special instructions.

Yes, there is no problem for us to keep multiple versions of the same app.
I just need to find out what to call the architecture to which the older
CPUs belong. This will allow for automatic selection of the executable.

Shortening the deadlines is only a discussion item at this time. I still need
to assess what will have the largest impact on the efficiency of calculations.
skype id: igor-zacharov
ID: 23391
m
Joined: 6 Sep 08
Posts: 117
Credit: 12,457,843
RAC: 3,387
Message 23392 - Posted: 7 Oct 2011, 10:04:09 UTC - in response to Message 23391.  
Last modified: 7 Oct 2011, 10:07:17 UTC

Having started all this, I feel somewhat guilty, so here goes:-


Yes, there is no problem for us to keep multiple versions of the same app.
I just need to find out what to call the architecture to which the older
CPUs belong. This will allow for automatic selection of the executable.


The relevant information is probably in client_state.xml. Oddly, the "features" line here shows the same flags for the "rogue" machine (Pentium 3?) as for a Pentium 4 which works. I have a vague recollection that BOINC was able to send different apps to different hosts.
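
A rough sketch of pulling that "features" line out of the file, in Python. It assumes the flags sit in a `<p_features>` element as a space-separated list, which is an assumption about the file layout rather than something confirmed in this thread. The sample XML below is made up for illustration and is not taken from any real host.

```python
import xml.etree.ElementTree as ET

# Made-up sample; a real client_state.xml is much larger, and the
# <p_features> element name is an assumption about its layout.
SAMPLE = """<client_state>
  <host_info>
    <p_vendor>GenuineIntel</p_vendor>
    <p_features>fpu tsc pae mmx sse</p_features>
  </host_info>
</client_state>"""


def cpu_features(xml_text):
    """Return the CPU feature flags found in the XML as a set of lowercase strings."""
    root = ET.fromstring(xml_text)
    # Search the whole tree so the exact nesting of the element does not matter.
    node = next(root.iter("p_features"), None)
    if node is None or node.text is None:
        return set()
    return set(node.text.lower().split())


print(sorted(cpu_features(SAMPLE)))
```

Running the same function over the files from two hosts and comparing the resulting sets would show quickly whether the reported flags really are identical.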


Shortening the deadlines is only a discussion item at this time. I still need
to assess what will have the largest impact on the efficiency of calculations.


For those volunteers who don't run "farms" or use their employers' machines, I think the largest factor in work throughput is probably the time that the machine is actually running rather than its speed. Energy use is the factor here.

I'm sure that BOINC makes an effort to predict whether work will finish within the deadline, and will not download if work won't complete. It certainly used to. This would automatically remove work from hosts that were just too slow, even if availability rather than processor speed was the cause. The server even sent a message to that effect.

Having said all that, the purpose of BOINC is to get the work done rather than to keep the likes of me happy. So, although a quick search through BoincStats finds thousands of "Family 6" CPUs, I fear that many are no longer active, and I am not sure that they justify much work to accommodate. Sadly.

John.
ID: 23392
Profile Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 23393 - Posted: 7 Oct 2011, 10:49:02 UTC - in response to Message 23387.  
Last modified: 7 Oct 2011, 10:52:26 UTC

Apparently, we should have kept the version 530.8 for older processors.
It is still possible, I have not removed them.

What would be the architecture designation for distinguishing the old and the new?

You always increment version numbers, so re-releasing 530.8 as 530.91 would be the next logical choice.

If you want to designate these to specific CPUs only, you'll need HR type 1, or to set up application plan classes.

Edit: But you can also ask yourself, is it worth it? Does this project have that many old CPUs attached? You can check that in the database. Or do you just not want to set a minimum CPU/minimum OS/minimum BOINC version as requirement? All questions you may answer for yourself. :-)
Jord

BOINC FAQ Service
ID: 23393
Profile Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,866,264
RAC: 0
Message 23394 - Posted: 7 Oct 2011, 13:04:02 UTC

I trashed a few WUs with this error and did a full un- and re-install of Boinc just in case something had become corrupted on an Athlon XP 2400. Surely that's not too old and slow. If anything, I think it's faster than the lappy that does most of my WUs.
ID: 23394
Profile Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 23395 - Posted: 7 Oct 2011, 13:16:45 UTC - in response to Message 23394.  

Surely that's not too old and slow.

Both the Pentium III and the Athlon XP 2400+ can do SSE, but not SSE2 or above. So, if the 530.9 science application was compiled to use the SSE2 instruction set, whereas the 530.8 version was compiled to use only the SSE instruction set, then that's what is causing these errors.
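
The check described above boils down to a set difference: anything the build needs that the CPU does not report is a potential illegal instruction. A minimal Python sketch follows; the flag lists are hypothetical examples rather than values read from any real host, and the actual requirements of 530.08 and 530.09 are not confirmed here.

```python
def missing_features(required, reported):
    """Return the required instruction-set flags that the CPU does not report, sorted."""
    required_set = {flag.lower() for flag in required}
    reported_set = {flag.lower() for flag in reported}
    return sorted(required_set - reported_set)


def can_run(required, reported):
    """True when every required flag is reported by the CPU."""
    return not missing_features(required, reported)


# Hypothetical flag lists for illustration only.
sse_only_cpu = ["fpu", "mmx", "sse"]
sse2_cpu = ["fpu", "mmx", "sse", "sse2"]

print(missing_features(["sse", "sse2"], sse_only_cpu))
print(can_run(["sse", "sse2"], sse2_cpu))
print(can_run(["sse"], sse_only_cpu))
```

A server-side selection rule would need the same information, which is why the flags each host reports matter so much for choosing between two builds.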
Jord

BOINC FAQ Service
ID: 23395
Profile Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 23397 - Posted: 7 Oct 2011, 13:57:17 UTC - in response to Message 23393.  

You always increment version numbers, so re-releasing 530.8 as 530.91 would be the next logical choice.

Ugh, it was 530.08 and 530.09, so the next logical choice is 530.10 ;-)
Jord

BOINC FAQ Service
ID: 23397
Profile Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,866,264
RAC: 0
Message 23398 - Posted: 7 Oct 2011, 14:33:39 UTC - in response to Message 23395.  

Thanks, Jord.
Been looking for an excuse to upgrade anyway; single core and the PSU fan's a bit noisy. I'll maybe hold off going out to buy in case they can find a way to distribute WUs according to CPU architecture.
Awaiting developments.
Lappy crunching away happily at LHC 1 and T4T work.
ID: 23398
m
Joined: 6 Sep 08
Posts: 117
Credit: 12,457,843
RAC: 3,387
Message 23399 - Posted: 7 Oct 2011, 16:04:59 UTC

Thanks Ageless, I'm sure you're on the right lines, but this host reports sse not sse2 and is running 530.09 as I write. Igor wrote here that 530.09 uses sse3. Now I'm really confused.

John.
ID: 23399
Profile Ageless
Joined: 18 Sep 04
Posts: 143
Credit: 27,645
RAC: 0
Message 23400 - Posted: 7 Oct 2011, 16:29:49 UTC - in response to Message 23399.  

The Pentium 4 uses MMX, SSE and SSE2, as long as the operating system knows about these instruction sets as well. Windows 2000 does not support SSE2 or any instruction set thereafter. It only supports up to SSE.

I gave the instruction sets of SSE and SSE2 as examples, I hadn't checked all of Igor's posts to see what they were actually using. But at least that explains things further.

To be able to use SSE3, one needs a Pentium 4 Prescott CPU or better, or an Athlon 64 or better and Windows XP or better. SSE3 is also known as PNI (Prescott New Instructions).
Jord

BOINC FAQ Service
ID: 23400
Profile trigggl
Joined: 17 Feb 09
Posts: 22
Credit: 311,184
RAC: 0
Message 23401 - Posted: 7 Oct 2011, 19:48:52 UTC

I recycled (took it to the recycling center) my last Pentium 3 earlier this year. It just wasn't worth the electricity.
ID: 23401
Profile trigggl
Joined: 17 Feb 09
Posts: 22
Credit: 311,184
RAC: 0
Message 23402 - Posted: 7 Oct 2011, 19:53:25 UTC - in response to Message 23399.  
Last modified: 7 Oct 2011, 20:01:00 UTC

Thanks Ageless, I'm sure you're on the right lines, but this host reports sse not sse2 and is running 530.09 as I write. Igor wrote here that 530.09 uses sse3. Now I'm really confused.


Different architectures have different apps with the same version number. 64-bit processors support SSE3. I requested SSE3 support for Linux 64-bit. I don't know if the 32-bit apps were compiled that way.

Applications:
Microsoft Windows (98 or later) running on an Intel x86-compatible CPU 	530.09 	4 Oct 2011 15:31:31 UTC
Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU 	530.09 	4 Oct 2011 15:31:31 UTC
Linux running on an Intel x86-compatible CPU 	530.09 	4 Oct 2011 15:31:31 UTC
Linux running on an AMD x86_64 or Intel EM64T CPU 	530.09 	4 Oct 2011 15:31:31 UTC
ID: 23402
m
Joined: 6 Sep 08
Posts: 117
Credit: 12,457,843
RAC: 3,387
Message 23405 - Posted: 7 Oct 2011, 20:14:47 UTC - in response to Message 23400.  

The Pentium 4 uses MMX, SSE and SSE2, as long as the operating system knows about these instruction sets as well. Windows 2000 does not support SSE2 or any instruction set thereafter. It only supports up to SSE.

I gave the instruction sets of SSE and SSE2 as examples, I hadn't checked all of Igor's posts to see what they were actually using. But at least that explains things further.


Well, it confuses me. I've got a host running Windows 2000, so presumably no SSE2, never mind SSE3, and it's 10% through a 530.09 task with no problems that I can see.

To be able to use SSE3, one needs a Pentium 4 Prescott CPU or better, or an Athlon 64 or better and Windows XP or better. SSE3 is also known as PNI (Prescott New Instructions).


John.



ID: 23405
Profile jujube
Joined: 25 Jan 11
Posts: 179
Credit: 83,858
RAC: 0
Message 23406 - Posted: 7 Oct 2011, 21:42:50 UTC - in response to Message 23402.  

I don't know if the 32-bit apps were compiled that way.


The app sent to my Linux 64 bit machine is 32 bit. Run the file command against it and see. Don't know if the Windows app for 64 bit arch is 32 or 64.
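
If the file command is not to hand, the same 32-bit versus 64-bit answer can be read straight from the ELF header. This is a minimal Python sketch, not what the file command does internally: an ELF file starts with the four magic bytes `\x7fELF`, and the fifth byte is 1 for a 32-bit image and 2 for a 64-bit one. The function names are made up for this example, and it covers Linux ELF binaries only, not Windows executables.

```python
def elf_class(header):
    """Classify an ELF image from the first five bytes of the file."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    # Fifth byte (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")


def elf_class_of_file(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as handle:
        return elf_class(handle.read(5))


print(elf_class(b"\x7fELF\x01"))
print(elf_class(b"\x7fELF\x02"))
print(elf_class(b"MZ\x90\x00\x03"))
```

Calling `elf_class_of_file` on the application binary in the project directory should agree with what the file command reports.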

Applications:
Microsoft Windows (98 or later) running on an Intel x86-compatible CPU 	530.09 	4 Oct 2011 15:31:31 UTC
Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU 	530.09 	4 Oct 2011 15:31:31 UTC
Linux running on an Intel x86-compatible CPU 	530.09 	4 Oct 2011 15:31:31 UTC
Linux running on an AMD x86_64 or Intel EM64T CPU 	530.09 	4 Oct 2011 15:31:31 UTC


The above means they support those archs but doesn't necessarily mean they have a 64 bit app for 64 bit arch.
ID: 23406
Profile Krunchin-Keith [USA]
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Joined: 2 Sep 04
Posts: 209
Credit: 1,482,496
RAC: 0
Message 23408 - Posted: 7 Oct 2011, 21:52:44 UTC - in response to Message 23406.  

I don't know if the 32-bit apps were compiled that way.


The app sent to my Linux 64 bit machine is 32 bit. Run the file command against it and see. Don't know if the Windows app for 64 bit arch is 32 or 64.


In the Windows 7 x64 Task Manager, a lot of programs have *32 next to them, including sixtrack; I assume that means 32-bit.


Applications:
Microsoft Windows (98 or later) running on an Intel x86-compatible CPU 	530.09 	4 Oct 2011 15:31:31 UTC
Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU 	530.09 	4 Oct 2011 15:31:31 UTC
Linux running on an Intel x86-compatible CPU 	530.09 	4 Oct 2011 15:31:31 UTC
Linux running on an AMD x86_64 or Intel EM64T CPU 	530.09 	4 Oct 2011 15:31:31 UTC


The above means they support those archs but doesn't necessarily mean they have a 64 bit app for 64 bit arch.

ID: 23408

©2024 CERN