Message boards : Number crunching : Time limit errors????

Tex1954

Joined: 24 Apr 11
Posts: 37
Credit: 1,295,012
RAC: 0
Message 27299 - Posted: 7 Apr 2015, 2:02:58 UTC
Last modified: 7 Apr 2015, 2:03:18 UTC

Howdy!

Getting way too many of these lately:

http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=10356474&offset=0&show_names=0&state=6&appid=


<core_client_version>7.4.42</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 5738.95 (1800000.00G/313.65G)
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x76873226


Is it my setup or a bug or what? Lot of wasted time there...

Thanks!

9-)
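A note on reading that message: as far as I understand the BOINC client, the two figures in parentheses are the workunit's `rsc_fpops_bound` and the speed the client assumes for the application, both in units of 10^9, and the limit in seconds is simply the first divided by the second. The sketch below parses the quoted message and redoes that division. The function name and regular expression are my own, not part of BOINC, and because the printed figures are rounded the recomputed value only matches approximately.

```python
import re

# Pattern for the client message quoted above, e.g.
# "exceeded elapsed time limit 5738.95 (1800000.00G/313.65G)"
LIMIT_RE = re.compile(
    r"exceeded elapsed time limit\s+([\d.]+)\s+\(([\d.]+)G/([\d.]+)G\)"
)


def parse_time_limit(message):
    """Return (limit_seconds, fpops_bound, flops) parsed from the message."""
    match = LIMIT_RE.search(message)
    if match is None:
        raise ValueError("not an elapsed-time-limit message")
    limit_s, bound_g, flops_g = (float(g) for g in match.groups())
    # The "G" suffix means the printed numbers are in units of 1e9.
    return limit_s, bound_g * 1e9, flops_g * 1e9


msg = "exceeded elapsed time limit 5738.95 (1800000.00G/313.65G)"
limit_s, fpops_bound, flops = parse_time_limit(msg)
recomputed = fpops_bound / flops  # assumed relation: bound / speed

print(f"reported limit: {limit_s:.2f} s")
print(f"bound / speed:  {recomputed:.2f} s")
```

If that reading is right, the task was stopped after roughly 96 minutes because the client's speed figure made the bound look that small, which is not a symptom of a hardware fault.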
ID: 27299
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27300 - Posted: 7 Apr 2015, 4:01:27 UTC - in response to Message 27299.  

Right, apologies. These are my famous tests "wzero".
Should all be out of the way soon I hope, Eric.
ID: 27300
Tex1954

Joined: 24 Apr 11
Posts: 37
Credit: 1,295,012
RAC: 0
Message 27304 - Posted: 7 Apr 2015, 5:15:55 UTC - in response to Message 27300.  
Last modified: 7 Apr 2015, 5:20:50 UTC

Right, apologies. These are my famous tests "wzero".
Should all be out of the way soon I hope, Eric.


I don't mind if it's not the setup's fault... just worried the hardware might be glitching...

I have a 4.3GHz 2600K setup I just dedicated to CERN stuff and was a little worried... 2 WUs each...




Thanks!

9-)
ID: 27304
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27316 - Posted: 7 Apr 2015, 15:51:26 UTC - in response to Message 27304.  

That sounds great; I may have more news later. I am also very
interested in results from these modern very fast machines.
SixTrack is a great test apart from producing useful results.
Eric.
ID: 27316
Phil
Joined: 26 Jul 05
Posts: 63
Credit: 4,083,755
RAC: 0
Message 27320 - Posted: 7 Apr 2015, 19:46:37 UTC - in response to Message 27300.  

Right, apologies. These are my famous tests "wzero".
Should all be out of the way soon I hope, Eric.

I'm rather disappointed with the wzero_jbbtcm1's; my batch are all running perfectly!
ID: 27320
Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,369,412
RAC: 10,065
Message 27322 - Posted: 7 Apr 2015, 20:35:48 UTC - in response to Message 27316.  

I am also very interested in results from these modern very fast machines.
Eric.


My processor reports these features:

Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2

short: sse sse2 pni ssse3

At the moment, all SixTrack WUs running on this box are sse2. In the history I see that this box has only ever crunched WUs with sse2 or "nothing".

The applications page shows that there are versions for sse3 and pni. So is it okay that only sse2 is sent to this box?
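For anyone wanting to check their own host the same way, here is a minimal sketch that picks the SIMD-related flags out of a "Processor features" string. Note that `pni` (Prescott New Instructions) is simply another name for SSE3. The helper name and the list of flags to look for are my own choices, the feature string is an abbreviated copy of the one above, and nothing here says how the project's scheduler actually picks an application version.

```python
# Hypothetical helper: report which SIMD levels appear in a BOINC-style
# "Processor features" string. Flag names are taken from the post above.
SIMD_FLAGS = ["sse", "sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2"]


def simd_levels(features):
    """Return the flags from SIMD_FLAGS that are present, in list order."""
    present = set(features.split())
    return [flag for flag in SIMD_FLAGS if flag in present]


# Abbreviated copy of the feature list from the post (not the full string).
features = (
    "fpu tsc mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 "
    "sse4_1 sse4_2 popcnt aes avx avx2"
)

levels = simd_levels(features)
print(levels)
print("pni (SSE3) available:", "pni" in levels)
```

Splitting on whitespace and testing set membership avoids false hits from substring matching, for example `sse` matching inside `ssse3`.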


Supporting BOINC, a great concept !
ID: 27322
Ano

Joined: 29 Nov 09
Posts: 42
Credit: 229,229
RAC: 0
Message 27365 - Posted: 10 Apr 2015, 12:09:46 UTC
Last modified: 10 Apr 2015, 12:11:53 UTC

I used to receive pni and sse3 (the proof is in those buggy works I celebrate the birthdays of), but indeed, recently there have only been sse2.
Maybe it's to better hunt bugs?

Edit: I just realized you had already asked this in a dedicated topic. Sorry for posting here.
ID: 27365
Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 281
Credit: 11,859,285
RAC: 1
Message 27418 - Posted: 6 May 2015, 21:37:05 UTC

All of today's wsuper_ sixtracktest WUs are showing a 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED error. All wingmen are showing the same error.
The initial time estimate for most is 11 minutes, although a few have been 30 hours. They get to about 50% in 4 minutes, then reset to 0% and an estimate of 90 hours. They proceed to 10% in about 2 hours, with the time estimate reducing in large chunks, but then exit with the time-exceeded error.
ID: 27418
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27419 - Posted: 7 May 2015, 8:31:47 UTC - in response to Message 27418.  

Thanks Ray; looking at that problem right now. Eric.
(Testing a new version which should return all result
files...... :-)
ID: 27419
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27420 - Posted: 7 May 2015, 8:33:08 UTC - in response to Message 27322.  

NO, this is NOT normal. Thanks for that valuable feedback. Eric.
ID: 27420
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27421 - Posted: 7 May 2015, 9:21:50 UTC

OK; a bug on my part. Jobs were running for one million turns
but were supposed to stop after 10,000 turns. The limits were
set for only 10,000. That was the bug.
More tests coming. Eric.
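To put numbers on that mismatch: a job running 1,000,000 turns with limits sized for 10,000 has 100 times more work than the limit was planned for. The sketch below is only an illustration, assuming runtime is proportional to the number of turns and that the time limit allows some multiple (`bound_margin`, a made-up parameter) of the planned runtime. The margins tried are illustrative, not the project's real settings.

```python
def fraction_done_at_limit(planned_turns, actual_turns, bound_margin):
    """Fraction of the actual job finished when the time limit is hit.

    bound_margin is how many times longer than the planned runtime the
    limit allows (an assumed value, not taken from the project).
    Assumes runtime scales linearly with the number of turns.
    """
    work_ratio = actual_turns / planned_turns  # 100 for this bug
    return min(1.0, bound_margin / work_ratio)


planned, actual = 10_000, 1_000_000  # turn counts from the post above

for margin in (1, 10, 100):  # illustrative margins only
    frac = fraction_done_at_limit(planned, actual, margin)
    status = "finishes" if frac >= 1.0 else "hits the time limit"
    print(f"margin x{margin}: {frac:.0%} done, {status}")
```

Under these assumptions a task only survives if the limit leaves at least a 100-fold margin over the planned runtime, so every affected task would be expected to fail in the same way on every host.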
ID: 27421
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 27422 - Posted: 7 May 2015, 9:31:28 UTC - in response to Message 27421.  

I have 'inoculated' task 67569958 by increasing <rsc_fpops_bound> by several orders of magnitude - so you should get at least one result from the first set.

On the other hand, it might be quicker to abort it and run a 10K turn task instead, or find the million-turn parameter and turn it down a bit. Up to you.
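For readers wondering what such an "inoculation" might look like, here is a rough sketch, not necessarily the method used above. It assumes the bound lives in a `<workunit>` entry of the client's `client_state.xml`, next to `<name>` and `<rsc_fpops_est>`. The fragment, the workunit name `example_wu` and the function name are all made up for illustration, and a real edit would normally be done only with the BOINC client shut down.

```python
import xml.etree.ElementTree as ET

# Made-up fragment modelled on a <workunit> entry in client_state.xml.
# The workunit name and numbers are placeholders, not real project data.
FRAGMENT = """
<client_state>
  <workunit>
    <name>example_wu</name>
    <rsc_fpops_est>1000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>10000000000000.000000</rsc_fpops_bound>
  </workunit>
</client_state>
"""


def raise_fpops_bound(root, wu_name, factor):
    """Multiply <rsc_fpops_bound> by factor for the named workunit.

    Returns the number of workunit entries that were changed.
    """
    changed = 0
    for wu in root.iter("workunit"):
        if wu.findtext("name") != wu_name:
            continue
        bound = wu.find("rsc_fpops_bound")
        if bound is None:
            continue
        # Write the value back as a plain decimal with six places.
        bound.text = f"{float(bound.text) * factor:.6f}"
        changed += 1
    return changed


root = ET.fromstring(FRAGMENT.strip())
print(raise_fpops_bound(root, "example_wu", 1000))
print(root.findtext("workunit/rsc_fpops_bound"))
```

Raising the bound only stops the client from killing the task early; it does nothing to shorten a job that really has a million turns to run.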
ID: 27422
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 27423 - Posted: 7 May 2015, 10:11:02 UTC - in response to Message 27422.  

Thanks Richard; don't waste time. That one result I can check
anyway, but corrected tests are on their way. I don't know how
to get rid of the bad tests, probably too late, but there are
not too many. Eric.
ID: 27423
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 27424 - Posted: 7 May 2015, 10:20:16 UTC - in response to Message 27423.  

OK, it hadn't even started, so I've aborted it - no time wasted (the machine needed a reboot for an AV update anyway).
ID: 27424