Message boards : Sixtrack Application : exceeded elapsed time limit 30940.80 (180000000.00G/5817.56G)

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 30989 - Posted: 23 Jun 2017, 23:12:06 UTC - in response to Message 30984.  

Thanks a lot; the problems are identified but sadly
we never get them fixed! Eric.
(I am not even sure that the DELETE WUs worked....)
ID: 30989
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 30990 - Posted: 23 Jun 2017, 23:15:12 UTC - in response to Message 30985.  

Thanks a lot, but I dare not change the Validator.
Mark you, if we don't get some fixes soon I'll
probably do that and to hang. Probably couldn't be
much worse, and I could always undo.
I'll sleep on it. Eric.
ID: 30990
Profile planetclown

Joined: 28 Mar 12
Posts: 2
Credit: 374,110
RAC: 0
Message 31019 - Posted: 24 Jun 2017, 12:38:05 UTC

Just a heads-up that I've noticed 34 of my tasks with errors containing a similar message.

https://lhcathome.cern.ch/lhcathome/results.php?userid=233918&offset=20&show_names=0&state=6&appid=

Most have "exceeded elapsed time limit 12140.55 (180000000.00G/14826.35G)" but a few have smaller numbers such as "time limit 7987.99 (180000000.00G/19907.96G)".

Kind of rough since they take 2-3 hours of computing time before they error.
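
The numbers in these messages appear to follow a simple relation: the elapsed time limit in seconds is the task's fpops bound divided by the client's speed estimate for the application. That relation is an assumption read off the quoted figures, not something taken from the BOINC source. A minimal Python check against two of the messages in this thread (the function and variable names are made up for illustration; the 7987.99 pair is left out because, as quoted, it does not reproduce under this assumed relation):

```python
# Figures quoted from error messages in this thread:
# (reported limit in seconds, fpops bound in GFLOP, speed estimate in GFLOPS)
reported = [
    (30940.80, 180000000.00, 5817.56),   # thread title
    (12140.55, 180000000.00, 14826.35),  # most of the tasks listed above
]


def elapsed_time_limit(fpops_bound_g, flops_estimate_g):
    """Assumed relation: limit in seconds = fpops bound / estimated speed."""
    return fpops_bound_g / flops_estimate_g


for limit, bound, speed in reported:
    computed = elapsed_time_limit(bound, speed)
    print(f"{bound:.2f}G / {speed:.2f}G = {computed:.2f} s (message says {limit:.2f} s)")
```

If the relation holds, a higher speed estimate directly means a shorter limit, which would explain why tasks that need 2-3 hours get killed.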
ID: 31019
xii5ku

Joined: 7 May 17
Posts: 10
Credit: 6,952,848
RAC: 0
Message 31024 - Posted: 24 Jun 2017, 15:59:36 UTC - in response to Message 31019.  
Last modified: 24 Jun 2017, 16:03:23 UTC

@planetclown,
you will know that this kind of trouble is ahead if newly downloaded, not yet started tasks are listed with an estimated time remaining of a few minutes or even less than a minute.

Here is what I do on my clients which are in this situation:

    *have "No new tasks" set while I am away
    *download tasks manually, i.e. "Allow new tasks" + "Update"
    *perhaps even suspend CPU activity while downloading
    *when downloads finished, set "No new tasks"
    *shut down client
    *check with "ps ax|grep boinc" that the client is really down
    *make a backup of client_state.xml
    *search and replace all occurrences of ".000000</rsc_fpops_bound>" by "000000</rsc_fpops_bound>" in client_state.xml
    *restart client, resume CPU activity
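
The backup and search-and-replace steps in the list above could be scripted roughly as follows. This is only a sketch: the function name and the ".bak" backup name are hypothetical, the location of client_state.xml depends on your installation, and it must only be run while the client is shut down. It works on raw bytes so that it does not have to guess the file's encoding.

```python
import shutil
from pathlib import Path

# The exact strings from the manual search and replace described above.
OLD = b".000000</rsc_fpops_bound>"
NEW = b"000000</rsc_fpops_bound>"


def raise_fpops_bound(state_file):
    """Back up state_file, then drop the decimal point in every fpops bound.

    Returns the number of occurrences that were replaced.
    """
    path = Path(state_file)
    backup = path.with_name(path.name + ".bak")  # hypothetical backup name
    if not backup.exists():
        # Keep the first backup so a second run cannot overwrite it
        # with an already patched file.
        shutil.copy2(path, backup)
    data = path.read_bytes()
    count = data.count(OLD)
    path.write_bytes(data.replace(OLD, NEW))
    return count
```

Calling raise_fpops_bound with the path to client_state.xml returns how many bounds were changed. Removing the decimal point makes each bound a million times larger, and a second run finds nothing left to replace.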


This is based on Crystal Pellet's and Juha's posts in this thread.

I also attempted to work around this by editing client_state/ app_version/ flops, but at least one attempt of doing so with some work having been downloaded earlier resulted in almost all tasks erroring right away. Perhaps app_version flops should only be edited while no sixtrack WUs are present on the client.

Good luck in the Formula Boinc sprint. :-)

ID: 31024
Profile planetclown

Joined: 28 Mar 12
Posts: 2
Credit: 374,110
RAC: 0
Message 31026 - Posted: 24 Jun 2017, 17:46:20 UTC - in response to Message 31024.  

@xii5ku
Thanks for the workaround. I'm up to 42 errors at the time of this post. I've just updated my client_state.xml and will keep an eye on the # of errors going forward. Thanks again!
ID: 31026
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31027 - Posted: 24 Jun 2017, 18:17:21 UTC - in response to Message 31026.  

Sadly, I don't think we are quite there yet. See my latest comment about
one out of two Tasks. All feedback appreciated but remember I am
SixTrack "only". Surely you don't have to go through all these
gymnastics for SixTrack, but I guess you are running a mixture
like Crystal Pellet. Eric.
ID: 31027
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 617
Credit: 385,299,732
RAC: 135,167
Message 31640 - Posted: 26 Jul 2017, 19:04:42 UTC

This came back for me.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=152246604

There are about 30 for this host.
ID: 31640
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 991
Credit: 6,426,616
RAC: 480
Message 31641 - Posted: 26 Jul 2017, 19:38:30 UTC - in response to Message 31640.  

This came back for me.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=152246604

There are about 30 for this host.

Somehow your machine had reported a floating point operations value of 13161.52, which is much too high. BOINC then expects such a machine to be faster, of course.

That machine is now reporting an fpops value of 4446.94, which is more realistic.
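
To put the two figures side by side, here is a small Python sketch of how much run time each estimate would allow. It assumes the same 180000000.00G bound that appears in the error messages elsewhere in this thread, and that the limit is the bound divided by the reported value; both are assumptions, and the names are illustrative only.

```python
# Assumed: the fpops bound shown in the error messages in this thread.
FPOPS_BOUND_G = 180000000.00


def limit_seconds(flops_estimate_g):
    """Assumed relation: allowed elapsed time = bound / speed estimate."""
    return FPOPS_BOUND_G / flops_estimate_g


inflated = limit_seconds(13161.52)   # value reported while tasks were failing
realistic = limit_seconds(4446.94)   # value reported after it came back down

print(f"inflated estimate:  {inflated / 3600:.1f} h allowed")
print(f"realistic estimate: {realistic / 3600:.1f} h allowed")
print(f"the limit is {realistic / inflated:.2f} times longer with the realistic value")
```

Under these assumptions the allowed time shrinks by the same factor the estimate is inflated, roughly three times here.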
ID: 31641
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 617
Credit: 385,299,732
RAC: 135,167
Message 31642 - Posted: 26 Jul 2017, 19:51:05 UTC

Thx CP, I re-ran the benchmarks to bring it back to a normal number.

Not sure how it got such a high number.
ID: 31642
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31643 - Posted: 26 Jul 2017, 21:38:02 UTC - in response to Message 31641.  

This is a "known" problem, BUT many thanks for the solution.
What happens is that an "Intel family 6" with 4.8.* Linux fails.
It returns a fort.10 / result, however, after a short time and is
wrongly estimated to be very very fast. Subsequent tasks then
fail because they take "too long"! Thanks again for the solution
for the client but we need a fix at our end. Sadly I am not too
involved anymore but I'll pass the message. Eric.



ID: 31643
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 852
Credit: 1,619,050
RAC: 0
Message 31785 - Posted: 2 Aug 2017, 6:02:24 UTC

A fix has been applied to the Validator to redefine outliers.
Seems to be working as I see the overall error rate has dropped
significantly. If you still see REAL_TIME_EXCEEDED please let me
know. If you have a hyper fast machine maybe it is enough to rerun
the benchmark on the BOINC client i.e. your side. Eric.
ID: 31785