Very short Runtimes


Message boards : Sixtrack Application : Very short Runtimes

Author Message
computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,397,624
RAC: 3,695
Message 32242 - Posted: 5 Sep 2017, 10:29:54 UTC

Today each of my hosts got a bunch of sixtrack tasks.
Most of them have runtimes of only a few seconds.
Faulty or not?

See:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10486310
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10486393

Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 32
Credit: 404,031
RAC: 1,200
Message 32278 - Posted: 5 Sep 2017, 16:52:12 UTC - in response to Message 32242.
Last modified: 5 Sep 2017, 16:52:30 UTC

These are tasks that scan a region of highly unstable phase space, so they end quickly.

You can guess for yourself whether a task is likely to be short from its name, e.g.: https://lhcathome.cern.ch/lhcathome/result.php?resultid=154383317 . The __14_16__ in the name means that normalised amplitudes between 14 and 16 sigma are scanned. That is very large with respect to typical figures: we expect some chaotic motion to arise at about 6 sigma.
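The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not project code. It assumes the amplitude range appears in the task name as a `__<low>_<high>__` field with integer values, and the helper names and example task names are hypothetical. Real task names may contain other double-underscore fields, so treat the match with caution.

```python
import re

# Assumption: the amplitude range is encoded as "__<low>_<high>__" with integers,
# as in the "__14_16__" fragment quoted above.
AMPLITUDE_FIELD = re.compile(r"__(\d+)_(\d+)__")

# Approximate amplitude (in sigma) where chaotic motion is expected to arise,
# taken from the post above.
CHAOS_ONSET_SIGMA = 6


def amplitude_range(task_name):
    """Return (low, high) in sigma for the first matching field, or None."""
    match = AMPLITUDE_FIELD.search(task_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def likely_short(task_name):
    """Guess that a task is short if its whole range lies above the chaos onset."""
    bounds = amplitude_range(task_name)
    if bounds is None:
        return False  # no range found, so make no claim
    low, _high = bounds
    return low > CHAOS_ONSET_SIGMA


if __name__ == "__main__":
    # Hypothetical task names, for illustration only.
    for name in ("hypothetical_study__14_16__job", "hypothetical_study__2_4__job"):
        print(name, amplitude_range(name), likely_short(name))
```

A task flagged this way is only likely to be short. The guess says nothing about whether a given result is valid.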

computezrmle
Joined: 15 Jun 08
Posts: 347
Credit: 3,397,624
RAC: 3,695
Message 32279 - Posted: 5 Sep 2017, 17:19:30 UTC - in response to Message 32278.

OK. Thank you.
I understand that there's nothing to worry about.
Otherwise you are now aware of it.

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,217,893
RAC: 112,259
Message 32399 - Posted: 12 Sep 2017, 18:54:59 UTC

I got a few errors, as before only on Linux. They all look like this:

AMPLITUDES EXCEED THE MAXIMUM VALUES IN UMLAUF

*** ERROR ***,PROBLEMS WRITING TO FILE 10 FROM ABEND
ERROR CODE : 5001

process exited with code 101 (0x65, -155)

They all have super-short runtimes, like before.

I think there weren't any errors with the exact same hardware. I'm running some more for the next 24 hours on Linux, then I will switch back to Windows.

Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 32
Credit: 404,031
RAC: 1,200
Message 32432 - Posted: 15 Sep 2017, 7:57:04 UTC - in response to Message 32399.

Hello Toby,
Thanks for pointing this out. I suspect something weird at the level of the input, not at the level of the exe. Could you point me to some of these results?
Thanks a lot in advance,

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,217,893
RAC: 112,259
Message 32435 - Posted: 15 Sep 2017, 17:48:11 UTC - in response to Message 32432.

These are the only 2 left

https://lhcathome.cern.ch/lhcathome/result.php?resultid=155240749

https://lhcathome.cern.ch/lhcathome/result.php?resultid=155210817

Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 29 Feb 16
Posts: 32
Credit: 404,031
RAC: 1,200
Message 32437 - Posted: 15 Sep 2017, 22:36:31 UTC - in response to Message 32435.

I am really puzzled. A valid result on the same WU has been produced by the same executable on another host:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=74466109

Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 358
Credit: 78,217,893
RAC: 112,259
Message 32441 - Posted: 16 Sep 2017, 8:42:38 UTC

In the past we thought it could be the hyperthreading bug, but this uses AVX, and I have tested this computer with the proposed testing method and it never showed the bug. Also, some of the top error rates came from CPUs that were not affected, e.g. i5/Xeon.

This would leave Linux kernel 4.8 as a possible candidate. The failure is always the same: these very short runtimes, and always on Linux.

I can run on another Linux kernel to see if it pops up again.
