Message boards : Sixtrack Application : Very short Runtimes

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,687
RAC: 137,969
Message 32242 - Posted: 5 Sep 2017, 10:29:54 UTC

Today each of my hosts got a bunch of sixtrack tasks.
Most of them have runtimes of only a few seconds.
Faulty or not?

See:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10486310
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10486393
ID: 32242
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 32278 - Posted: 5 Sep 2017, 16:52:12 UTC - in response to Message 32242.  
Last modified: 5 Sep 2017, 16:52:30 UTC

These are tasks in which a highly unstable region of phase space is scanned.

You can guess for yourself whether a task is likely to be short from its name, e.g.: https://lhcathome.cern.ch/lhcathome/result.php?resultid=154383317 . The __14_16__ in the name means that a normalised amplitude between 14 and 16 sigma is scanned. That is very large with respect to typical figures: we expect some chaotic motion to arise at about 6 sigma.
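As a rough illustration of the naming rule described above, here is a minimal Python sketch that pulls the amplitude token out of a task name and compares its lower bound with the roughly 6 sigma figure quoted in the post. The task name used below is made up, the function names are hypothetical, and the pattern assumes the amplitude bounds are integers as in the __14_16__ example; real task names may need a different pattern.

```python
import re

# Assumption: the amplitude range appears as two integers between double
# underscores, exactly like the "__14_16__" token quoted in the post.
AMPLITUDE_RE = re.compile(r"__(\d+)_(\d+)__")

# Approximate amplitude (in sigma) at which chaotic motion is expected,
# taken from the post; it is a rule of thumb, not a hard limit.
CHAOS_ONSET_SIGMA = 6


def amplitude_range(task_name):
    """Return (low, high) in sigma, or None if no amplitude token is found."""
    match = AMPLITUDE_RE.search(task_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def likely_short(task_name, threshold=CHAOS_ONSET_SIGMA):
    """Guess that a task is short if its whole range lies above the threshold."""
    bounds = amplitude_range(task_name)
    if bounds is None:
        return False  # no token found, so no basis for a guess
    low, _high = bounds
    return low > threshold


# Hypothetical task name, used only to exercise the functions.
print(likely_short("example__14_16__task"))
```

This is only a heuristic: a name with a large amplitude range suggests a short run, but it does not prove that a given short result is valid.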
ID: 32278 · Report as offensive     Reply Quote
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert

Joined: 15 Jun 08
Posts: 2386
Credit: 222,922,687
RAC: 137,969
Message 32279 - Posted: 5 Sep 2017, 17:19:30 UTC - in response to Message 32278.  

OK. Thank you.
I understand that there's nothing to worry about.
Otherwise you are now aware of it.
ID: 32279
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,723,571
RAC: 234,385
Message 32399 - Posted: 12 Sep 2017, 18:54:59 UTC

I got a few errors, as before only on Linux. They are all:

AMPLITUDES EXCEED THE MAXIMUM VALUES IN UMLAUF

*** ERROR ***,PROBLEMS WRITING TO FILE 10 FROM ABEND
ERROR CODE : 5001

process exited with code 101 (0x65, -155)

They all have super-short run times, like before.

I think there weren't any errors with the exact same hardware. I'm running some more tasks on Linux for the next 24 hours, then I will switch back to Windows.
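The three numbers in the exit line of the log above (101, 0x65, -155) appear to be the same exit code in different notations. The small Python sketch below reproduces them, assuming the third figure is simply the code minus 256; that reading is inferred from the numbers themselves, not stated anywhere in this thread, and the function name is hypothetical.

```python
def exit_code_views(code):
    """Return the decimal, hexadecimal, and (code - 256) views of an exit code."""
    # Assumption: the negative figure in the log is the code shifted down by 256.
    return code, hex(code), code - 256


print(exit_code_views(101))  # prints (101, '0x65', -155)
```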
ID: 32399
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 32432 - Posted: 15 Sep 2017, 7:57:04 UTC - in response to Message 32399.  

Hello Toby,
Thanks for pointing this out. I suspect something weird at the level of the input, not at the level of the exe. Could you point me to some of these results?
Thanks a lot in advance.
ID: 32432
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,723,571
RAC: 234,385
Message 32435 - Posted: 15 Sep 2017, 17:48:11 UTC - in response to Message 32432.  

These are the only two left:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=155240749

https://lhcathome.cern.ch/lhcathome/result.php?resultid=155210817
ID: 32435
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 32437 - Posted: 15 Sep 2017, 22:36:31 UTC - in response to Message 32435.  

I am really puzzled. A valid result on the same WU has been produced by the same executable on another host:
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=74466109
ID: 32437
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 798
Credit: 644,723,571
RAC: 234,385
Message 32441 - Posted: 16 Sep 2017, 8:42:38 UTC

In the past we thought it could be the hyperthreading bug, but this uses AVX, and I have tested this computer with the proposed testing method and it never showed the bug. Also, some of the top error rates came from CPUs that were not affected, e.g. i5/Xeon.

That would leave Linux kernel 4.8 as a possible candidate. The failure is the same each time: on these very short runtimes and always on Linux.

I can run on another Linux kernel to see if it pops up again.
ID: 32441
