Message boards : Number crunching : Can anyone stop this machine ?

Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,569,815
RAC: 13,099
Message 25719 - Posted: 30 Aug 2013, 21:16:43 UTC

I just checked my work and found some results that were not yet validated.

When I followed the tasks, I found a machine that is trashing roughly 5,000 tasks.

http://lhcathomeclassic.cern.ch/sixtrack/results.php?hostid=10298557

Please, can anyone stop this?


Supporting BOINC, a great concept !
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 807
Credit: 651,900,632
RAC: 291,407
Message 25720 - Posted: 31 Aug 2013, 0:38:27 UTC

You could try to PM the owner?
[AF>FAH-Addict.net]toTOW

Joined: 9 Oct 10
Posts: 77
Credit: 3,671,357
RAC: 0
Message 25721 - Posted: 31 Aug 2013, 20:08:34 UTC

It's not the only one ... I've seen a few AMD CPUs as my wingmen that are trashing WUs too ... :(
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25724 - Posted: 1 Sep 2013, 5:56:57 UTC

Sorry about this; my fault as usual.
We are trying to run BOINC tests for the new SixTrack.
I think they are all failing. We have to validate
"manually". I have the fix and we shall try again.
They will be very short so as not to waste CPU time.
More real work coming soon. Eric.
[AF>FAH-Addict.net]toTOW

Joined: 9 Oct 10
Posts: 77
Credit: 3,671,357
RAC: 0
Message 25727 - Posted: 1 Sep 2013, 9:19:55 UTC

Hi Eric,

In this case we're not talking about SixTrackTest but SixTrack production WUs ...

And I double-checked: it's also Skyhawk's machine, the one Yeti linked, that is trashing work :(
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25728 - Posted: 1 Sep 2013, 11:37:46 UTC - in response to Message 25727.  

Right; I checked again and indeed it looks like
production.........the plot thickens. I will
investigate more closely tomorrow when Danilo
(the user) is back from vacation. Very strange.
(In fact the "test" short jobs are returning results.)
I need to have a look at "computation error"; I think
stderr just says too many exits.
(Maybe you could see something in fort.6 which I don't
get back.)
Anyway thanks for your help and messages. Eric.
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25732 - Posted: 2 Sep 2013, 6:27:08 UTC

I already had a look at a failure; I see:

Stderr output

7.0.64

The segment is already discarded and cannot be locked.
(0x9d) - exit code 157 (0x9d)




forrtl: severe (157): Program Exception - access violation

Image PC Routine Line Source
sixtrack_windows_ 005A58E9 Unknown Unknown Unknown
sixtrack_windows_ 005A3F83 Unknown Unknown Unknown
sixtrack_windows_ 0059BFDB Unknown Unknown Unknown
sixtrack_windows_ 0057BE85 Unknown Unknown Unknown
sixtrack_windows_ 0057BA1B Unknown Unknown Unknown
sixtrack_windows_ 00599D50 Unknown Unknown Unknown



Never seen this before! Perhaps a bug in SixTrack??? I have an open mind.
When I get to work I'll look at this case with Danilo.
However, I also see my colleagues have been active in the results directory.
There are 3344 results (they look OK) to be downloaded.
Have you updated your client recently? More news soonest.

Thanks. Eric.
alvin
Joined: 12 Mar 12
Posts: 128
Credit: 20,013,377
RAC: 0
Message 25733 - Posted: 2 Sep 2013, 7:02:42 UTC - in response to Message 25732.  

Eric
Most of my processing computers have no active tasks or are just finalising 1-2 WUs received on 31 Aug or 1 Sep.
What should we expect now?
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25734 - Posted: 2 Sep 2013, 8:36:53 UTC - in response to Message 25733.  

Just finishing the installation of a new SixTrack.
More work coming. Just want to try and understand
these failures as well. Eric.
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25737 - Posted: 2 Sep 2013, 19:05:03 UTC

I tested a "failing" WU at CERN on Linux and Windows.
It runs OK. We shall see. Eric.


©2024 CERN