Message boards : Number crunching : Big problem: work units running with negative time.

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 25814 - Posted: 9 Sep 2013, 8:52:06 UTC

Thanks for all the input (clearly many of you have a good understanding
of the difficulties). I feel SixTrack is OK, but we have a somewhat unusual
workload. I reckon we made a "bad" Linux build with a "dud" version
of the boinc_api or boinc lib. As you know, tests on CERN Linux are OK.
I am going to rebuild (with help) and, after verifying on CERN Linux, will try
to find an Ubuntu machine. I am thinking about withdrawing the Linux executable
temporarily... More news soonest.
(We also ran out of CERN disk space (again!)) Eric.
ID: 25814
Alez

Joined: 2 Oct 12
Posts: 8
Credit: 1,760,682
RAC: 0
Message 25820 - Posted: 10 Sep 2013, 20:56:46 UTC

Same problem with my Linux host here: http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=10283293
Nearly all units failed with a "too many exits, no heartbeat" error. A couple appear to have validated with 0.21 credits for 6 seconds of run time, which I would guess is wrong.
ID: 25820
CGR

Joined: 30 Dec 12
Posts: 1
Credit: 11,000
RAC: 0
Message 25836 - Posted: 16 Sep 2013, 21:56:36 UTC - in response to Message 25771.  

(We do not yet have a MAC executable.)

There seems to be an incompatibility between the current Mac version 444.02 and the new Linux/Windows version 446.03. All my recent results were declared invalid.
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=8974642
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9054637
http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9088606
I never had problems before the new versions came out. What is the current situation on this issue?
ID: 25836
Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 25842 - Posted: 18 Sep 2013, 15:47:44 UTC

Friends,

We may have a remedy for the failed Linux execs: version 4465.
It is installed only for Linux machines. It should work now, but we need testing.

It is in sixtracktest (you may need to allow test work to flow).

Could experts with Linux machines allow 4465 to run (the higher version should be picked up automatically) and report back whether the cure worked?

Thank you, Igor.
skype id: igor-zacharov
ID: 25842
Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 25843 - Posted: 18 Sep 2013, 15:56:10 UTC - in response to Message 25842.  

It occurred to me that it may be difficult to select only the test flow as opposed to the general work.

Therefore, the 4465 execs for Linux are also in the production directory.
Please give them a try.

Thanks, Igor.
skype id: igor-zacharov
ID: 25843
Mattia Verga

Send message
Joined: 27 Sep 04
Posts: 20
Credit: 23,880
RAC: 0
Message 25844 - Posted: 18 Sep 2013, 16:05:30 UTC - in response to Message 25843.  

I got 8 very quick WUs (less than 30 seconds each) and they all completed without errors, using 4465 pni (Fedora 19 x86_64 on an AMD CPU).
ID: 25844
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,807,848
RAC: 10
Message 25845 - Posted: 18 Sep 2013, 21:37:35 UTC - in response to Message 25844.  

Sample size of 1 so far. Task 20475555 just completed on my work machine, running Xubuntu 12.04 64-bit. It took about 44 minutes and is now waiting for validation.

Second work unit has now started, and also seems to be running without trouble. Assuming it will stay that way, I am very grateful for the fix.
ID: 25845
Igor Zacharov
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 16 May 11
Posts: 79
Credit: 111,419
RAC: 0
Message 25846 - Posted: 19 Sep 2013, 0:33:20 UTC

I'm happy it works so far.

Full credit for the solution should go to Pauli Nieminen.

Pauli - thank you very much for finding the problem!
Your debugging skills played a pivotal role in homing in on the culprit.

Igor.
skype id: igor-zacharov
ID: 25846
jelle

Joined: 26 Sep 11
Posts: 37
Credit: 7,807,848
RAC: 10
Message 25847 - Posted: 19 Sep 2013, 3:11:15 UTC - in response to Message 25846.  

Sample size of 3 now. Three completed, of which two have already been validated. One on my work PC with Xubuntu 12.04 and the other on my home PC with Xubuntu 12.10.
Looks to me like the problem really has been solved.
Congratulations and thanks.
ID: 25847
SeersantLoom

Joined: 4 Jan 07
Posts: 3
Credit: 2,197,570
RAC: 0
Message 25849 - Posted: 20 Sep 2013, 12:22:21 UTC

One task completed and validated (sixtrack_lin32_4465_pni.linux, BOINC 7.2.0 (x64)). Looks OK.
ID: 25849