Big problem: work units running with negative time.
Author | Message |
---|---|
Joined: 12 Jul 11 Posts: 857 Credit: 1,619,050 RAC: 0 |
Thanks for all the input (clearly many of you have a good understanding of the difficulties). I feel SixTrack itself is OK, but we have a somewhat unusual workload. I reckon we made a "bad" Linux build with a "dud" version of the boinc_api or boinc lib. As you know, tests on CERN Linux are OK. I am going to rebuild (with help) and, after verifying on CERN Linux, will try to find an Ubuntu machine. I am thinking about withdrawing the Linux executable temporarily. More news soonest. (We also ran out of CERN disk space (again!)) Eric. |
Joined: 2 Oct 12 Posts: 8 Credit: 1,760,682 RAC: 0 |
Same problem with my Linux host here: http://lhcathomeclassic.cern.ch/sixtrack/show_host_detail.php?hostid=10283293 Nearly all units failed with a too many exits, no heartbeat error. A couple appear to have validated with 0.21 credits for 6 sec of run time, which I would guess is wrong. |
Joined: 30 Dec 12 Posts: 1 Credit: 11,000 RAC: 0 |
(We do not yet have a Mac executable.) There seems to be an incompatibility between the current Mac version 444.02 and the new Linux/Windows versions 446.03. All my recent results were declared invalid: http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=8974642 http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9054637 http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=9088606 I never had problems before the new versions came out. What is the current situation on this issue? |
Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0 |
Friends, we may have a remedy for the failed Linux executables: version 4465. It is installed only for Linux machines. It should work now, but we need testing. It is in the sixtracktest (you may need to allow test work to flow). Can experts with Linux machines allow 4465 to run (the higher version should be picked up automatically) and report back whether the cure worked? Thank you, Igor. skype id: igor-zacharov |
Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0 |
It occurred to me that it may be difficult to select only the test flow as opposed to the general work. Therefore, the 4465 executables for Linux are also in the production directory. Please give it a try. Thanks, Igor. skype id: igor-zacharov |
Joined: 27 Sep 04 Posts: 20 Credit: 23,880 RAC: 0 |
I got 8 very quick WUs (less than 30 seconds each) and they all completed without errors, using 4465 pni (Fedora 19 x86_64 on an AMD CPU). |
Joined: 26 Sep 11 Posts: 37 Credit: 7,807,848 RAC: 10 |
Sample size of 1 so far. Task 20475555 just completed on my work machine, running Xubuntu 12.04 64-bit. It took about 44 minutes and is now waiting for validation. A second work unit has now started and also seems to be running without trouble. Assuming it stays that way, I am very grateful for the fix. |
Joined: 16 May 11 Posts: 79 Credit: 111,419 RAC: 0 |
I'm happy it works so far. Full credit for the solution should go to Pauli Nieminen. Pauli, thank you very much for finding the problem! Your debugging skills played a pivotal role in homing in on the culprit. Igor. skype id: igor-zacharov |
Joined: 26 Sep 11 Posts: 37 Credit: 7,807,848 RAC: 10 |
Sample size of 3 now: 3 completed, of which 2 have already been validated. One on my work PC with Xubuntu 12.04 and the other on my home PC with Xubuntu 12.10. Looks to me like the problem really has been solved. Congratulations and thanks. |
Joined: 4 Jan 07 Posts: 3 Credit: 2,197,570 RAC: 0 |
One task completed and validated (sixtrack_lin32_4465_pni.linux, BOINC 7.2.0 (x64)). Looks OK. |
©2024 CERN