Questions and Answers : Unix/Linux : no progress indication

Michael Karlinsky
Joined: 18 Sep 04
Posts: 163
Credit: 1,682,370
RAC: 0
Message 5716 - Posted: 21 Feb 2005, 19:31:23 UTC
Last modified: 21 Feb 2005, 19:34:47 UTC

Hi, I know this is a minor issue at the moment,
but there is no progress indication; see this excerpt
from client_state:

(result)
(name)v64boince6ib1-40s10_12630_1_sixvf_1697_8(/name)
(final_cpu_time)0.000000(/final_cpu_time)
(exit_status)0(/exit_status)
(state)2(/state)
(wu_name)v64boince6ib1-40s10_12630_1_sixvf_1697(/wu_name)
(report_deadline)1110213396(/report_deadline)
(active_task)
(project_master_url)http://lhcathome.cern.ch/(/project_master_url)
(result_name)v64boince6ib1-40s10_12630_1_sixvf_1697_8(/result_name)
(app_version_num)463(/app_version_num)
(slot)1(/slot)
(scheduler_state)2(/scheduler_state)
(checkpoint_cpu_time)0.000000(/checkpoint_cpu_time)
(fraction_done)0.000000(/fraction_done)
(current_cpu_time)594.550000(/current_cpu_time)
(/active_task)
(/result)

This is BOINC 4.19 on Linux 2.4.19.

[Edit: substituted tags]
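
A minimal Python sketch for checking such an excerpt automatically. It assumes the parenthesised form shown above, where `(tag)value(/tag)` stands in for the real angle-bracket tags; the helper names `parse_fields` and `progress_is_stuck` are made up for illustration and are not part of BOINC.

```python
import re

# Matches single-line elements such as (fraction_done)0.000000(/fraction_done).
# Parentheses stand in for angle brackets, as in the excerpt in this thread.
FIELD = re.compile(r"\((\w+)\)([^()]*)\(/\1\)")


def parse_fields(excerpt):
    """Return {tag: text} for every single-line element in the excerpt."""
    return {m.group(1): m.group(2) for m in FIELD.finditer(excerpt)}


def progress_is_stuck(fields):
    """True when CPU time has accumulated but fraction_done is still zero."""
    return (float(fields["fraction_done"]) == 0.0
            and float(fields["current_cpu_time"]) > 0.0)


# Values copied from the active_task block posted above.
EXCERPT = """
(active_task)
(slot)1(/slot)
(scheduler_state)2(/scheduler_state)
(checkpoint_cpu_time)0.000000(/checkpoint_cpu_time)
(fraction_done)0.000000(/fraction_done)
(current_cpu_time)594.550000(/current_cpu_time)
(/active_task)
"""

fields = parse_fields(EXCERPT)
print(fields["fraction_done"], fields["current_cpu_time"],
      progress_is_stuck(fields))
```

For the values in this post the check reports the task as stuck: 594.55 seconds of CPU time with fraction_done still at zero.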
ID: 5716
BeBiMaGe

Joined: 2 Sep 04
Posts: 4
Credit: 22,828
RAC: 0
Message 5737 - Posted: 21 Feb 2005, 20:58:14 UTC

My WU just restarted and it seems to start at the beginning!
ID: 5737
EclipseHA

Joined: 18 Sep 04
Posts: 47
Credit: 1,886,234
RAC: 0
Message 5945 - Posted: 23 Feb 2005, 2:05:11 UTC

Same problem with the PP Linux cruncher (not sure about Einstein, but CP seems OK).

The SETI Linux cruncher hasn't been updated in quite some time, so I wonder if a common header or source file shared by the crunchers doesn't do the right thing on Linux.
ID: 5945
Michael Karlinsky
Joined: 18 Sep 04
Posts: 163
Credit: 1,682,370
RAC: 0
Message 5998 - Posted: 23 Feb 2005, 18:57:20 UTC - in response to Message 5737.  

> My WU just restarted and it seems to start at the beginning!
>

This only seems to be the case. After each restart computation
begins at the point where it finished last time. You can monitor
this by watching the files fort.91 and fort.93 in the slots directory.
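
One way to watch those files, as a rough Python sketch: take a snapshot of each file's size and modification time, wait, and compare. The slot path passed to `watch` (for example `slots/1`) is an assumption about the local BOINC directory layout, and the function names are hypothetical.

```python
import os
import time


def snapshot(slot_dir, names=("fort.91", "fort.93")):
    """Return {name: (size_bytes, mtime)} for the files present in slot_dir."""
    result = {}
    for name in names:
        path = os.path.join(slot_dir, name)
        if os.path.exists(path):
            st = os.stat(path)
            result[name] = (st.st_size, st.st_mtime)
    return result


def changed(before, after):
    """Names whose size or modification time differ between two snapshots."""
    return sorted(n for n in after if before.get(n) != after[n])


def watch(slot_dir, interval=60.0, rounds=5):
    """Print which files changed during each interval (interval in seconds)."""
    prev = snapshot(slot_dir)
    for _ in range(rounds):
        time.sleep(interval)
        cur = snapshot(slot_dir)
        print(changed(prev, cur) or "no change")
        prev = cur
```

Calling `watch("slots/1")` from the BOINC directory would then print the files that were written during each minute. Files that keep changing suggest the computation is still advancing even though fraction_done stays at zero.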

But at least final_cpu_time is calculated incorrectly; compare
current_cpu_time and final_cpu_time for the result below:

(result)
(name)v64boince6ib1-53s4_6615_1_sixvf_2201_10(/name)
(final_cpu_time)0.000000(/final_cpu_time)
(exit_status)0(/exit_status)
(state)2(/state)
(wu_name)v64boince6ib1-53s4_6615_1_sixvf_2201(/wu_name)
(report_deadline)1110219990(/report_deadline)
(active_task)
(project_master_url)http://lhcathome.cern.ch/(/project_master_url)
(result_name)v64boince6ib1-53s4_6615_1_sixvf_2201_10(/result_name)
(app_version_num)463(/app_version_num)
(slot)1(/slot)
(scheduler_state)2(/scheduler_state)
(checkpoint_cpu_time)0.000000(/checkpoint_cpu_time)
(fraction_done)0.000000(/fraction_done)
(current_cpu_time)2346.400000(/current_cpu_time)
(/active_task)

(current_cpu_time)7036.600000(/current_cpu_time) ## next restart
(current_cpu_time)35315.050000(/current_cpu_time) ## next restart
(current_cpu_time)2772.470000(/current_cpu_time) ## next restart

(result)
(name)v64boince6ib1-53s4_6615_1_sixvf_2201_10(/name)
(final_cpu_time)7742.090000(/final_cpu_time)
(exit_status)0(/exit_status)
(state)4(/state)
(wu_name)v64boince6ib1-53s4_6615_1_sixvf_2201(/wu_name)
(report_deadline)1110219990(/report_deadline)
(/result)

(result)
(name)v64boince6ib1-53s4_6615_1_sixvf_2201_10(/name)
(final_cpu_time)7742.090000(/final_cpu_time)
(exit_status)0(/exit_status)
(state)5(/state)
(ready_to_report/)
(wu_name)v64boince6ib1-53s4_6615_1_sixvf_2201(/wu_name)
(report_deadline)1110219990(/report_deadline)
(/result)

Hopefully this does not affect claimed credit calculation. Now it would
be helpful to have a look at the results :)

... and there is still the fraction_done issue.

Michael
ID: 5998
Trane Francks

Joined: 18 Sep 04
Posts: 71
Credit: 28,399
RAC: 0
Message 6011 - Posted: 24 Feb 2005, 0:20:28 UTC

What I see here is sixtrack version 4.63 only indicates progress in CPU time. It does not display progress in percent completed.

ID: 6011
Michael Karlinsky
Joined: 18 Sep 04
Posts: 163
Credit: 1,682,370
RAC: 0
Message 6028 - Posted: 24 Feb 2005, 10:16:50 UTC - in response to Message 6011.  

> What I see here is sixtrack version 4.63 only indicates progress in CPU time.
> It does not display progress in percent completed.
>
>

Not really. The mentioned WU was suspended 3 times with CPU times of
2346, 7036, 35315, 2772 and 7742 seconds, so total CPU time would be
55211 seconds. But only 7742 are reported (the CPU time from the last
interval).
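
The arithmetic can be checked with a few lines of Python. The interval values are the current_cpu_time and final_cpu_time figures quoted earlier in this thread; treating them as consecutive, non-overlapping run intervals is the assumption made here.

```python
from decimal import Decimal

# CPU time of each run interval in seconds, as posted in this thread.
intervals = [
    Decimal("2346.40"),
    Decimal("7036.60"),
    Decimal("35315.05"),
    Decimal("2772.47"),
    Decimal("7742.09"),  # last interval, the value reported as final_cpu_time
]

total = sum(intervals)                          # exact sum of all intervals
total_whole = sum(int(x) for x in intervals)    # whole seconds, as in the post
reported = intervals[-1]
unreported = total - reported

print("total:", total)
print("total (whole seconds):", total_whole)
print("reported:", reported)
print("unreported:", unreported)
```

Summing the whole seconds gives the 55211 quoted above; with the fractional parts included the total is 55212.61 seconds, of which 47470.52 seconds are missing from final_cpu_time.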

But unfortunately I got a WU which does indeed restart from the beginning. See
fort.93:

SIXTRACR MAINCR SIXTRACR starts on: 23rd of February 2005, 44 minutes after 20.
SIXTRACR CRCHECK CALLED lout= 92 restart F rerun F checkp F
SIXTRACR CRCHECK no restart possible checkp= F
SIXTRACR CRCHECK giving up on LOUT

SIXTRACR MAINCR SIXTRACR reruns on: 23rd of February 2005, 02 minutes after 21.
SIXTRACR CRCHECK CALLED lout= 92 restart F rerun T checkp F
SIXTRACR CRCHECK no restart possible checkp= F
SIXTRACR CRCHECK overwriting fort.6
SIXTRACR CRCHECK giving up on LOUT

SIXTRACR MAINCR SIXTRACR reruns on: 24th of February 2005, 38 minutes after 07.
SIXTRACR CRCHECK CALLED lout= 92 restart F rerun T checkp F
SIXTRACR CRCHECK no restart possible checkp= F
SIXTRACR CRCHECK overwriting fort.6
SIXTRACR CRCHECK giving up on LOUT

SIXTRACR MAINCR SIXTRACR reruns on: 24th of February 2005, 02 minutes after 11.
SIXTRACR CRCHECK CALLED lout= 92 restart F rerun T checkp F
SIXTRACR CRCHECK no restart possible checkp= F
SIXTRACR CRCHECK overwriting fort.6
SIXTRACR CRCHECK giving up on LOUT

This is WU: v64lhc87-34s10_12515_1_sixvf_996_1
Hope this helps to track down the problem.
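
A small Python sketch for summarising such a fort.93 log, in case it helps with comparing work units. It only relies on the line shapes visible in the excerpt above ("starts on" / "reruns on" and "no restart possible"); whether other SixTrack versions print the same text is an assumption, and `summarize` is a hypothetical helper name.

```python
import re

# First line of each run in the excerpt: "... MAINCR SIXTRACR starts on: ..."
# or "... MAINCR SIXTRACR reruns on: ...".
RUN_START = re.compile(r"MAINCR SIXTRACR (starts|reruns) on:")


def summarize(log_text):
    """Return one dict per run: how it began and whether restart was refused."""
    runs = []
    for line in log_text.splitlines():
        m = RUN_START.search(line)
        if m:
            runs.append({"kind": m.group(1), "restart_refused": False})
        elif "no restart possible" in line and runs:
            runs[-1]["restart_refused"] = True
    return runs


# First two runs copied from the fort.93 excerpt above.
SAMPLE = """\
SIXTRACR MAINCR SIXTRACR starts on: 23rd of February 2005, 44 minutes after 20.
SIXTRACR CRCHECK CALLED lout= 92 restart F rerun F checkp F
SIXTRACR CRCHECK no restart possible checkp= F
SIXTRACR CRCHECK giving up on LOUT

SIXTRACR MAINCR SIXTRACR reruns on: 23rd of February 2005, 02 minutes after 21.
SIXTRACR CRCHECK CALLED lout= 92 restart F rerun T checkp F
SIXTRACR CRCHECK no restart possible checkp= F
SIXTRACR CRCHECK overwriting fort.6
SIXTRACR CRCHECK giving up on LOUT
"""

runs = summarize(SAMPLE)
reruns_without_restart = sum(
    1 for r in runs if r["kind"] == "reruns" and r["restart_refused"]
)
print(len(runs), "runs,", reruns_without_restart, "rerun(s) without restart")
```

A rerun that is followed by "no restart possible" is a run that began again from the start instead of from a checkpoint, which is the behaviour described in this post.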

Michael
ID: 6028
Trane Francks

Joined: 18 Sep 04
Posts: 71
Credit: 28,399
RAC: 0
Message 6034 - Posted: 24 Feb 2005, 11:58:37 UTC - in response to Message 6028.  

> > What I see here is sixtrack version 4.63 only indicates progress in CPU
> > time. It does not display progress in percent completed.
>
> Not really.

Well, yes and no. There's no doubt that different systems can (and often do) see different behaviour. The latest issue under the no-progress banner is that it appeared that a WU had "stalled" -- 1.0 load but no progress -- so I killed it. The INSTANT I sent it the kill signal, the progress updated from 0.000 to 0.57-something hours.

Something is definitely unwell and it seems that there are several different-yet-related issues.

ID: 6034


