Message boards : Number crunching : Abnormally short WU times

enightmare
Joined: 29 Dec 08
Posts: 3
Credit: 1,851,933
RAC: 2
Message 26749 - Posted: 16 Sep 2014, 4:52:35 UTC
Last modified: 16 Sep 2014, 5:32:44 UTC

I just had several work units finish in under 10 minutes, one of them in under 3 minutes. Granted, this is a new machine, but I doubt it is that much quicker than my previous builds. I suspect these 7 WUs terminated abnormally even though they are reported as finished.

The new rig runs an i7-5960X with 16 GB of DDR4 RAM on its normal XMP profile. Other WUs are progressing at a normal pace. I just want to make sure I'm not turning in a bunch of invalid results; I will suspend the project if so.

Quick update: the current WUs finished in under 50 minutes, as expected. I am unsure why the previous WUs completed so quickly.
ID: 26749

Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 233
Credit: 10,718,581
RAC: 4,042
Message 26750 - Posted: 16 Sep 2014, 7:42:50 UTC

Hi Enightmare,
Here is a copy of my reply to a similar question a year ago.

There's nothing wrong at your end. It's pure chance whether you get a long run or a short run, although they probably appear in batches.
The estimate is based on your previous work but will not necessarily be accurate for an individual work unit.
Each wu has its own simulation of the beam and if that configuration results in the beam hitting the wall early on then the wu finishes early. Your results are validating against wingmen with similar run times. The short runs are just as useful as the long ones as they quickly show a beam setup which is wrong and therefore one to be avoided.
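To illustrate why an unstable beam setup ends a task early, here is a toy sketch in Python. It is not SixTrack code: the function `track`, the 1% growth rate, and the aperture value are all made up for illustration. The only idea taken from the explanation above is that tracking stops as soon as the particle is lost.

```python
def track(amplitude, growth_per_turn, aperture, max_turns):
    """Toy model (hypothetical, not SixTrack): return the number of turns
    completed before the amplitude exceeds the aperture, or max_turns if
    the particle survives the whole run."""
    for turn in range(1, max_turns + 1):
        amplitude *= growth_per_turn
        if amplitude > aperture:
            return turn  # particle hit the wall: the task ends early
    return max_turns


# A stable setup runs for the full 10^6 turns.
stable = track(1.0, 1.0, 10.0, 10**6)

# An unstable setup (amplitude grows 1% per turn) is lost after a few hundred turns.
unstable = track(1.0, 1.01, 10.0, 10**6)

print(stable, unstable)
```

Both runs are valid results; the second one just reaches its answer far sooner.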

So all looks good.

Looks like all of your current batch are the longer ones at 12 hours or more (or less).
Randomly selected from your running wus:
sd_HL_7.5_440_1.4_6D_cc_err__3__s__62.31_60.32__4_6__6__85_1_sixvf_boinc255_1
The 6 indicates 10^6 turns, so it should be longer, but even these can finish early if the beam parameters turn out to be unstable.
ID: 26750

enightmare
Joined: 29 Dec 08
Posts: 3
Credit: 1,851,933
RAC: 2
Message 26751 - Posted: 17 Sep 2014, 4:39:04 UTC - in response to Message 26750.  

Ok, I just wanted to make sure the new setup wasn't causing a bunch of invalid results, holding up everyone else and not being useful.
ID: 26751

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 851
Credit: 1,616,232
RAC: 132
Message 26752 - Posted: 18 Sep 2014, 9:38:12 UTC

Thanks Ray; and of course we have to allow for stability and estimate the
maximum CPU time required. Hope to implement outliers soon to avoid
some problems with these (very) short cases. Eric.
ID: 26752

Ananas
Joined: 17 Jul 05
Posts: 102
Credit: 542,016
RAC: 0
Message 26753 - Posted: 29 Sep 2014, 5:36:32 UTC

There is one weird thing about those very short workunits: they run with only a small percentage (~7%) of CPU usage most of the time. That means a result that reports 1.5 minutes of CPU time to the core client may actually have run for up to 10 minutes.

Longer LHC WUs do that too, but only in the startup phase, so the inefficient part has less effect on the total runtime.

Unlike some other projects, they do not increase the system time, which would still leave the CPU fully loaded (without being counted in the WU's CPU time). The CPU cores are simply nearly idle in the first few minutes of LHC results, as if the WUs were taking a nap.
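A small Python sketch of the difference between CPU time and elapsed time described above. The helper names `cpu_efficiency` and `measure` are made up for this example, and the `time.sleep` call only stands in for a process that is waiting rather than computing; it says nothing about what SixTrack actually does during startup.

```python
import time


def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """Fraction of the elapsed time during which the CPU was actually busy."""
    return cpu_seconds / elapsed_seconds


# Figures quoted in the post: 1.5 minutes of CPU time over up to 10 minutes elapsed.
ratio = cpu_efficiency(1.5 * 60, 10 * 60)
print(f"average CPU usage: {ratio:.0%}")


def measure(work):
    """Run work() and return (cpu_seconds, elapsed_seconds) for this process."""
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    work()
    return time.process_time() - start_cpu, time.perf_counter() - start_wall


# A sleeping process accumulates elapsed time but almost no CPU time.
cpu, wall = measure(lambda: time.sleep(0.2))
print(f"cpu={cpu:.3f}s wall={wall:.3f}s")
```

With the quoted figures the average comes out at about 15% over the whole run, which fits a long stretch at around 7% followed by a short stretch at full load.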
ID: 26753

Ray Murray
Volunteer moderator
Joined: 29 Sep 04
Posts: 233
Credit: 10,718,581
RAC: 4,042
Message 26754 - Posted: 29 Sep 2014, 7:37:39 UTC
Last modified: 29 Sep 2014, 8:13:04 UTC

Generally, as a wu starts up, I see progress increasing for a few mins at low CPU usage up to around 6-8% progress then dropping back to zero and increasing normally at full CPU usage thereafter. In the short-running tasks, perhaps there is something going wrong during this initial setup stage.
ID: 26754

Werinbert
Joined: 12 May 13
Posts: 5
Credit: 884,052
RAC: 252
Message 26755 - Posted: 29 Sep 2014, 22:39:04 UTC

With my WUs I am noticing activity similar to what Ray Murray describes, although some of mine went as high as 25% progress before returning to 0%. One additional thing I notice is that this drop in progress coincides with the first checkpoint.
ID: 26755

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 851
Credit: 1,616,232
RAC: 132
Message 26756 - Posted: 30 Sep 2014, 12:25:01 UTC

Well, as I have said before, SixTrack does pre-processing of up to 20 seconds
typically, tracking for up to many hours, then a few seconds post-processing
and closes all files.

The pre-processing processes all the input and logs to the output file. The tracking
writes binary data (a few bytes) and two checkpoint files, typically every few
minutes but depending on your time-to-checkpoint setting.
I did a lot of work to keep the checkpoint files short, about 10KB each, but they are
opened and closed each time they are written. We have no graphics.
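For readers who want a picture of the checkpoint pattern, here is a rough Python sketch. It is not the SixTrack implementation: the file names, the turn-based checkpoint interval (the real one is time-based), and the padded 10 KB payload are all assumptions made for the example. It only shows two small files being opened, written, and closed at every checkpoint.

```python
import os
import tempfile

CHECKPOINT_SIZE = 10 * 1024  # about 10 KB, as quoted above


def write_checkpoint(path, state):
    """Open, write, and close the checkpoint file on every call."""
    with open(path, "wb") as f:
        f.write(state)


def track_with_checkpoints(total_turns, turns_per_checkpoint, directory):
    """Hypothetical tracking loop; returns the number of checkpoints taken."""
    # Hypothetical file names for the two checkpoint files.
    paths = [os.path.join(directory, name) for name in ("ckpt_a.bin", "ckpt_b.bin")]
    checkpoints = 0
    for turn in range(1, total_turns + 1):
        # ... the actual tracking work for this turn would go here ...
        if turn % turns_per_checkpoint == 0:
            # Store the turn number, padded to the fixed checkpoint size.
            state = turn.to_bytes(8, "little").ljust(CHECKPOINT_SIZE, b"\0")
            for path in paths:
                write_checkpoint(path, state)
            checkpoints += 1
    return checkpoints


with tempfile.TemporaryDirectory() as d:
    writes = track_with_checkpoints(1000, 250, d)
    sizes = [os.path.getsize(os.path.join(d, name)) for name in os.listdir(d)]

print(writes, sizes)
```

Because each write overwrites the same small files, the I/O per checkpoint stays small; how expensive the repeated open and close is depends on the operating system.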

While the pre-processing might have a lot of I/O, when I run on my (admittedly) idle
Linux box I see basically 100% CPU usage ALL the time. I don't know how to look
on Windows, but I suspect the scheduler and I/O must be causing the effect you see.

Another possibility is that some studies are using "thick lens" where the checkpoint files
are much larger.

I just have to put this down to Windows 7 ??? Eric.
ID: 26756

White Mountain Wes
Joined: 1 Jan 09
Posts: 32
Credit: 891,226
RAC: 463
Message 26757 - Posted: 30 Sep 2014, 19:12:33 UTC

For what it is worth, I am running both a Windows box (Win7) and a Linux box (Ubuntu). I see the same process described by Ray Murray on my Windows machine, but I do not see it on Linux. My Linux box starts at 0% and keeps climbing until it's done.
ID: 26757

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 12 Jul 11
Posts: 851
Credit: 1,616,232
RAC: 132
Message 26758 - Posted: 1 Oct 2014, 9:04:39 UTC - in response to Message 26757.  

OK; thanks. I'll put this down to Windows......
I remember many years ago a similar sort of issue with the
early VAX VMS. My colleague Silvano de Gennaro and I tracked it
down to poor scheduling, so that the I/O bound work, e.g. a file
copy, took a long time. The Digital fix to the scheduler allowed the
I/O bound job to get priority and run normally while the CPU bound
processes lost very little time. Eric.
ID: 26758

[AF>FAH-Addict.net]toTOW
Joined: 9 Oct 10
Posts: 77
Credit: 3,534,314
RAC: 28
Message 26759 - Posted: 1 Oct 2014, 9:13:42 UTC
Last modified: 1 Oct 2014, 9:16:42 UTC

When the WUs are starting on Windows, the sixtrack executable is not using much CPU time, but one conhost.exe process and the csrss.exe process are using the CPU. After this phase, which lasts about 2-3 minutes depending on the system running the project, only the sixtrack executable uses the CPU.

I see a similar behaviour on Windows 7 and XP.
ID: 26759

©2019 CERN