Abnormally short WU times
Joined: 29 Dec 08 | Posts: 3 | Credit: 3,948,842 | RAC: 0
I just had several work units finish in under 10 minutes, one of them in under 3 minutes. Granted, this is a new machine, but I doubt it's that much quicker than my previous builds, so I suspect these 7 WUs terminated abnormally even though they supposedly finished. The new rig is running an i7-5960X with 16 GB of DDR4 RAM on its normal XMP profile. Other WUs are progressing at a normal pace; I just want to make sure I'm not turning in a bunch of invalid results, and I will suspend the project if so.

Quick update: the current WUs finished in under 50 minutes as expected, so I'm unsure why the previous WUs completed so quickly.
Joined: 29 Sep 04 | Posts: 281 | Credit: 11,866,264 | RAC: 0
Hi Enightmare,

This is a copy of my reply to a similar question a year ago. There's nothing wrong at your end. It's pure chance whether you get a long run or a short run, although they probably appear in batches. The estimate is based on your previous work but will not necessarily be accurate for an individual work unit. Each WU has its own simulation of the beam, and if that configuration results in the beam hitting the wall early on, then the WU finishes early. Your results are validating against wingmen with similar run times. The short runs are just as useful as the long ones, as they quickly show a beam setup which is wrong and therefore one to be avoided. So all looks good.

It looks like all of your current batch are the longer ones at 12 hours or more (or less). Randomly selected from your running WUs:

sd_HL_7.5_440_1.4_6D_cc_err__3__s__62.31_60.32__4_6__6__85_1_sixvf_boinc255_1

The 6 indicates 10^6 turns, so it should be longer, but even these can finish early if the beam parameters turn out to be unstable.
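To illustrate Ray's point, here is a minimal sketch (nothing like the real SixTrack, which tracks a full 6-D beam through the LHC lattice): each case is a phase-space rotation plus a nonlinear kick, a stable starting condition survives the full number of turns, and an unstable one is lost early, ending the run as soon as the particle exceeds the aperture limit.

```python
# Toy illustration only: a Henon-type map standing in for real beam tracking.
import math
import random

def track(x0: float, n_turns: int, tune: float = 0.2114, aperture: float = 10.0) -> int:
    """Return how many turns the particle survives before being 'lost'."""
    c, s = math.cos(2 * math.pi * tune), math.sin(2 * math.pi * tune)
    x, p = x0, 0.0
    for turn in range(1, n_turns + 1):
        p += x * x                             # nonlinear (sextupole-like) kick
        x, p = c * x + s * p, -s * x + c * p   # linear one-turn rotation
        if abs(x) > aperture:                  # beam hits the wall -> run ends early
            return turn
    return n_turns

random.seed(0)
for case in range(6):
    x0 = random.uniform(0.1, 1.0)              # each case gets a different beam setup
    survived = track(x0, n_turns=100_000)
    print(f"case {case}: x0={x0:.2f} survived {survived} of 100000 turns")
```

Cases that start well inside the stable region run to the end; the others terminate after a small fraction of the turns, which is the same reason a "short" work unit is still a useful result.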
Joined: 29 Dec 08 | Posts: 3 | Credit: 3,948,842 | RAC: 0
OK, I just wanted to make sure the new setup wasn't causing a bunch of invalid results, holding up everyone else and not being useful.
Joined: 12 Jul 11 | Posts: 857 | Credit: 1,619,050 | RAC: 0
Thanks Ray; and of course we have to allow for stability and estimate the maximum CPU time required. I hope to implement outliers soon to avoid some problems with these (very) short cases. Eric.
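One plausible reading of "implement outliers" (an assumption about the intent, not the project's actual server code) is to flag results whose run time is far below the batch's typical value, so that they do not skew the run-time estimates used for later work units. A rough sketch of that idea:

```python
# Illustrative only: threshold and batch values are invented for the example.
from statistics import median

def flag_runtime_outliers(runtimes_s, low_fraction=0.1):
    """Return (runtime, is_outlier) pairs.

    A result counts as an outlier if it ran in less than `low_fraction`
    of the batch median, e.g. a 3-minute result among ~12-hour runs.
    """
    typical = median(runtimes_s)
    return [(t, t < low_fraction * typical) for t in runtimes_s]

# Mostly ~12 h results with a couple of very short ones mixed in.
batch = [43000, 44500, 42000, 180, 41000, 600, 43500]
for runtime, outlier in flag_runtime_outliers(batch):
    print(f"{runtime:>6} s  {'outlier (excluded from estimates)' if outlier else 'normal'}")
```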
Joined: 17 Jul 05 | Posts: 102 | Credit: 542,016 | RAC: 0
There is one weird thing about those very short work units: they run with only a small percentage (~7%) of CPU usage most of the time, which means a result that reports 1.5 minutes of CPU time to the core client actually ran for up to 10 minutes. Longer LHC WUs do that too, but only in the startup phase, so the inefficient part has less effect on the total runtime. Unlike some other projects, they do not increase the system time, which would still keep the CPU fully loaded (without being counted into the WU's CPU time); the CPU cores are just nearly idle in the first few minutes of LHC results, as if the WUs were taking a nap.
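A simple way to check this on your own host is to compare a task's accumulated CPU time with its elapsed wall-clock time. The sketch below uses the third-party psutil package; matching processes on the substring "sixtrack" is an assumption and may need adjusting to the exact executable name on your machine.

```python
# Rough check of CPU efficiency (CPU time / wall-clock time) for running tasks.
import time
import psutil

def cpu_efficiency(proc: psutil.Process) -> float:
    """Accumulated CPU time (user + system) divided by elapsed wall-clock time."""
    t = proc.cpu_times()
    elapsed = time.time() - proc.create_time()
    return (t.user + t.system) / elapsed if elapsed > 0 else 0.0

for proc in psutil.process_iter(["name"]):
    try:
        name = proc.info["name"] or ""
        if "sixtrack" in name.lower():          # hypothetical name filter
            print(f"{name}: {cpu_efficiency(proc):.0%} CPU efficiency so far")
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass
```

An efficiency around 7% during the first minutes would match the behaviour described above.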
Joined: 29 Sep 04 | Posts: 281 | Credit: 11,866,264 | RAC: 0
Generally, as a WU starts up, I see progress increasing for a few minutes at low CPU usage, up to around 6-8% progress, then dropping back to zero and increasing normally at full CPU usage thereafter. In the short-running tasks, perhaps there is something going wrong during this initial setup stage.
Joined: 12 May 13 | Posts: 8 | Credit: 1,001,060 | RAC: 0
I am noticing similar activity with my WUs to what Ray Murray describes, although some of mine went as high as 25% progress before returning to 0%. One additional thing I notice is that this drop in progress coincides with the first checkpoint.
Joined: 12 Jul 11 | Posts: 857 | Credit: 1,619,050 | RAC: 0
Well, as I have said before, SixTrack typically does up to 20 seconds of pre-processing, then tracking for up to many hours, then a few seconds of post-processing, and finally closes all files. The pre-processing reads all the input and logs to the output file; the tracking writes binary data (a few bytes) and writes two checkpoint files, typically every few minutes but depending on your time-to-checkpoint setting. I did a lot of work to keep the checkpoint files short, about 10 KB each, but they are opened and closed each time they are written. We have no graphics.

While the pre-processing might have a lot of I/O, when I run on my (admittedly) idle Linux box I see basically 100% CPU usage ALL the time. I don't know how to look on Windows, but I suspect the scheduler and I/O must be causing the effect you see. Another possibility is that some studies are using "thick lens", where the checkpoint files are much larger. I just have to put this down to Windows 7??? Eric.
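For illustration, here is a sketch of the checkpointing pattern described above: two small alternating files, each opened, written and closed on every checkpoint. The file names, the pickled state, and the turn-based interval are invented for the example; the real SixTrack checkpoints use their own binary format and a time-based interval.

```python
# Toy model of "small checkpoint files, opened and closed on every write".
import pickle

CHECKPOINT_FILES = ["checkpoint_a.bin", "checkpoint_b.bin"]   # invented names

def write_checkpoint(turn: int, state: dict, which: int) -> None:
    """Write one small checkpoint and close it immediately, so a crash
    mid-write can corrupt at most one of the two alternating copies."""
    with open(CHECKPOINT_FILES[which % 2], "wb") as f:
        pickle.dump({"turn": turn, "state": state}, f)

# Toy tracking loop: a turn count stands in for the BOINC "time to checkpoint" setting.
state = {"x": 1e-3, "p": 0.0}
which = 0
for turn in range(1, 1001):
    state["x"], state["p"] = state["p"], -state["x"]   # stand-in for real tracking
    if turn % 250 == 0:
        write_checkpoint(turn, state, which)
        which += 1
```

The point of the pattern is that each checkpoint is tiny and short-lived, so on an otherwise idle machine the tracking should stay close to 100% CPU, as Eric sees on Linux.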
Joined: 1 Jan 09 | Posts: 32 | Credit: 1,106,567 | RAC: 0
For what it is worth, I am running both a Windows box (Win7) and a Linux box (Ubuntu). I see the same process described by Ray Murray on my Windows machine, but I do not see it on Linux. My Linux box starts at 0% and keeps climbing until it's done.
Joined: 12 Jul 11 | Posts: 857 | Credit: 1,619,050 | RAC: 0
OK; thanks. I'll put this down to Windows... I remember, many years ago, a similar sort of issue with early VAX VMS. My colleague Silvano de Gennaro and I tracked it down to poor scheduling, so that I/O-bound work, e.g. a file copy, took a long time. The Digital fix to the scheduler allowed the I/O-bound job to get priority and run normally, while the CPU-bound processes lost very little time. Eric.
Joined: 9 Oct 10 | Posts: 77 | Credit: 3,671,357 | RAC: 0
When WUs are starting on Windows, the sixtrack executable is not using a lot of CPU time, but one conhost.exe process and the csrss.exe process are using the CPU. After this phase, which lasts about 2-3 minutes depending on the system running the project, only the sixtrack executable uses the CPU. I see similar behaviour on Windows 7 and XP.
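To reproduce that observation, one could sample the CPU usage of the processes named above while a work unit starts up. This sketch again relies on psutil; the name list is only what was reported in this thread, and querying csrss.exe usually requires administrator rights on Windows.

```python
# Sample CPU usage of a few named processes every `step` seconds during WU startup.
import time
import psutil

WATCHED = ("sixtrack", "conhost", "csrss")   # names reported in this thread

def watch(duration=120, step=5):
    procs = [p for p in psutil.process_iter(["name"])
             if p.info["name"] and p.info["name"].lower().startswith(WATCHED)]
    for p in procs:
        p.cpu_percent(None)                  # first call primes the counter
    for _ in range(duration // step):
        time.sleep(step)
        for p in procs:
            try:
                print(f"{p.info['name']:<15} {p.cpu_percent(None):5.1f}%")
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass
        print("---")

if __name__ == "__main__":
    watch()
```

If the effect above is what is happening, the first few samples should show conhost.exe and csrss.exe doing the work, with sixtrack taking over after a couple of minutes.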