cpu time ok... but zero credits granted
Author | Message |
---|---|
Joined: 28 Sep 04 Posts: 27 Credit: 17,091 RAC: 0 |
Man! A state-of-the-art app that can't tell time. I wonder who's the fool here. Will keep it running just for good laughs. ;]] |
Joined: 2 Sep 04 Posts: 352 Credit: 1,393,150 RAC: 0 |
Yes, as long as my Daily Credits don't drop too far I'll hang with it for a while longer, but there comes a point where I'll have to get in the Life Raft myself if things get too bad ... hehe |
Joined: 28 Sep 04 Posts: 27 Credit: 17,091 RAC: 0 |
I've been investigating a bit further... and was unable to find which file LHC stores its progress info in. I did find a .zip file that was updated at the time CC switched from LHC to another project, but didn't find anything resembling CPU time in any of the zipped files. |
Joined: 29 Sep 04 Posts: 187 Credit: 705,487 RAC: 0 |
I have to admit that I am probably going to drop the CPU percentage for the project. Almost all of my WUs are going back and earning zero. I currently have it set for 25% of my CPU time; that's about 8 hours per day. Those 8 hours could be getting credits for my team at one of the other projects. It seems to have got much worse recently. Example: look at the results before 1st March and then after. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Joined: 2 Sep 04 Posts: 352 Credit: 1,393,150 RAC: 0 |
It seems to have got much worse recently ========= I agree, I haven't actually seen 1 lhc type WU turned in by me today with any time reported, or at most just a few seconds ... I'm trying to get to the ones I downloaded yesterday to see if they're any better, but I have a few hours to go yet before I get into them ... |
Joined: 28 Sep 04 Posts: 27 Credit: 17,091 RAC: 0 |
If nothing changes for the better in a day or so, I am suspending this project until this is fixed. |
Joined: 27 Sep 04 Posts: 282 Credit: 1,415,417 RAC: 0 |
> If nothing changes for the better in a day or so, I am suspending this project until this is fixed. > > Good point. An update from the LHC admins would be very much appreciated.... If they don't know the reason, they could tell us to suspend until it is fixed.... |
Joined: 2 Sep 04 Posts: 545 Credit: 148,912 RAC: 0 |
Just some news ... Last week someone asked me a question about processing times, and to give them facts instead of rumors I took the log files that BOINC View writes and put the data into a really ugly table (from a relational database perspective). Anyway, I am too messed up today to do anything hard (for me relational databases are way too easy stuff), so I decided to look into the 0 second problem. What I learned: 1) I have 396 results with 0 length run times. Of these, I know some of the recent ones had 10 hour runtimes, because I watched them running and they were well into a high run time. 2) In the same time frame I have had 3141 returns with non-zero final times. Failures are therefore about a 12% rate. 3) There is no way for me to tell from the numbers in the logs what the "size" of the model was. Since we have a spread of 10,000, 100,000, and 1,000,000 turn work units, theory says that the distribution would be over the three types. My informal observation says that the failures occur with the larger run time models rather than the shorter ones. 4) Application versions include 4.45, 4.46, 4.47, 4.03, and 4.64. 5) The errors seem to cover the entire period for which I have data, starting roughly September of last year. This seems to be an indication that this is a long-standing problem. Questions: 1) Is there a way to distinguish the programmed run length from the Work Unit or Result Name? 2) Does anyone else have these log files available? |
Joined: 27 Sep 04 Posts: 282 Credit: 1,415,417 RAC: 0 |
Nice approach to the problem.... I don't have BOINC View, but I think I have something that helps you..... http://lhcathome.cern.ch/result.php?resultid=186717 Name v64lhc88-43s10_12545_1_sixvf..... (100.000 Turns) http://lhcathome.cern.ch/result.php?resultid=186408 Name v64boince6ib1-13s4_6630_1_sixvf..... (1.000.000 Turns) > Questions: > 1) Is there a way to distinguish the programmed run length from the Work Unit or Result Name? > 2) Does anyone else have these log files available? |
Joined: 2 Sep 04 Posts: 352 Credit: 1,393,150 RAC: 0 |
2) in the same time frame I have had 3141 returns with non-zero final times. Failures are therefore about a 12% rate. ========== @Paul No offense Paul, but I think you're way off base on that 12% failure figure, especially since LHC has come back online the last few weeks. I just turned in 23 v64lhc type WUs off 1 of my PCs that were run over a 17 hour period and ... 16 showed no Time at all ... 3 showed 10 seconds or less ... 3 showed the correct running time ... I've been seeing these sorts of results for 2 weeks now on all my PCs, so if you figure the current failure rate to turn in the correct Time Result it's more like 85%-90% ... Actually I have better results with the v64boince type WUs; the failure rate for correct time on them is probably around 12% ... :) |
Joined: 2 Sep 04 Posts: 309 Credit: 715,258 RAC: 0 |
A general question. Is this problem occurring for anyone who is running LHC 100% and is not stopping and restarting BOINC? I know I have been resource sharing since LHC restarted (even during the alpha test). During the alpha test I did not get the problem, but since then I have reported 0 CPU time at least 75 to 80% of the time (Intel 3.2GHz, HT on, XP, BOINC Manager V4.24, other projects Einstein, Pred and Seti). Shame the science is good, we might get somewhere on the credit issue then. Starting to think about reducing resource share. Live long and crunch. Paul (S@H1 8888) BOINC/SAH BETA |
Joined: 2 Sep 04 Posts: 352 Credit: 1,393,150 RAC: 0 |
A general question. Is this problem occurring for anyone who is running LHC 100% and is not stopping and restarting BOINC? ========== I'm running 7 PCs 24/7 exclusively here at the LHC Site Giant, all P4 HT CPUs in the 3.06 to 3.4 range. The problem occurs on all of them ... The v64boince type WUs seem to return the Time most of the time, but I have sat here and watched the time drop from 10 hours to like 23 minutes when the WU was finished. Now the v64lhc type WUs are totally borked when it comes to turning in the correct time; it's time to pop open the champagne bottle when 1 does actually turn in the correct time ... hehe |
Joined: 28 Sep 04 Posts: 722 Credit: 48,339,087 RAC: 29,661 |
> A general question. Is this problem occurring for anyone who is running LHC 100% and is not stopping and restarting BOINC? > > I know I am resource sharing since LHC restarted (even during the alpha test) and during the alpha test I did not get the problem but since then I have reported 0 cpu time at least 75 to 80% of the time (Intel 3.2GHz, HT on, XP, BOINC Manager V4.24 other project Einstein, Pred and Seti). > > Shame the science is good, we might get somewhere on the credit issue then. > > Starting to think about reducing resource share. > > Live long and crunch. > > I now have a situation where LHC is running 100% 24/7 on one of my hosts, because it ran out of Seti WUs yesterday morning. Since then it has finished 34 LHC WUs: 14 pcs: 0 sec; 5 pcs: 1...25 sec; 3 pcs: 2...30 min; 10 pcs: 30...60 min; 2 pcs: 7 h 21 min. It's a 3.06 GHz Xeon, Win2000, CC 4.19, sixtrack 4.64 |
Joined: 28 Sep 04 Posts: 27 Credit: 17,091 RAC: 0 |
That's it... Have a look: http://lhcathome.cern.ch/results.php?userid=3120 I'm suspending this project until this zero time error is fixed. What I can't understand is why the project managers didn't set the validator to award max. credit to 0 time results if there is a >0 time result received for the same WU. |
Joined: 2 Sep 04 Posts: 545 Credit: 148,912 RAC: 0 |
Poorboy, > No offense Paul but I think your way off base on that 12% failure figure, I'm autistic, remember? I don't even notice real insults. :) > especially since LHC has come back online the last few weeks. I just turned in 23 v64lhc type WU's off 1 of my PC's that were run over a 17 hour period and ... > > 16 showed no Time at all ... > 3 showed 10 seconds or less... > 3 showed the correct running time... Well, all I can say is that my data covers a lot more results than that, going back to September of last year. The discriminator I used was greater than 1 or less than 1, so basically I tested for 0 second results. But to be fully consistent I will use equal to 0 and not equal to 0 for testing ... and since someone has indicated that there is a name discriminator, now I can use that to group the results also... > I've been seeing this sort of results for 2 weeks now on all my PC's, so if you figure the current failure rate to turn in the correct Time Result it's more like 85%-90% ... So, with a total of 3141 non-zero returns and 396 with 0, well, you are right, I did 396 divided by 3141 when it should have been 396 ÷ 3537 = 11% ... > Actually I have better results with the v64boince type WU's, now the failure rate for correct time on them is probably around 12% ... :) I did not try to look only at the latest period, especially when the data indicates that the problem seems to be older than *I* expected. Even the short run WU should have something that is a low, but not zero, runtime. |
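The failure-rate arithmetic from the posts above (396 zero-time results against 3141 non-zero ones) and the idea of grouping results by their name prefix can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything taken from BOINC or BOINC View: the function names, the `(result_name, cpu_seconds)` record layout, and the sample records are hypothetical. Only the two counts and the `v64lhc` / `v64boince` prefixes come from the thread.

```python
from collections import Counter

# Name prefixes seen in the result names quoted in the thread.
KNOWN_PREFIXES = ("v64lhc", "v64boince")


def zero_time_rate(zero_count, nonzero_count):
    """Fraction of ALL returned results that reported 0 s of CPU time.

    The denominator is zero + non-zero results, not the non-zero count alone.
    """
    total = zero_count + nonzero_count
    if total == 0:
        raise ValueError("no results to analyse")
    return zero_count / total


def family(result_name):
    """Hypothetical grouping key: the known prefix a result name starts with."""
    for prefix in KNOWN_PREFIXES:
        if result_name.startswith(prefix):
            return prefix
    return "other"


def zero_rate_by_family(results):
    """Zero-time rate per name family.

    `results` is an iterable of (result_name, cpu_seconds) pairs; this layout
    is an assumption, not the format of any real log file.
    """
    totals, zeros = Counter(), Counter()
    for name, cpu_seconds in results:
        key = family(name)
        totals[key] += 1
        if cpu_seconds == 0:  # "equal to 0" test, as proposed in the thread
            zeros[key] += 1
    return {key: zeros[key] / totals[key] for key in totals}


# Counts reported in the thread: 396 zero-time and 3141 non-zero results.
print(f"zero / non-zero: {396 / 3141:.1%}")                 # first calculation
print(f"zero / all:      {zero_time_rate(396, 3141):.1%}")  # corrected one

# Made-up sample records, only to show the grouping.
sample = [
    ("v64lhc88-43s10_12545_1_sixvf", 0.0),
    ("v64lhc_hypothetical_2", 25200.0),
    ("v64boince6ib1-13s4_6630_1_sixvf", 36000.0),
]
print(zero_rate_by_family(sample))
```

Running the same grouping over a restricted date range would be one way to compare the long-term figure with the much higher rate reported for the last two weeks.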
©2024 CERN