21)
Message boards :
Number crunching :
Stil a pending credit
(Message 18621)
Posted 20 Nov 2007 by EclipseHA Post:
I understand that there are 2 different causes of the "zombie" WU's. I've got about 30 of the 0.00xx pending, and maybe 20 of the "long lost granted" WU's (and two that errored out long ago). In both cases, however, it does indicate a cleanup is in order!
22)
Message boards :
Number crunching :
Stil a pending credit
(Message 18610)
Posted 18 Nov 2007 by EclipseHA Post:
Here's one from 2005 that was granted credit: http://lhcathome.cern.ch/lhcathome/result.php?resultid=694330 If you try to look at the WU, you get an error that says "workunit not found"
23)
Message boards :
Number crunching :
Stil a pending credit
(Message 18603)
Posted 17 Nov 2007 by EclipseHA Post:
Hey, I got some of these 0.00xxx results from April. If they're not in the DB now, they never will be! In total, I got about 50 of these 0.00xx results, and some that were actually granted real credit back in 2005! Housekeeping time!
24)
Message boards :
Number crunching :
Stil a pending credit
(Message 18546)
Posted 4 Nov 2007 by EclipseHA Post: Actually, I don't care if I ever see the "credit" from the (now about 30) 0.00xxxx pending WU's, but I'd really like to see them vanish from my pending credit list! Same with the WU's that were granted credit as far back as 2005!
25)
Message boards :
Number crunching :
Stil a pending credit
(Message 18530)
Posted 3 Nov 2007 by EclipseHA Post: With the WU's available over the last couple of weeks, I now have about 20 "pending", where the claimed credit is of the 0.00xxxx variety.
26)
Message boards :
Number crunching :
Stil a pending credit
(Message 18314)
Posted 19 Oct 2007 by EclipseHA Post: While current (completed) results seem to be purged after a few days, good results that are months old are still hanging around. I have some from 2005. Also, I have pending results from this last run, as well as from April 2007, where there was a credit claim of 0 (marked as "pending").
27)
Message boards :
Number crunching :
Initial Replication
(Message 18225)
Posted 17 Oct 2007 by EclipseHA Post: Other than the fact that some WU's may get crunched 2 more times than needed (with credit granted), I'm not sure where this is causing harm. Sure you're using electricity, but it's up to the project. People have been complaining about "lack of work" here for years, and to cut IR from 5 to 3 means that there's 40% less work right off the bat. Right now, today, LHC has taken some measures to keep work in the pipeline longer - the 2/day/cpu, the 1h delay, etc., with the press release and all. I think we should all just step back and be happy that there has been a flow of work (be it 2/day) for the longest time I've seen in years. If you don't like the way the project is being managed, speak with your feet and crunch for another project.
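As a rough check on the 40% figure in the post above, the drop in results issued per workunit when initial replication (IR) goes from 5 to 3 can be worked out in a few lines of Python. This is only a sketch: the helper name `replication_reduction` is made up for illustration and is not part of BOINC, and it assumes every workunit is issued exactly IR times with no resends.

```python
def replication_reduction(old_ir: int, new_ir: int) -> float:
    """Fraction by which results issued per workunit drop.

    Hypothetical helper for illustration only; assumes each workunit
    is sent out exactly `ir` times and ignores resends after errors.
    """
    if old_ir <= 0 or new_ir < 0:
        raise ValueError("old_ir must be positive and new_ir non-negative")
    return (old_ir - new_ir) / old_ir


if __name__ == "__main__":
    # IR cut from 5 to 3, as described in the post.
    drop = replication_reduction(5, 3)
    print(f"{drop:.0%} fewer results issued per workunit")
```

Under those assumptions, the same batch of workunits simply hands out two fewer copies of each one, which is where the "less work" comes from.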
28)
Message boards :
Number crunching :
Stil a pending credit
(Message 17582)
Posted 28 Jul 2007 by EclipseHA Post: "WU's pending are not the only problem. The number of results and workunits awaiting deletion is enormous. This is preventing everyone from deleting hosts that are no longer active or will not merge." Good point about the hosts. I don't have any extras right now, but I got a boatload of ghost WUs in my history! Time for housecleaning, I agree!
29)
Message boards :
Number crunching :
Stil a pending credit
(Message 17555)
Posted 26 Jul 2007 by EclipseHA Post: I agree - time to clean up the database. I got WUs from April which will be pending forever. Once the test batch is over, it's time to tidy up!
30)
Message boards :
Number crunching :
Can't Access Work Units
(Message 17506)
Posted 23 Jul 2007 by EclipseHA Post: "OK just to add we have it sorted that we can get this fixed quicker if it happens again but it shouldn't as we've solved the problem." I think it's safe to say they "thought" they solved the problem! :) That's just an example of why this test run could do some good! It is kind of interesting that ~10% of the work in the test batch is now queued for retransmission after only a couple of days - seems like a high error rate to me! (Another example of why the test run could do some good!)
31)
Message boards :
Number crunching :
Can't Access Work Units
(Message 17503)
Posted 23 Jul 2007 by EclipseHA Post: Seems the server shut itself down again... Right now, the stats on the main page show 1899 WUs available, but I get "no work available" when trying to get some.
32)
Message boards :
Number crunching :
Bad thread priority
(Message 17490)
Posted 22 Jul 2007 by EclipseHA Post: "Sluggish" could also be due to more than the thread priority - for example, page faults. What's your memory usage look like? Do you know the tool you're using is giving you the right information? (I REALLY doubt that any BOINC-related thread is running at "realtime" under Windows.) Do you understand what changing the priority really means to your whole system? Why are you the only one having this problem with LHC and other projects? Sounds like there's something weird with your system, and you're just using a bandaid... Just my opinion.
33)
Message boards :
Number crunching :
Bad thread priority
(Message 17474)
Posted 21 Jul 2007 by EclipseHA Post: "Sluggish" could also be due to more than the thread priority - for example, page faults. What's your memory usage look like?
34)
Message boards :
Number crunching :
Even if there's no "real work", may I suggest a "real test"?
(Message 17385)
Posted 19 Jul 2007 by EclipseHA Post: Seems that "outstanding work" is still at ~35 days after 1000 test WU's were released. Could it be the "ghost" problem, as the numbers haven't changed much today? I think doing a real load test would be a good thing for this project. It's no more "wasted cycles" than SETI, which seems to crunch the same data over and over, and in fact would be good for this project. New servers in a new location on the net, and only testing with enough work to last less than 10 minutes that takes days to come back, isn't really a valid test for when a real dump of work becomes available, IMHO. Now's the time to really test the infrastructure, and not when real data might be lost/delayed.
35)
Message boards :
Number crunching :
Even if there's no "real work", may I suggest a "real test"?
(Message 17366)
Posted 16 Jul 2007 by EclipseHA Post: Much has changed since there was real work in the pipeline - the servers have moved, looks like project directories have changed, etc. There have been a couple of small bursts of WU's for testing, but, as many people have the resource share for LHC set really high, when work is available, it goes to only a few machines. As a result, not many of the clients have been tested, and there probably hasn't been that much of a load on the servers. Last time there was "real work", most of the WU's I got were ghosts (like 15 of 20). How about using old data for a real test run before new data hits the pipeline? I'm thinking on the order of 100K WUs. Work out stuff now before real data is compromised, is all I'm suggesting (on both the client and server end).
36)
Message boards :
Number crunching :
Few test jobs in flight!
(Message 17083)
Posted 22 Jun 2007 by EclipseHA Post: With only 56 WU's issued today, I'll bet that <5 hosts got them all. More like 2-3 hosts. You can tell as they're being completed VERY slowly. Is this really a valid test, as it's such a small subset of the user base? Maybe issue 50,000 OLD WU's and run the system through its paces before new data is crunched? That might almost be a valid test. 50 sure isn't!
37)
Message boards :
Number crunching :
How do we
(Message 16916)
Posted 18 May 2007 by EclipseHA Post: "The idea is that to migrate you will have to do nothing!" The best solution, but be forewarned that it could take a couple of days for the changes to propagate. No big problem, but people might see some weird stuff during that time (some machines get to the new servers, while others don't, etc.).
38)
Message boards :
Number crunching :
The ghosts in the machine
(Message 16811)
Posted 4 May 2007 by EclipseHA Post: Well, this round I got 5 ghost WU's, whereas in the last round I got 20 ghosts and one that was actually sent. (By ghost, I mean that per the website I have them, but the computer was never sent the WU and the logs show it.) Does anyone else see this? (Check "results" under "My Account".)
39)
Message boards :
Number crunching :
Good news and bad news.
(Message 16732)
Posted 24 Apr 2007 by EclipseHA Post: Got home today and found one of my systems crunching an LHC WU. It was the only one on that system. I checked the website, and according to that I had 20 additional WU's on that machine. I checked the log, and I can see that only one was actually sent! Anybody else see a similar thing? The website thinks it sent many more than it actually did? (The 20 "lost" WU's show as if they're cool, and just waiting for results.)
40)
Message boards :
Number crunching :
Because you asked....
(Message 16714)
Posted 15 Apr 2007 by EclipseHA Post: Didn't CERN just have a fairly major problem? I heard it on the news while driving (Paul Harvey, I think), a week or so back. The "new ring", where they used the wrong units for calcs (like MPH/KPH, but that's not it), and a 20 ton magnet kind of got whacked out of shape during a test? Could we have crunched the bad data?