41) Message boards : Number crunching : New Work Units Suggestion (Message 18656)
Posted 2 Dec 2007 by Brian Silvers
Post:

LoL'z, I don't think they can predict which set of magnets will require further tuning.
Regards
Masud.


You mean they don't follow the little gremlins using demagnetizers as they run around the track? ;)
42) Message boards : Number crunching : Claimed 0 credits - still pending after all 5 results received (Message 18655)
Posted 2 Dec 2007 by Brian Silvers
Post:
I also have 3 WU's 0.00 claimed pending credits! One of those WU's http://lhcathome.cern.ch/lhcathome/workunit.php?wuid=1892389

Actually, you're claiming, from the top one to the bottom one, 0.00233318497995638, 0.00408711093157513 and 0.00235602975402844 credit.

You had three very fast tasks, of less than a second to slightly over a second.
If you check the workunit IDs, you see that everyone claimed so little credit for those tasks.


Blah... Both of you are "accurate"; you were just more "precise".

At any rate, perhaps the recent issuing of results (which I missed) means that the project staff are back from whatever they were dealing with and can now address questions on this?
43) Message boards : Number crunching : work units?? (Message 18628)
Posted 20 Nov 2007 by Brian Silvers
Post:
Where have all the wu's gone??


10 Downing?
44) Message boards : Number crunching : Stil a pending credit (Message 18624)
Posted 20 Nov 2007 by Brian Silvers
Post:

I understand that there are 2 different causes to the "zombie" WU's.. I've got about 30 of the 0.00xx pending, and maybe 20 of the "long lost granted" WU's (and two that errored out long ago...).

In both cases, however, it does indicate a cleanup is in order!


Actually, there are 3 causes (missing WU header, validated without assimilation, and quorum formed without validation).

I only reacted to your post because it seemed to me that "cleanup" means "delete" for you. If that is not the case, sorry...

Brian
45) Message boards : Number crunching : Stil a pending credit (Message 18612)
Posted 18 Nov 2007 by Brian Silvers
Post:

Here's one from 2005 that was granted credit:

http://lhcathome.cern.ch/lhcathome/result.php?resultid=694330

If you try to look at the WU, you get an error that says "workunit not found"


There is a difference between pending and orphaned. That resultID is an orphan, which is a different issue. This thread is speaking specifically about the 0.00xxx claimed credit units. The host 80808, which I assume is your host, has only 4 such results, all from October of this year. You will note that they are pending.

One should also take care to note that there is a difference between a result being orphaned and a result not being assimilated and purged. If a workunit header is found, its results come up along with it, and those results have been validated and issued credit, then that result simply hasn't been through the assimilation process.

Like I said, I know that for some who want to clean up their hosts, a delete is the "quick and easy" way. From a scientific perspective, though, it should be done in a different order. The way I'd do it?


  • Orphan ResultID: For every result that does not have a matching WU header, delete Result.
  • Validated WU that hasn't been Assimilated: Look at assimilator code to find out the cause and correct the cause.
  • 0.00xx claimed credit results that aren't validating: Look at validator code to find the cause and correct the cause.
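To illustrate the triage order above, here's a minimal Python sketch. The record layout is invented for illustration only and is not the actual BOINC server schema:

```python
# Hypothetical sketch of the cleanup triage described above.
# The dict layouts are invented for illustration, NOT the real BOINC schema.

def triage(result, workunits):
    """Classify a result row into one of the three cleanup buckets."""
    wu = workunits.get(result["wu_id"])
    if wu is None:
        return "orphan: delete result"      # no matching WU header
    if wu["validated"] and not wu["assimilated"]:
        return "debug assimilator"          # validated but never assimilated
    if result["claimed_credit"] < 0.01 and not wu["validated"]:
        return "debug validator"            # 0.00xx claim stuck pending
    return "ok"

workunits = {
    1: {"validated": True, "assimilated": False},
    2: {"validated": False, "assimilated": False},
}
results = [
    {"wu_id": 99, "claimed_credit": 5.0},    # WU header missing
    {"wu_id": 1,  "claimed_credit": 4.2},    # validated, not assimilated
    {"wu_id": 2,  "claimed_credit": 0.002},  # 0.00xx claim, never validated
]
for r in results:
    print(triage(r, workunits))
```

The point of the ordering is that deletion is only safe for the orphans; the other two buckets still hold data the science database may want.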



Of course, all of this might be taken care of by the BOINC server-side code upgrade that still needs to be performed...

IMO, YMMV, etc, etc, etc...

46) Message boards : Number crunching : Stil a pending credit (Message 18604)
Posted 17 Nov 2007 by Brian Silvers
Post:

Hey, I got some of these 0.00xxx results from April.. If they're not in the DB now, they never will be!

In total, I got about 50 of these 0.00xx results, and some that were actually granted real credit back in 2005!

Housekeeping time!


If the results are listed as pending, this means that the validators have never looked at them. Since the validators never looked at them, they cannot have been fed to the assimilation process to be inserted into the science database.

Your computers are hidden, so I can't verify your claim that results were issued credit. If so, then you could look to see whether everyone else who was assigned the same WU has been validated or invalidated. Any pending status will probably hold up the process that handles transitioning / assimilation / purging.

As I said before, even though the tasks failed very quickly on our machines, the data collected still could be of some worth. It would be far better to get the results to pass through the system the "normal" way than to just flat out delete them. I understand it would be easier for some to have them deleted, particularly those who wish to merge / delete hosts.
47) Message boards : Number crunching : Stil a pending credit (Message 18600)
Posted 14 Nov 2007 by Brian Silvers
Post:

I now have 31 pending and all are showing the above problem (0.00xxxxxx claim).
Can these be validated or killed please?


I thought about suggesting zapping them all a few weeks ago, but then the thought occurred to me that the results may need to be inserted into the science database. Even if a result indicates an instantaneous containment failure, that needs to be known so that it doesn't happen in the real-world application...

So, Neasan and Alex, what's up? Is someone working on getting the server upgrade performed? Would this by chance be some old validator code?

Brian
48) Message boards : Number crunching : Project encountered internal error: shared memory (Message 18557)
Posted 5 Nov 2007 by Brian Silvers
Post:
After the recent outage, I just tried a project update to report results and get new work and received the following:

6/11/2007 5:57:28 AM|lhcathome|Fetching scheduler list
6/11/2007 5:57:33 AM|lhcathome|Master file download succeeded
6/11/2007 5:57:38 AM|lhcathome|Sending scheduler request: Requested by user. Requesting 28700 seconds of work, reporting 7 completed tasks
6/11/2007 5:57:53 AM|lhcathome|Scheduler request succeeded: got 0 new tasks
6/11/2007 5:57:53 AM|lhcathome|Message from server: Project encountered internal error: shared memory

Problems?

Live long and BOINC!


Same here... Of course, I had nothing to report... and it said it was out of work, but hey, I'm stubborn like that... ;)
49) Message boards : Number crunching : Stand back....... (Message 18543)
Posted 4 Nov 2007 by Brian Silvers
Post:

If crunching LHC doesn't earn respect from LHC then do tell what we have to do to get respect. Stand on our heads and spit out gold ingots?


I said that I agreed with you in part about the IR issue and that a good middle ground was to do the project-side aborts.


What has LHC shared with us regarding how our donation is used? Seems about all we know is that we're helping to align the magnets. Wonderful! I'm sincerely glad to help but I think we deserve a few more details about IR when there seems to be no need for it to be as high as it is plus better server software.


The quote below is exactly what you stated back in the "Because you asked" thread...


OK, so they post like "we're tweaking Sixtrack and the result verifier". Is that going to keep the k00ks happy? Probably not. So then the k00ks ask questions like "what are you doing to Sixtrack and the verifier". The team responds with technical details. Then the k00ks need to know what the technical details mean. And if the team doesn't respond with a crash course on BOINC Server the k00ks get all pissy that the team doesn't care and threaten to quit anyway. Give the k00ks the boot, cancel their accounts, they're just attention seeking kids that yammer on and on about nothing important just to yank people's chains. Yah, you got it, they're trolls.
50) Message boards : Number crunching : Stil a pending credit (Message 18534)
Posted 3 Nov 2007 by Brian Silvers
Post:
I hope nothing major breaks. I'd really be whizzed off if I lost my 0.00198303589808177 credit!

Hmmm... isn't that the Reverse Polish Notation of the square root of the cosine of PI????


Yeah yeah... My concern was about the result database having hundreds of thousands of extra rows that should have been transitioned already. Worst thing that would happen is a slowdown in performance and a longer backup time...

Now, go back to your bah-humbug thread over at SETI :-P
51) Message boards : Number crunching : Stil a pending credit (Message 18532)
Posted 3 Nov 2007 by Brian Silvers
Post:
with the WU's available over the last couple weeks, I now have about 20 "pending", where the claimed is of the 0.00xxxx variety.



They should probably deal with it sooner rather than later. Database growth must be becoming a problem by now...
52) Message boards : Number crunching : BOINC 5.10.28 (Message 18518)
Posted 1 Nov 2007 by Brian Silvers
Post:
I wouldn't be so eager to install this... I ran into a problem!

I have a single-user installation and installed over top of 5.8.16 after I had completed a task and shut down the manager. After the installation, it said that the configuration was bad. I had not made a backup, so the only thing I could do was to try to install 5.8.16 again. The first install of 5.8.16 ended up telling me that there was an internal application error. The "repair" that was done when running the 5.8.16 installer again got me back in business...

These kinds of things are what made me stay away from 5.10.x in the first place...
53) Message boards : Number crunching : Stand back....... (Message 18502)
Posted 1 Nov 2007 by Brian Silvers
Post:
Edit:

Decided to remove reply...

While the thread title certainly still applies to the goings-on now, all the noise is amounting to a thread hijack, which is why I decided to just remove what I had posted in response...

54) Message boards : Number crunching : Two instantaneous crashes (Message 18498)
Posted 1 Nov 2007 by Brian Silvers
Post:
The LHC server seems to consider these two 1-second results a great success. Over - Success - Done. Exit code 0. I can find no crash or error messages or codes and the tasks didn't go through the Ready to report stage.

Maybe they were bewitched on Hallowe'en.


These are just results where the simulation determined that the beam could not make it around the track. Sometimes that happens in less than a second, other times it may take a minute or two... I had 4 of these out of my allotted 10 so far for today...

Now, if only they could get them to validate instead of sitting as pending...
55) Message boards : Number crunching : Stand back....... (Message 18496)
Posted 1 Nov 2007 by Brian Silvers
Post:

Fine, consider all these protest messages as us doing what we gotta do to teach them what they need to learn in order to run a respectful project.
<snip>

It may seem that way to you but to others it seems like they are now starting to take an interest in what's posted here rather than just laughing and walking away. Yes, at least that much has been accomplished, finally.

<snip>

Maybe a little taste of their own medicine will get us, the people who donate the CPU time that makes this project work, the respect we deserve.



You and others have expressed valid concerns, but I think you should ease up a bit on the direct insults. That's just my opinion, though.
56) Message boards : Number crunching : Stand back....... (Message 18482)
Posted 31 Oct 2007 by Brian Silvers
Post:

Just my opinion.


My opinion is that, given your apparent connections to various BOINC resources, you could've conducted this discussion via the mailing lists, or had Rom or David contact Neasan directly through email and/or phone to discuss the server upgrade. They could convey the urgency, if any, to Neasan directly and would be an "official" source.

I think these guys are doing infinitely better than what had happened with the prior administration, but...oh well, tis just my opinion...

57) Message boards : Number crunching : LHC BOINC Server Software (Message 18463)
Posted 30 Oct 2007 by Brian Silvers
Post:
It seems to me that LHC is running an older version of BOINC that's not as intelligent at allowing me to merge hosts as SETI and Einstein are. Are there plans to move to a newer rev any time soon so I can get my host stats lined up again?


There is talk of a server upgrade. Alex or Neasan (the admins) have posted about it a few times, but no specifics on when it may happen...
58) Message boards : Number crunching : Server not reporting "No Work from project" (Message 18407)
Posted 28 Oct 2007 by Brian Silvers
Post:
I understood your issue. I was merely pointing out that this subject is not reflective of your actual issue.

Anyway, to further explain what's going on in your case, the project has implemented a rule that says that no computer can contact the server more than once in any 30 minute and 18 second interval. If BOINC decides it is going to connect again in 1 minute, then I believe that your 30 minutes and 18 seconds are reset. In other words, if you were 20 minutes into the 30 minute wait and your system contacted the scheduler, you would then be told to wait 30 more minutes, not 10 more. This behavior is different from the 24-hour quota timer, which appears to keep track of specifically when the results were sent and adjusts accordingly.
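A toy model of the behavior described above, assuming the backoff simply restarts in full on every contact attempt (that's my reading of what the server does, not confirmed server logic):

```python
# Toy model: the project enforces a fixed minimum gap between scheduler
# contacts, and any contact attempt restarts the full window (assumption).

BACKOFF = 30 * 60 + 18  # 30 min 18 s, per the post

def next_allowed(last_contact_time, attempted_contact_time):
    """Every contact attempt restarts the full backoff window."""
    # Even 20 minutes into the wait, a new contact pushes the next
    # allowed time a full 30:18 out -- the timer is not pro-rated.
    return attempted_contact_time + BACKOFF

t0 = 0
t1 = t0 + 20 * 60               # host pokes the scheduler 20 minutes in
print(next_allowed(t0, t1) - t1)  # full 1818 s to wait, not the remaining 618 s
```

This is what distinguishes it from the 24-hour quota timer, which (again per the behavior described above) tracks when results were actually sent and adjusts accordingly.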

I don't know if there is anything that can be done about BOINC deciding to automatically connect again in less than the set limit by LHC. It is a different issue than what I was originally reporting, that's why I'm suggesting that you create a new thread to discuss the issue.

Brian


OK, so I misread the POST header. But still, the back-off times were not because the daily quota was reached; I had seen that before. The systems were unattended and I was not controlling the times of the WAIT between communication events; it was all down to the software and the server. If I go back far enough in the logs I will find an item similar to the one shown above. When I looked at the web site after the last message I saw that there was work, about 90,000 units at that time and rising as I refreshed.

59) Message boards : Number crunching : Server not reporting "No Work from project" (Message 18404)
Posted 28 Oct 2007 by Brian Silvers
Post:


Edit: OK, I figure perhaps the 30-minute backoff was to try to get reporting done before forcing the 24-hour backoff. Dunno what to say. I'll check to see what happens with the 4 units I pick up over the next 2 hours, but I think the 30-minute rule doesn't get the final 2 results reported until tomorrow unless I hit the update button myself...


Yep, that's exactly what happened. I had one short running result, so it got reported on the contact that happened 30 minutes after the download of the first two results. The 2nd result finished at about 40 minutes after it was first downloaded, so the 2nd result went up in the report just a few minutes ago. However, here's what the scheduler did:

10/27/2007 8:41:44 PM|lhcathome|Sending scheduler request: To fetch work
10/27/2007 8:41:44 PM|lhcathome|Requesting 256151 seconds of new work, and reporting 1 completed tasks
10/27/2007 8:41:54 PM|lhcathome|Scheduler RPC succeeded [server version 505]
10/27/2007 8:41:54 PM|lhcathome|Message from server: No work sent
10/27/2007 8:41:54 PM|lhcathome|Message from server: (reached daily quota of 4 results)
10/27/2007 8:41:54 PM|lhcathome|Deferring communication for 22 hr 27 min 48 sec
10/27/2007 8:41:54 PM|lhcathome|Reason: requested by project

So, to report the remaining 2 results, I'll have to manually push the update button or wait until the next contact some 22 hours away...

Brian
60) Message boards : Number crunching : Maximum daily WU quota per CPU? (Message 18400)
Posted 27 Oct 2007 by Brian Silvers
Post:
Matched UK time tonite, but daylight saving time ends today in the UK - is it the same in Switzerland?


Switzerland doesn't matter now... The project servers moved to London a few months ago... And besides, it's based on UTC...

As for here in the US, DST used to end tonight, but this year it ends next week, at least for a few years anyway. This will be the first year in my life that it will not be completely dark at 7 PM on Halloween...




©2023 CERN