21) Message boards : Number crunching : Old work still stuck in (Message 8166)
Posted 23 Jun 2005 by Profile Markku Degerholm
Post:
> be careful with your SQL "update" commands! ;-)

I was, but not careful enough... This one was actually in C code, like:

snprintf(buf, sizeof(buf),
"UPDATE result"
" SET (stuff) ",
" WHERE id=%d", stuff);

See where it goes wrong? :) There is one extra comma between SET and WHERE, which wasn't even this clearly visible in the real code because the line was overlong... Well, from this lesson I learned to always printf() the SQL queries before executing them on the production database...
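
A minimal sketch of the fixed pattern, assuming a hypothetical helper build_update() and illustrative column names (the real query is not shown in the post). It builds the query without the stray comma, refuses any UPDATE that ends up without a WHERE clause, and prints the query for inspection before anything is executed:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the UPDATE statement into buf.
 * Returns 0 on success, -1 if the query was truncated or lacks a WHERE clause.
 * Table and column names are illustrative, not the project's real schema. */
int build_update(char *buf, size_t size, int new_state, int id)
{
    assert(buf != NULL && size > 0);

    /* Buggy variant, for comparison only:
     *   snprintf(buf, size,
     *            "UPDATE result"
     *            " SET server_state=%d",      <-- stray comma
     *            " WHERE id=%d", new_state, id);
     * The comma ends the format string, so the WHERE literal becomes a
     * plain argument and never reaches the query text. */
    int n = snprintf(buf, size,
                     "UPDATE result"
                     " SET server_state=%d"
                     " WHERE id=%d",
                     new_state, id);
    if (n < 0 || (size_t)n >= size)
        return -1;                      /* encoding error or truncated query */

    if (strstr(buf, " WHERE ") == NULL)
        return -1;                      /* would affect every row: refuse */

    fprintf(stderr, "SQL: %s\n", buf);  /* look at it before executing */
    return 0;
}
```

The WHERE check is only a last line of defence; compiling with format warnings enabled may also flag the extra argument in the buggy variant.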

22) Message boards : Number crunching : we're not resigning!!!! (Message 8165)
Posted 23 Jun 2005 by Profile Markku Degerholm
Post:
Thank you for your moral support:)

I'm also hoping to get some new work. There should be lots of work remaining in the studies, but I'm not sure what the problem is this time. I'll try to get some new information.
23) Message boards : Number crunching : Old work still stuck in (Message 8141)
Posted 20 Jun 2005 by Profile Markku Degerholm
Post:
I invalidated these manually, I hope I got it right now (on the first try I crashed the whole database:( )
24) Message boards : Number crunching : Looks like they broke it (Message 8140)
Posted 20 Jun 2005 by Profile Markku Degerholm
Post:
Yup, I broke it. One comma in the wrong place made a certain UPDATE statement affect all results, with bad consequences.

I restored the result and workunit data from the last nightly backup (about 12 hours ago). So some results have been lost, but nothing too bad I hope.

Sorry about this...
25) Message boards : Number crunching : Time to Completion (Message 8003)
Posted 7 Jun 2005 by Profile Markku Degerholm
Post:
I think the core client computes the estimated computation time directly from the "maximum number of integer/fp operations" value of the workunit and the benchmarked FLOPS/IOPS values of the host machine. The maximum operation count is a very rough worst-case estimate. But if we find time to do some statistics to find correct values, maybe we will change them...
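
As a rough sketch of that calculation, assuming the estimate is simply the operation bound divided by the benchmarked speed (the function and parameter names are hypothetical, not the core client's actual code):

```c
/* Estimated run time in seconds: the workunit's operation bound divided by
 * the host's benchmarked speed in operations per second.
 * Returns -1.0 when the inputs are unusable (no benchmark, negative bound). */
double estimate_seconds(double max_ops, double ops_per_sec)
{
    if (ops_per_sec <= 0.0 || max_ops < 0.0)
        return -1.0;
    return max_ops / ops_per_sec;
}
```

For example, a bound of 3.6e12 operations on a host benchmarked at 1e9 operations per second gives 3600 seconds. Because the bound is a worst case, real run times would normally be shorter than such an estimate.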
26) Message boards : Number crunching : New server binaries (Message 7868)
Posted 29 May 2005 by Profile Markku Degerholm
Post:
The server is now running with new BOINC scheduler components. They have been tested on the Alpha site but they still might have problems. If so, please report in this thread.

If I have understood our physics team correctly, new work will be released very soon (tomorrow I hope).
27) Message boards : Number crunching : please fix the deleter .... (Message 7867)
Posted 29 May 2005 by Profile Markku Degerholm
Post:

> I've got 4-5 others like this that have been around for months. I'm assuming
> they get stuck because of too many errors or something?

Yes, exactly. I'll make sure that they get deleted eventually...
28) Message boards : Number crunching : Is it Next Week yet ... ??? (Message 7866)
Posted 29 May 2005 by Profile Markku Degerholm
Post:
> LHC guide to time:
>
> Today = tomorrow
>
> tomorrow = next week
>
> next week = in 2-3 weeks
>
> this month = next month if you're lucky
>
> end of this year = first half of next year
>
> No offence at LHC staff meant! :-)
>

This seems to be a common trait in the IT business. Usually a predicted amount of required work gets exceeded by a factor between e (~2.72) and pi (~3.14)...
29) Message boards : Number crunching : please fix the deleter .... (Message 7834)
Posted 26 May 2005 by Profile Markku Degerholm
Post:
> system and the validator doesn't grant credit anymore... I vote for
> modifications which will only delete the results when all deadlines have been
> passed OR all results have been received....

Actually that's how it should work already. We're not sure why it doesn't. Let's hope the scheduler update will fix it.

> maybe sending out results only 3 times instead of 5 times could fix this?
> this way only 3 valid results would let the delete job do its work....


Well, the redundancy has to be higher than the required number of identical results, otherwise the work "tail" will be too long. But let's see...

30) Message boards : Number crunching : Projects Computing Power (Message 7753)
Posted 19 May 2005 by Profile Markku Degerholm
Post:
> Hey Markku,
> May I ask how LHC's computing power compares to what you had originally
> planned in house for the sixtrack program? I know we have only a bit over
> 5000 users and 15000+ hosts at any given time at present. Estimates of a
> million work unit run with 5 million results are that it would last us about a
> month or so as posted on these boards seem generally accurate. My main
> question is how does that compare timewise crunching compared to supercomputers
> you might have used or clustering with same amount of workunits?

We have currently about 2500 active users and 5000 active hosts. It's more than with any other computation platform we can use for Sixtrack, but I'm not sure about the details.
31) Message boards : Number crunching : Will this test stop next?-No, ... ? (Message 7680)
Posted 13 May 2005 by Profile Markku Degerholm
Post:

> Just something to keep in mind. Adding another 5K users doubles the population
> but may increase server load by the square rather than by the double ...

True. The major bottleneck on the server side is the database. Other components can be distributed on separate servers, but all of them still need to use the database... And the database doesn't scale that easily, at least not the old version of MySQL we are using. Compared to many other database applications, BOINC does lots of database updates. Thus database replication doesn't give that much of a performance boost (and it complicates matters quite a bit), because all updates must be done on every database server.
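
A back-of-the-envelope sketch of why replication helps so little with a write-heavy load, assuming a simple model in which reads are spread evenly over the replicas but every write is applied on all of them (the function name and the numbers are illustrative only):

```c
/* Queries per second that each server must handle when there are `servers`
 * replicas: reads are divided among them, writes are repeated on each one.
 * Returns -1.0 for an invalid server count. */
double per_server_load(double reads_per_sec, double writes_per_sec, int servers)
{
    if (servers < 1)
        return -1.0;
    return reads_per_sec / servers + writes_per_sec;
}
```

With 200 reads and 800 writes per second, a single server handles 1000 queries per second; with four replicas each one still handles 850, a reduction of only 15%.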

Anyway, there are plans to increase the number of users. However, I'm not sure when that's going to happen. We'll see.
32) Message boards : Number crunching : WU with granted & pending credit (Message 7642)
Posted 12 May 2005 by Profile Markku Degerholm
Post:
> Looks to me like a bug in the validator...

No, it's not a validator bug. The validator can only validate if it has the canonical result for comparison.

It's just the over-eager file deleter program, which does its work correctly in principle, but because of the high redundancy currently used, some results will always be delivered after the other files have been deleted.

I'm planning to put a little delay into the file deleter. I hope that will be ready before the next big pile of work.
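
One way such a delay could look, as a sketch only: the file deleter skips a workunit until a grace period has passed since it was assimilated. The function name, the parameters and the idea of measuring from the assimilation time are assumptions for illustration, not the actual BOINC file deleter logic.

```c
#include <time.h>

/* Returns 1 if the files of a workunit may be deleted, 0 if the deleter
 * should wait longer. grace_seconds is the hypothetical delay between
 * assimilation and deletion. */
int may_delete(time_t assimilated_at, time_t now, double grace_seconds)
{
    return difftime(now, assimilated_at) >= grace_seconds;
}
```

A longer grace period keeps the canonical result around for late results at the cost of more disk space on the server.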
33) Message boards : Number crunching : Will this test stop next?-No, ... ? (Message 7641)
Posted 12 May 2005 by Profile Markku Degerholm
Post:
> How long will the work units be, 100.000 or 1 million turns?

AFAIK, 100 000.
34) Message boards : Number crunching : Will this test stop next?-No, ... ? (Message 7638)
Posted 12 May 2005 by Profile Markku Degerholm
Post:
We are currently finishing the "tail" of the current study, that is, resubmitting work until every workunit has been processed.

After that there will be a short break (a week or so) and then about one million new workunits (which makes about five million results) will be submitted.
35) Message boards : Number crunching : RAC (expavg_credit) being recalculated incorrectly by LHC (Message 7616)
Posted 11 May 2005 by Profile Markku Degerholm
Post:

> what did you guys change?? :)

Nothing. We have a somewhat outdated scheduler with some RAC-related issues, but we are testing the new scheduler on the alpha site. Let's hope the new version does it right.
36) Message boards : Number crunching : WU with granted & pending credit (Message 7615)
Posted 11 May 2005 by Profile Markku Degerholm
Post:
I'm afraid these results will never be granted credit. It's because the work unit has already been assimilated and the related result files (including the canonical result) have been deleted or moved. However, these results will disappear from the "pending" list when the corresponding entry is also deleted from the database.

I admit that it's unfair that you don't get credit for these successfully completed results. We will try to find a way to prevent this in future.
37) Message boards : Number crunching : Downloading too many WUs (Message 7576)
Posted 10 May 2005 by Profile Markku Degerholm
Post:
> So is this recent influx of WUs a rush to send out as many WUs as possible to
> test something so that resource shares and client uptimes aren't taken into
> account? Or is this a bug in the scheduler?

If it's a bug, it's a bug in the core client. The scheduler only has a maximum limit (quota) for results to download. It's up to the core client to ask for a proper amount of work. Which version of the core client are you using?
38) Message boards : Number crunching : ADMIN! - Project Down Errors & Message Board Errors (Message 7540)
Posted 8 May 2005 by Profile Markku Degerholm
Post:
>
> Now my question:
> Where can I apply for that job? Will I be paid in LHC turns? WU's? Credits? or
> plain boring money? ;-)

Everybody answering other users' questions is already doing the job :) The pay comes in the form of goodwill and the knowledge of sharing knowledge. :)

And I'm happy as long as people don't fight/flame others or discuss totally unrelated matters here. That is, as long as I don't need to delete any posts, all goes well. :)
39) Message boards : Number crunching : Is the validator doing it's job? (Message 7537)
Posted 8 May 2005 by Profile Markku Degerholm
Post:
There was some weird error with the database, which prevented the validator from doing its job. The error should now be fixed and the validation of all unvalidated results is currently in progress, at least I think so.

It seems that even during the database misery, database updates were done successfully, so no data (result) loss happened. If you disagree, please report here.
40) Message boards : Number crunching : Daily quota exceeded (Message 7532)
Posted 8 May 2005 by Profile Markku Degerholm
Post:
>
> Where is the rest?
>
I think they are in the same place as many of jrenkar's results. That is, in the graveyard of lost results. But fortunately they can be resurrected so don't worry.




©2024 CERN