41) Message boards : Number crunching : Is the validator doing it's job? (Message 7506)
Posted 7 May 2005 by Profile Markku Degerholm
Post:

> After starting the 4.35 client, he was doing the benchmark and started DL
> together, what he should not do this way.
> Finally got several "computation errors" and had to reset the client.

I see from the database that the client has reported 144 Tera-FLOPS and 120 Tera-IOPS as benchmark values, which is not bad at all for a mere PC :)
42) Message boards : Number crunching : Daily quota exceeded (Message 7504)
Posted 7 May 2005 by Profile Markku Degerholm
Post:
> Host # 22642 has received 40 units today and is being told it has exceeded
> daily quota. It appears the increase has had no effect. This host averages
> 85-100 results a day and will probably run out of work tomorrow for LHC
> unless it truly starts getting 100 a day as it only has 60 left in cache. It
> appears the bug is still present at the server side.

From the database I see that host #22642 has been assigned as many as 145 results today (server time). It is indeed strange that the server assigned that many results (100 is the current maximum), and also that you have seen so few of them.

Investigations continue...

ADD: From the logs I see that there were many database errors while the results were being assigned to you. The database actually crashed today for some unknown reason (the first time in LHC@home history), but fortunately the recovery was successful. While the database was in trouble, many database updates failed, and that is what now shows up as abnormal behaviour.

I set nresults_today for host #22642 to 40. Happy crunching.

43) Message boards : Number crunching : Daily quota exceeded (Message 7503)
Posted 7 May 2005 by Profile Markku Degerholm
Post:
> Interesting. Either a) I'm wrong, b) we are using too old a version of the
> scheduler, or c) the Einstein project has done custom modifications.
>
> Guess I need to find out which:)

Case b) applies. The current CVS version of the BOINC scheduler scales the quota according to the number of CPUs, but the scheduler we currently use does not. However, we are testing new versions on the alpha site.
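To make the difference concrete, here is a minimal Python sketch of per-CPU quota scaling. It assumes the Einstein rule quoted in this thread (8 results per CPU, counting at most 4 CPUs); the function names are hypothetical and this is not the BOINC scheduler's actual code. `nresults_today` stands for the per-host daily counter mentioned in another post here.

```python
def effective_daily_quota(per_cpu_quota: int, ncpus: int, max_cpus: int = 4) -> int:
    """Hypothetical per-host daily quota, scaled by the number of CPUs.

    Assumption: follows the Einstein rule quoted in this thread
    (N results per CPU, counting at most max_cpus CPUs).
    """
    counted_cpus = max(1, min(ncpus, max_cpus))  # at least 1, at most max_cpus
    return per_cpu_quota * counted_cpus


def may_send_more(nresults_today: int, per_cpu_quota: int, ncpus: int) -> bool:
    """True while the host is still under its scaled quota for today."""
    return nresults_today < effective_daily_quota(per_cpu_quota, ncpus)


print(effective_daily_quota(8, 1))   # 8
print(effective_daily_quota(8, 2))   # 16
print(effective_daily_quota(8, 16))  # 32 (only 4 CPUs are counted)
print(may_send_more(30, 8, 16))      # True
```

An unscaled scheduler, like the one in use here, simply applies the same per-host number regardless of `ncpus`.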
44) Message boards : Number crunching : More WUs on my results page than in my cache (Message 7497)
Posted 7 May 2005 by Profile Markku Degerholm
Post:
Yes, it's probably due to the database failure. The results were allocated to you, but then something failed.

But don't worry, those results will be regenerated sooner or later and given out to be crunched again.
45) Message boards : Number crunching : ADMIN! - Project Down Errors & Message Board Errors (Message 7490)
Posted 7 May 2005 by Profile Markku Degerholm
Post:
Yes, we noticed that too. We are trying to improve the situation.
46) Message boards : Number crunching : Daily quota exceeded (Message 7447)
Posted 5 May 2005 by Profile Markku Degerholm
Post:
> That’s strange. According to Bruce Allen, on Einstein “The 'Daily Result
> Quota' is normally 8 workunits (per CPU, with a 4 CPU maximum)”

Interesting. Either a) I'm wrong, b) we are using too old a version of the scheduler, or c) the Einstein project has done custom modifications.

Guess I need to find out which:)
47) Message boards : Number crunching : Web page - Too many Connections? (Message 7446)
Posted 5 May 2005 by Profile Markku Degerholm
Post:
During the night in Europe, the server goes through a set of backup, cleaning and optimization tasks, which may temporarily slow it down enough that it becomes overloaded. I hope we can make this smoother, but it might not be easy.
48) Message boards : Number crunching : Daily quota exceeded (Message 7430)
Posted 4 May 2005 by Profile Markku Degerholm
Post:
> Markku:
>
> "Increasing quota is problematic in the sense that some older machines can
> then download too much work at once (and possibly never finish it)"
>
> Isn't deadlines, benchmarks and estimated cpu time used to prevent this?

They should, but I'm not sure whether they actually work that way. Can anybody confirm? In addition, the estimates are only good as long as the BOINC core client (CC) is doing work 100% of the time. If the BOINC client is shut down, there is not much that can be done.
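A rough, hypothetical model of the check being discussed: estimate the runtime from the workunit's FLOP count, the host's benchmark and the fraction of time the client is actually running, and send the work only if it can finish before the deadline. The names and numbers below are illustrative assumptions, not BOINC's scheduler logic; the point is that a low on-fraction (client often shut down) stretches the real runtime far beyond what the benchmark alone suggests.

```python
def fits_before_deadline(est_flops: float, benchmark_flops: float,
                         on_fraction: float, queued_seconds: float,
                         deadline_seconds: float) -> bool:
    """Rough check whether one more workunit could finish in time.

    Hypothetical model: runtime = est_flops / (benchmark_flops * on_fraction),
    and the new work only starts after the work already queued on the host.
    """
    if benchmark_flops <= 0 or on_fraction <= 0:
        return False  # no usable estimate: do not send
    runtime = est_flops / (benchmark_flops * on_fraction)
    return queued_seconds + runtime <= deadline_seconds


week = 7 * 24 * 3600  # 604800 s, a made-up deadline
print(fits_before_deadline(3.6e13, 1e9, 1.0, 0, week))   # True: about 10 h of work
print(fits_before_deadline(3.6e13, 1e9, 0.05, 0, week))  # False: client rarely running
```

Note that a bogus benchmark (such as the 144 Tera-FLOPS mentioned in message 7506) makes every runtime estimate look tiny, so a check like this would let such a host take far too much work.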
49) Message boards : Number crunching : Daily quota exceeded (Message 7422)
Posted 4 May 2005 by Profile Markku Degerholm
Post:
> Hi!
>
> I might be beating a dead horse, but could you please raise the daily quota?
> Especially with these relatively short WU's, my dual-cpu PC can chew through
> more than it is allowed to download pr. day. Maybe you could set the limit to
> X WU's per CPU - not per host?

A per-CPU quota is not supported by BOINC, so it's unlikely that we would do it. Increasing quota is problematic in the sense that some older machines can then download too much work at once (and possibly never finish it). But we'll think about it.

Can you give an estimate on how many workunits your dual-CPU machine can process in a day?
50) Message boards : Number crunching : constant HD activity? (Message 7403)
Posted 3 May 2005 by Profile Markku Degerholm
Post:
> Hm, I could swear the recent sixtrack fixed that.

It could be that the frequency of writes has been changed, but I'm not aware of that. I guess I'll need to ask :)

> Before that, I could also see + hear my Linux Systems scratching their HD's
> every Second when running LHC, so I deemed the sixtrack Version responsible to
> the change.

It can also depend on the filesystem mount options. If you have the "async" option enabled, writes are buffered, while "sync" mode makes writes happen immediately.

Most Linux distributions use asynchronous mode by default, but many server systems use synchronous writes to ensure data integrity even in unexpected situations such as a power or hardware failure.

However, I guess you would know if the filesystem options were changed... So maybe there is something else.
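The mount options apply to a whole filesystem, but the same sync/async trade-off can be seen per file. A minimal Python sketch, assuming a POSIX-like system: a plain write returns once the data is in the OS cache, while `flush()` plus `os.fsync()` asks the OS to push it to the storage device before returning. The file name is made up for the example.

```python
import os
import tempfile


def write_buffered(path: str, data: bytes) -> None:
    # Returns once the OS holds the data in its cache; the kernel decides
    # later when it actually reaches the disk ("async"-like behaviour).
    with open(path, "wb") as f:
        f.write(data)


def write_synced(path: str, data: bytes) -> None:
    # Does not return until the OS reports the data written to the device
    # ("sync"-like behaviour for this one file).
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's own buffer -> OS
        os.fsync(f.fileno())  # OS cache -> storage device


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "demo.dat")
    write_buffered(p, b"buffered\n")
    write_synced(p, b"synced\n")
    with open(p, "rb") as f:
        print(f.read())  # b'synced\n'
```

Doing the second variant on every small write is what produces the constant disk activity people hear.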
51) Message boards : Number crunching : If only the numbers were a measure of reality!! (Message 7394)
Posted 2 May 2005 by Profile Markku Degerholm
Post:

> Average upload rate 2.9475960171044E+17 KB/sec
> Average download rate 2.0704415908107E+18

We were just testing a new quantum-mechanics-based file transport technology. It is based on the principle of moving whole files directly from the source HD to the destination HD in a single instant (instead of serializing the file into bits and moving them through the net).

Just kidding :)

52) Message boards : Number crunching : Weird scheduler error (Message 7393)
Posted 2 May 2005 by Profile Markku Degerholm
Post:
That happens when the database connection limit gets exceeded. There was a temporary server overload, but it should be OK now.

53) Message boards : Number crunching : constant HD activity? (Message 7392)
Posted 2 May 2005 by Profile Markku Degerholm
Post:
AFAIK Sixtrack always did, and still does, write constantly to a number of work files called fort.xx. This is in addition to writing checkpoint files. The checkpointing frequency can be controlled by the user, but the frequency of these "ambient" writes cannot.

However, on modern operating systems this shouldn't be a problem. My Linux box, at least, buffers the writes, and when running BOINC it writes to the disk at intervals of about 5 s. When the machine is "idle" (not running BOINC), it keeps writing at about 5 s intervals, because there always seems to be something that needs to be written anyway.

Of course, some could argue that keeping write buffers enabled is a bad idea for the sake of data security (if the power goes off, data in the buffers won't get written). For them, running BOINC on a RAM disk could be a better solution (at least CC 4.35 has an option to set a different data directory). When using BOINC on a RAM disk, I recommend keeping only a small work buffer. I'm not sure how to set up a RAM disk in Windows, but in Linux it's relatively easy. Maybe somebody more experienced with Windows could help with the Windows part.

As somebody already said, I don't recommend using a flash-memory-based device for Sixtrack data storage.

The best solution, of course, would be to fix Sixtrack to use only memory records instead of file records. Unfortunately, that might mean a rewrite of Sixtrack, so don't hold your breath.
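Sixtrack itself is Fortran, so the following is only a Python illustration of the idea, with hypothetical names: scratch records stay in memory and the disk is touched only at checkpoints. Each checkpoint goes through a temporary file and `os.replace` (an atomic rename on POSIX systems), so a crash mid-write leaves the previous checkpoint intact.

```python
import io
import os
import tempfile


class ScratchRecords:
    """Hypothetical in-memory stand-in for constantly rewritten work files."""

    def __init__(self, checkpoint_path: str) -> None:
        self.checkpoint_path = checkpoint_path
        self.buffer = io.StringIO()  # records live in RAM
        self.disk_writes = 0

    def write_record(self, line: str) -> None:
        self.buffer.write(line + "\n")  # no disk I/O here

    def checkpoint(self) -> None:
        # Write the full state to a temp file, force it to disk, then rename.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(self.buffer.getvalue())
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.checkpoint_path)
        self.disk_writes += 1


with tempfile.TemporaryDirectory() as d:
    rec = ScratchRecords(os.path.join(d, "state.chk"))
    for turn in range(1000):
        rec.write_record(f"turn {turn}")
        if (turn + 1) % 250 == 0:
            rec.checkpoint()
    print(rec.disk_writes)  # 4
```

With this layout the disk sees one write per checkpoint instead of one per record, which is the behaviour the checkpoint setting already gives for the checkpoint files.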
54) Message boards : Number crunching : Hurry up! (Message 7228)
Posted 26 Apr 2005 by Profile Markku Degerholm
Post:
> Thank you very much

You're welcome:)

> A daily quota of 150, is this not just a little too high??

Well, most people have already hit the limit of 50, and there are still some bad results out there. Also, people can control the amount of work to download on the preferences pages. But if things start looking bad, I'll compromise and set it to 100.

> one client is having :
> (need to reset project?)

Yes, that looks a bit suspicious to me. A reset might be in order.
55) Message boards : Number crunching : Hurry up! (Message 7226)
Posted 26 Apr 2005 by Profile Markku Degerholm
Post:
> There was a momentary failure in the server-side job submission system. So it
> seems that some workunits are without input files. We'll try to fix up the
> mess. Daily quota will be temporarily increased after that.

The invalid workunits should now have been cancelled, and the daily quota is set to 150. Please report if there are further problems.

Edit: Internal tests show that the problem remains... Investigating...

Edit2: OK, I think that's because there are still results of those cancelled workunits in the server buffers. It's easiest just to wait for them to fail... New results for the cancelled workunits shouldn't be generated anymore.
56) Message boards : Number crunching : Hurry up! (Message 7220)
Posted 26 Apr 2005 by Profile Markku Degerholm
Post:
There was a momentary failure in the server-side job submission system. So it seems that some workunits are without input files. We'll try to fix up the mess. Daily quota will be temporarily increased after that.


57) Message boards : Number crunching : Is the validator doing it's job? (Message 7206)
Posted 26 Apr 2005 by Profile Markku Degerholm
Post:

> As a side note & I've brought this up before also, why is there still (Not
> counting the over 150 that are Requesting Credit) 224 Pending WU's in my
> Account that show 0:00 Credit Requested. And every one of them that I looked

Good question. I'll investigate this sooner or later...
58) Message boards : Number crunching : LHC Project Demise ... ??? (Message 7197)
Posted 25 Apr 2005 by Profile Markku Degerholm
Post:

> I know that S@H links the project app version to the result when it is sent to
> the computer. I was under the impression that this was standard.

Well, I don't know of a way to do that, except by creating a new application entry for each new version (one can, of course, link a workunit to an application).
59) Message boards : Number crunching : some upload difficulties? (Message 7167)
Posted 22 Apr 2005 by Profile Markku Degerholm
Post:
>
> 22.04.2005 04:50:35|LHC@home|Couldn't delete file
> projects\lhcathome.cern.ch\v64lhc.D1-D2-MQonly-inj-no-skew-5s6_8532.8846_1_sixvf_1447_3_0


Sounds like an open-file problem. Maybe a virus scanner was inspecting the file when it should have been deleted... Or maybe it is a core client bug; there have been lots of those "open file" errors.
60) Message boards : Number crunching : LHC Project Demise ... ??? (Message 7166)
Posted 22 Apr 2005 by Profile Markku Degerholm
Post:
In yet other words: we can only control the minimum application version that can be used when handing out workunits. That is, we cannot say which version of Sixtrack should be used for a given workunit, only that a given workunit is to be processed using some application (always Sixtrack, at least for now) and the minimum version number of that application.

The reason multiple application versions can be crunching at the same time is that once a workunit has been started with some application version, that version is used until the workunit is done (or fails).
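A hypothetical model of the rule described above: the project sets only a minimum version per workunit, and a workunit keeps whatever version it was started with. Picking the newest eligible version for new work is an assumption made for the example, and the version numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Workunit:
    name: str
    min_version: int                    # the only constraint the project can set
    started_with: Optional[int] = None  # version pinned once crunching starts


def version_to_use(wu: Workunit, available: List[int]) -> Optional[int]:
    """Pick the application version for a workunit (hypothetical model)."""
    if wu.started_with is not None:
        return wu.started_with          # keep the version it was started with
    eligible = [v for v in available if v >= wu.min_version]
    if not eligible:
        return None                     # host has no version new enough
    wu.started_with = max(eligible)     # assumption: newest eligible version
    return wu.started_with


wu_a = Workunit("wu_a", min_version=100)
wu_b = Workunit("wu_b", min_version=100)
print(version_to_use(wu_a, [100]))       # 100: started before 101 existed
print(version_to_use(wu_a, [100, 101]))  # 100: still pinned
print(version_to_use(wu_b, [100, 101]))  # 101
```

Here `wu_a` keeps running on version 100 while `wu_b` gets 101, which is how two Sixtrack versions end up crunching at the same time.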




©2024 CERN