1) Message boards : Number crunching : New MEGA! crunchers (Message 24980)
Posted 29 Nov 2012 by Profile jujube
Post:
The Agile Boincers are not as agile as they claim, as I have several PVs waiting as "no response" from them as wingmen.


Their computer/cluster or whatever their monster cruncher is may be dedicated to higher priority work that it must take care of first. BOINC might be something the Agile admin(s) allow only when there is nothing else for the machine to do. We've seen this type of situation before. Unfortunately, when the machine has to switch over to its top priority work it sometimes doesn't get back to the BOINC tasks it has queued up until after the deadline expires. It would be nice if there were some way to automatically abort those tasks rather than just let them expire. Actually there is a way, but the machine's admin(s) would have to script it. They might not want to do that. They might not know how. They might not even care. Oh well, when a machine can crunch that many tasks you tend to kiss its feet rather than whine about the small stuff.
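For what it's worth, the scripting isn't hard. Here is a minimal, hypothetical sketch of the idea in Python: it parses boinccmd-style `--get_tasks` output and builds the `boinccmd --task <url> <name> abort` commands for anything past its report deadline. The field names and the date format are assumptions about typical boinccmd output and vary by client version, so treat it as a starting point, not a drop-in tool.

```python
# Sketch of an auto-abort helper for deadline-expired BOINC tasks.
# Assumes boinccmd-style "--get_tasks" output with "name:" and
# "report deadline:" fields in a ctime-like date format; both are
# assumptions that vary by client version.
import time

DEADLINE_FMT = "%a %b %d %H:%M:%S %Y"   # assumed date format

def expired_tasks(get_tasks_output, now):
    """Return names of tasks whose report deadline is already past."""
    expired, name = [], None
    for line in get_tasks_output.splitlines():
        line = line.strip()
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("report deadline:") and name:
            stamp = line.split(":", 1)[1].strip()
            if time.mktime(time.strptime(stamp, DEADLINE_FMT)) < now:
                expired.append(name)
    return expired

def abort_commands(project_url, task_names):
    """Build the boinccmd invocations that would abort each task."""
    return [["boinccmd", "--task", project_url, name, "abort"]
            for name in task_names]
```

An admin would feed it the live output of `boinccmd --get_tasks` from a cron job and execute (rather than just build) the resulting commands.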
2) Message boards : Number crunching : WU not being resent to another user (Message 24979)
Posted 29 Nov 2012 by Profile jujube
Post:
I THINK what he is trying to say is why aren't the resends automatically sent immediately to the available workunit cache instead of just into 'some other cache' then into the available units cache?


No, you misunderstand, that's not what I was trying to say. I would try to unravel it for you but I don't think I have that much time.
3) Message boards : Cafe LHC : rogue BOINC project: ... - THREAD CLOSED (Message 24945)
Posted 7 Nov 2012 by Profile jujube
Post:
I think it is not fair to use the message board of a project to attack another project, since the volunteers of the attacking project know nothing of the other project's problems.


If they read my post then they will know something of the other project's problems. Do you have something against sharing facts? Would you prefer that everybody remain ignorant?

I have received a letter of thanks from CERN for doing this and I never complained about anything.


Ah, I see, they bought your silence with a letter of thanks. There were rumors that your fee for silence was 40 pieces of silver.

I do not want to start a flame war but I think people should know how things really are at T4T.
Tullio


I agree! And that is why I posted here... to let people know how things really are at T4T.
4) Message boards : News : Status and Plans, Sunday 4th November (Message 24943)
Posted 6 Nov 2012 by Profile jujube
Post:
Good analysis, Richard. Appears that is exactly the way it went down.
5) Message boards : Number crunching : WU not being resent to another user (Message 24942)
Posted 6 Nov 2012 by Profile jujube
Post:
It's just a very, very busy project. They maintain a 'high water mark' of around 300,000 tasks ready to send, and turn over about 60,000 tasks per hour - so the end of the queue is never more than five or six hours away.


That sounds simple enough that it could be made to work here too. The trick is having the code to do it and setting the high/low water marks appropriately. For example, if Eric has a batch of 179,249 tasks, just to pick a nice not-round number, then he ought not to dump all 179,249 into the queue at once. There needs to be a high water mark of, say, 2,000 and a low water mark of, say, 1,000. The batch starts with the feeder, splitter, or whatever its name is, dumping 2,000 tasks into the queue. When there are only 1,000 tasks left in the queue, the feeder/splitter/whatever dumps all the resends into the queue and follows those with enough tasks to fill the queue to the high water mark. Eventually all 179,249 tasks have been put into the queue, with the resends sprinkled in at the beginning of each top-up. There should be only a short tail, which won't matter anyway because even if Eric creates another big batch, the tasks in the tail will go into the queue before tasks from the new batch.
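The top-up scheme described above is easy to sketch. This is a toy model with made-up names and numbers; the real BOINC feeder works against the database, not Python lists:

```python
# Toy model of the high/low water mark top-up scheme described above.
# All names and numbers are illustrative, not from any real server.
from collections import deque

HIGH_WATER, LOW_WATER = 2000, 1000

def top_up(queue, resends, batch, high=HIGH_WATER):
    """Refill the send queue: resends first, then fresh batch tasks."""
    while resends and len(queue) < high:
        queue.append(resends.popleft())
    while batch and len(queue) < high:
        queue.append(batch.popleft())

def run_batch(n_tasks, resend_at):
    """Drain a batch, generating a resend after each send count in resend_at."""
    batch = deque(f"task-{i}" for i in range(n_tasks))
    resends, queue, sent = deque(), deque(), []
    top_up(queue, resends, batch)            # initial fill to high water
    while queue:
        sent.append(queue.popleft())
        if len(sent) in resend_at:           # a wingman timed out: resend
            resends.append(f"resend-{len(sent)}")
        if len(queue) <= LOW_WATER:          # hit low water: top up again
            top_up(queue, resends, batch)
    return sent
```

With, say, 5,000 tasks and a resend generated after the 1,500th send, the resend goes out at the start of the next top-up, a few thousand sends later, instead of after the entire batch.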

Now the million dollar question is: is there server code available that does that? Is that code already installed? If so, what are the names of the options/config items that make it happen?
6) Message boards : Number crunching : WU not being resent to another user (Message 24936)
Posted 6 Nov 2012 by Profile jujube
Post:
You are welcome. Good to see that my brain still produces good explanations when it is already nearly sleeping.


Christoph is correct. Surely other projects have encountered exactly this same problem. If there aren't server-side options to handle it then maybe there should be? WCG has been running multiple simultaneous sub-projects and/or queues for years, and so have other projects. I can't believe they would just live with this; in fact I know they don't, because their resends take a matter of minutes to get out the door, not days or weeks. Somebody out there knows how to fix this; it's just a matter of asking around or digging through email list archives.
7) Message boards : Number crunching : WU not being resent to another user (Message 24921)
Posted 4 Nov 2012 by Profile jujube
Post:
That is also the reason why I personally think that the low water / high water feeder system is a good idea.


The low water / high water feeder system is definitely the best way. I overlooked the fact that if they start another batch before the tail runs out, then further resends from the previous batch will go to the tail of the new batch, not the tail of the batch they actually belong to. My bad; thanks for correcting me.
8) Message boards : Number crunching : WU not being resent to another user (Message 24914)
Posted 1 Nov 2012 by Profile jujube
Post:
If one has 6 studies, and one runs all six in parallel
not one will be complete for 6 weeks say; if one runs them one after
the other then one is finished in 1 week, one in two weeks, etc.
Which is better? Eric.


Better for you? It depends. If your hands are tied until all 6 are complete then it doesn't matter whether you run them in parallel or sequentially. If the data produced by any given study allows you to make progress in some other urgent endeavor then maybe sequential is better. If you're being pulled in a thousand different directions on any given day then it doesn't matter because you never have nothing to do anyway and would still be swamped even if you could work 24/7/365 like a robot.

Better for us? Doesn't matter what's better for us. We're here to donate and do what's best for you and the science. That's all that matters.
9) Message boards : Number crunching : WU not being resent to another user (Message 24911)
Posted 31 Oct 2012 by Profile jujube
Post:
This was discussed extensively and investigated months ago. It is indeed a 2-part problem, as suggested in this thread: first the resends need to be marked as high priority (I believe there are actually 2 high priority attributes one can set), and second the resends need to be inserted into the middle of the queue. The investigations of months ago indicated the high priority attributes are being set but the tasks themselves are being placed at the very end of the queue. The questions are: 1) is that really a problem (i.e. does it really cause some genuine harm) and 2) is it worth spending the manpower needed to fix it. I believe the answer to both questions is "no!" and my reasons follow.

1) Credits are worthless, 99% of crunchers know that and don't care if they have a big backlog of pending credits. Fix it if the admins/devs have nothing else to do.

2) There is a concern about "the tail" recently expressed by Eric. The tail has received a lot of bad press over the years, and while I don't deny it exists I can't convince myself it is the unholy evil so many have made it out to be. I believe the tail effect causes harm at projects that generate tasks on the fly. Generating on the fly means that the results being crunched now are required to generate the tasks that will be crunched in the very near future, by which I mean within the next few hours. Obviously, if results don't get verified, more tasks cannot be generated and the train screeches to a halt. Tasks are not generated on the fly here, so the tail doesn't have the dreaded effect it might have elsewhere.

Also, when you think about it, a batch of tasks requires X CPU cycles to compute, and no matter which order you crunch the tasks the whole batch will still require X CPU cycles. Therefore if you insert a resend into the middle of the queue it doesn't really affect the time required to complete the batch, because the insertion into the middle merely delays the sending and completion of some other task in the queue. The only thing changed is which result gets verified first, and that is irrelevant to production rate when tasks are not generated on the fly. So in the end, reducing pending credits is the only reason for inserting resends into the middle at a project that does not generate tasks on the fly.
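That argument is easy to sanity-check with a toy scheduler, under my own simplifying assumption that tasks within a batch are roughly equal in length and every free core just grabs the next queued task:

```python
# Toy check of the argument above: moving a resend around in the queue
# changes which task finishes when, but not when the whole batch is done.
# Simplifying assumption (mine): tasks in a batch take roughly equal
# time, and each free core grabs the next queued task.
import heapq

def batch_finish_time(task_times, n_cores):
    """Finish time of the last task under simple list scheduling."""
    cores = [0.0] * n_cores          # per-core accumulated busy time
    heapq.heapify(cores)
    for t in task_times:
        # give the next task to whichever core frees up first
        heapq.heappush(cores, heapq.heappop(cores) + t)
    return max(cores)

batch = [1.0] * 100                          # 100 equal-length tasks
resend_middle = batch[:50] + [1.0] + batch[50:]   # resend spliced mid-queue
resend_end = batch + [1.0]                        # resend tacked on the end
```

The two orderings require the same total work and finish at the same time; only which individual result lands first changes.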

Christoph has suggested resends are not being inserted into the middle due to some deficiency in the feeder (or something like that). The solution would then require more than just tweaking a setting or two; it would require revamping the feeder code itself. That might be very complicated. If all that is true, and if I were the project admin/dev subject to the time constraints they seem to be subject to at Sixtrack, I would be treating the problem as low priority too because there is no real harm in a large pending credit number. Oh yes, a few crunchers would pack up their toys and flounce off in a hissy fit, but they are so few in number they would have a negligible effect on production. Notice I said negligible effect, not zero effect. Whatever the project's goals are, they will likely be achieved well before any deadline, so a handful fewer crunchers will have next to zero effect.
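As an aside, BOINC's scheduler does have knobs intended to push resends out faster by steering them to fast-turnaround "reliable" hosts. I believe the option names below are the relevant ones from the server documentation of that era, but they should be verified against the installed server version before relying on them:

```xml
<!-- Hypothetical excerpt from a project's config.xml; option names as I
     recall them from the BOINC server docs - verify before use. -->
<config>
  <reliable_on_priority>10</reliable_on_priority>
  <reliable_max_avg_turnaround>75000</reliable_max_avg_turnaround>
  <reliable_reduced_delay_bound>0.5</reliable_reduced_delay_bound>
</config>
```

Whether these are available or enabled on this server I don't know; in any case they address the priority half of the problem, not the feeder-ordering half.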
10) Message boards : Cafe LHC : rogue BOINC project: ... - THREAD CLOSED (Message 24899)
Posted 25 Oct 2012 by Profile jujube
Post:
I am a moderator at the Test4Theory project aka T4T, some of you already know me. We all know and accept the fact that every new project has problems in the beginning and we all know that good projects do what they can to fix those problems. One of the cardinal rules in the BOINC community is that a project absolutely must not steal CPU cycles you have allocated to other projects. T4T has broken that rule. The admins know they are stealing and have refused to do anything about it for over 6 months. Why? Because they like the extra CPU cycles and because they don't know how to get more CPU cycles through honest means.

Recently they proposed testing a new application that would alleviate the problem but they will not install the new server code their new application requires. Without the new server code the application will not function properly which will allow them to say "sorry, the new application failed the test so we must continue with the current application". Yes, they have sabotaged the test application.

At one time I thought they were making an honest effort to fix the problem and be a "good project". Now I see I was wrong. They have no intention of changing their ways and their lies, lame excuses and theft of CPU cycles from other projects will continue. I am ashamed to have been part of their project. I don't want my name associated with T4T any longer and I present this information to the BOINC community so that you can avoid being another one of their victims.

Exactly how do they steal CPU cycles? Their application is actually a wrapper that starts/stops/suspends a virtual machine that does the actual crunching. The wrapper starts the virtual machine but does not suspend it, in spite of the fact that the BOINC manager indicates it has been suspended. Everything appears normal, and you don't even know the virtual machine is still running unless you look very carefully. Most volunteers assume the project is honest, so they don't check. The virtual machine runs at normal priority, which means it does not relinquish the CPU for your other BOINC projects or for most of your personal computing needs the way a normal BOINC task does. So do the math... the task doesn't suspend and it runs at normal priority... that adds up to a lot of CPU cycles stolen from your other projects.
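For context, an honest VM wrapper maps BOINC's suspend/resume messages onto real VM state changes. A minimal hypothetical sketch, assuming VirtualBox's `VBoxManage controlvm` interface (the VM name is made up, and a real wrapper would execute the command rather than just build it):

```python
# Sketch of what an honest VM-wrapper suspend/resume handler might do,
# assuming VirtualBox's "VBoxManage controlvm" interface. The VM name
# is hypothetical; real wrappers also lower the VM's OS priority.
def controlvm(vm_name, action):
    """Build a VBoxManage command for a VM state change (not executed here)."""
    assert action in ("pause", "resume", "savestate")
    return ["VBoxManage", "controlvm", vm_name, action]

def on_boinc_message(vm_name, message):
    """Map BOINC wrapper messages to the matching VM state change."""
    handlers = {"suspend": "pause", "resume": "resume", "quit": "savestate"}
    if message not in handlers:
        return None
    return controlvm(vm_name, handlers[message])
```

The complaint above is, in effect, that the "suspend" branch is missing: BOINC reports the task suspended while the VM keeps running.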

A fix for this problem has been available for many months but they have steadfastly refused to implement the fix citing one lame excuse and lie after another. Avoid T4T like the plague.

I've placed this information on many project forums and will place it on many more. Some of you might call that spamming, but I think what T4T is doing is atrocious and I am convinced it is necessary to spread the word as quickly as possible, which means on as many forums as I can. I am not a spam bot; I post manually. Obviously I can't return to all the forums to discuss the issue, so if you want more info/discussion please drop in to the T4T forums. This is NOT a ploy to get you to join T4T; in fact I am advising that you NOT join T4T.
11) Message boards : Number crunching : Inconsistent Report Deadline dates (bug or feature?) (Message 24860)
Posted 22 Sep 2012 by Profile jujube
Post:
Ummmm, what makes you think the staff knows how this works?

It's as I described... The real deadline is sent to the BOINC client, and it was 18 Sep 2012 | 20:44:32 UTC for the task you mentioned in your previous post. However, the server allows an extra 24 hours in addition to the deadline sent to the BOINC client. Of course that means the "real" deadline sent to the BOINC client is not really the real deadline; the actual real deadline was 19 Sep 2012 | 20:44:32 UTC for your example task. In case the situation is not sufficiently bizarre at this point, realize that if the task is resent to a third host and that host returns a valid result before your host does, then you get 0 credits; but if you manage to get your result validated before the third host does, then you receive credit and the third host receives 0.
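The arithmetic from the example, spelled out (the 24-hour grace period is as described above):

```python
# The deadline arithmetic from the example above: the client sees one
# deadline, the server quietly allows 24 hours more.
from datetime import datetime, timedelta

GRACE = timedelta(hours=24)                  # the grace period described above

def effective_deadline(client_deadline):
    """The deadline the server actually enforces."""
    return client_deadline + GRACE

sent = datetime(2012, 9, 18, 20, 44, 32)     # deadline shown to the client
real = effective_deadline(sent)              # 19 Sep 2012 20:44:32
```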

I can't understand why a deadline can't be a deadline. Why does it need to be so confusing? If a deadline of X days is not sufficient, if it needs to be X + 1 days then why not just make it X + 1 days?
12) Message boards : Number crunching : Inconsistent Report Deadline dates (bug or feature?) (Message 24849)
Posted 18 Sep 2012 by Profile jujube
Post:
I have a hunch this is what Eric was seeing when he mentioned the BOINC client seems to recalculate the deadlines. I think the deadline shown in the client is the real deadline, whereas the deadline shown on the website is the deadline plus the 24-hour grace period. Or is the deadline on the website the real deadline? If I were an Asimov-style positronic-brained robot I would be in roblock at this point.
13) Questions and Answers : Unix/Linux : Boinc 7.0.28 (Message 24826)
Posted 13 Sep 2012 by Profile jujube
Post:
Is 7.0.28 a repository version or a Berkeley version? I ask because one of the 7.0.x versions from the Ubuntu repositories has a bug. Sorry, I don't remember for sure which version has the bug, but I think it might be 7.0.28. If you tried to run it and had problems then that's likely why.

On SETI forums there is a discussion about the bugged version and advice on where to obtain a good 7.0.x version.
14) Message boards : Number crunching : Too many exits ? ! (Message 24817)
Posted 11 Sep 2012 by Profile jujube
Post:
There is a brief explanation of what error -226 means at the BOINC FAQ Service.
15) Message boards : Number crunching : Maximum elapsed time exceeded (Message 24812)
Posted 10 Sep 2012 by Profile jujube
Post:


Yes here is just one example of his task and the same one done by a host without the error.

http://lhcathomeclassic.cern.ch/sixtrack/workunit.php?wuid=3045175

His run time 404,248.05 (sec) vs the completed of 13,055.11 (sec) on that same task.




I noticed that too. His wingmen are completing the tasks in far less runtime than he is spending on them. Also, his CPU times are far less than his runtimes, which makes me wonder if he (or something) is severely throttling his CPU or severely throttling BOINC.

It's definitely not disk space exceeded; that receives its own error message and is not lumped in with the elapsed time exceeded message.

16) Message boards : Number crunching : 30 Teraflops!!! (Message 24792)
Posted 7 Sep 2012 by Profile jujube
Post:
How many flops can 13,423 Sixtrack volunteers contribute?

World geography for 1,200.
17) Message boards : Number crunching : Why, oh why, oh why?? (Message 24773)
Posted 3 Sep 2012 by Profile jujube
Post:
As long as none of your tasks from any of your projects are missing deadlines, what's wrong with tasks running at high priority? Yes, it suspends tasks from other projects, but in the end, over the long term, all your projects will get the resource share you specify.

If you cannot stand seeing tasks run at high priority then decrease your work cache to 0.1 days.
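For anyone wanting to pin the cache locally regardless of website preferences, the client reads an override file; a minimal sketch, assuming the standard BOINC client field names of this era (verify for your client version):

```xml
<!-- global_prefs_override.xml in the BOINC data directory; field names
     as I recall them from the BOINC client docs - verify for your version. -->
<global_preferences>
  <work_buf_min_days>0.1</work_buf_min_days>
  <work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>
```

After saving it, tell the client to re-read its preferences (or restart it) for the new cache size to take effect.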
18) Message boards : Number crunching : Long WU's (Message 24763)
Posted 2 Sep 2012 by Profile jujube
Post:
Will that rule of thumb work for slow hosts with old P4 or Athlon64 processors? How about P3 machines? I mean, you used the word "usual", which means it might not apply in all cases, so are there cases where it's advisable to set the cache even smaller, for example for P3 or P4 era hosts? Perhaps for slow hosts it would be advisable to cut the cache size in half again?

I think the rule of thumb - which I quoted, but did not originate - dates from even before the era when P4s and Athlons ruled the world.


There have been many changes to the scheduler since then. Perhaps the rule of thumb needs to be revised.

Also, based on other volunteers' experiences and reports, it seems to me your "usual" rule of thumb tends to fail when one of their projects issues tasks that are very much longer than its usual tasks. I'm not sure exactly why, but that seems to be how it works. Any comments on that?

All absurdly general assertions need an exception, and I think you've put your finger on it. That rather depends whether you regard Eric's occasional experiments as an excitement, warranting manual intervention and micromanagement: or whether you prefer a totally fail-safe configuration, where 'auto' mode can cope with every eventuality. And it depends how reliable the other projects are, too.


I like to keep a close eye on BOINC, but often I have to ignore it for a few days. I don't mind a little manual intervention, but some volunteers have even less time for BOINC watching than I do and I can appreciate their desire for a totally fail-safe configuration. Alas, there probably is no such thing as a totally fail-safe configuration given the level of unpredictability (chaos?) when attached to several projects. One never knows what curve ball Project XYZ is going to throw at the scheduler next, and that's why I recommend a very small cache of 0.1 days if one wants things to work as automatically as possible.

I also strongly encourage projects to make sure task deadlines suit the maximum duration of their tasks. In that regard, I do feel Sixtrack dropped the ball with the very long tasks issued recently. I would ask that the admin(s) issuing the tasks also understand how deadlines are configured on the server and be willing to "set up" the deadline before issuing a batch of unusually long tasks. Issuing long tasks and then emailing the other admin a request to adjust the deadline accordingly is not the way to do it, if that's what's been happening. Warn the other admin first, and don't issue the long tasks until the other admin indicates appropriate adjustments have been made. The left hand needs to know what the right hand is doing at all times. As I said, from the client's perspective there is a great deal of chaos in the system when you're attached to several projects, so every project needs to make sure the info it sends to hosts regarding its tasks is appropriate and accurate. Projects that cause problems get set to NNT very quickly and are returned to "active duty" not so quickly.

I would also like to remind Eric that BOINC does not revise the deadline; there was some speculation that it does. BOINC does revise the estimated duration of a project's tasks when it discovers they are longer or shorter than expected. The deadline, however, is sacred and is never adjusted by the BOINC client or server.

I'm a close observer, willing to step in and micromanage when needed: so when I saw a 'long' task waiting eighth or ninth in line on a Q6600 with 2 days' cache, I bumped it to start running next: your strategy would have allowed BOINC to do that by itself. Horses for courses.


Indeed, my strategy did exactly that. The downside of my strategy is that if I lose my Internet connection for very long I run out of work rather quickly. Fortunately my ISP is extremely reliable and power outages in my area are extremely rare. I appreciate that some volunteers' ISPs are very unreliable, so a small cache may not be appropriate for them.
19) Message boards : Number crunching : Long WU's (Message 24756)
Posted 31 Aug 2012 by Profile jujube
Post:
Thanks for that; I think we shall have an interesting (long) discussion next week!

The usual rule of thumb isn't quite as drastic as jujube suggests:

Look at all your projects, find the one which had the shortest deadlines, and divide that deadline by the number of different projects you're attached to.

So, with deadlines here being 7 days, that's probably your shortest. If you're attached to 3 or 4 projects, a 2 day cache might be OK: if you're attached to 7 or 8 projects, don't set a cache above 1 day.


Will that rule of thumb work for slow hosts with old P4 or Athlon64 processors? How about P3 machines? I mean, you used the word "usual", which means it might not apply in all cases, so are there cases where it's advisable to set the cache even smaller, for example for P3 or P4 era hosts? Perhaps for slow hosts it would be advisable to cut the cache size in half again?

Also, based on other volunteers' experiences and reports, it seems to me your "usual" rule of thumb tends to fail when one of their projects issues tasks that are very much longer than its usual tasks. I'm not sure exactly why, but that seems to be how it works. Any comments on that?
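The quoted rule of thumb is simple enough to write down. Note that the slow-host halving is my own extrapolation of the question above, not an established rule:

```python
# The quoted rule of thumb: cache = shortest deadline among your
# projects, divided by the number of projects. Halving again for slow
# hosts is my own extrapolation, not part of the quoted rule.
def cache_days(shortest_deadline_days, n_projects, slow_host=False):
    """Suggested work cache size in days."""
    days = shortest_deadline_days / n_projects
    return days / 2 if slow_host else days

# 7-day deadlines, 4 projects -> 1.75 days; 8 projects -> under 1 day.
```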
20) Message boards : Number crunching : Long WU's (Message 24745)
Posted 31 Aug 2012 by Profile jujube
Post:
Well I am not so sure about luck. As I said we shall be
looking at this next week and I'll get back to you (and others)


It has nothing at all to do with luck and everything to do with skill :-)

Volunteers need to learn that it is extremely difficult, if not impossible, to schedule tasks properly for projects that have huge variations in run times; therefore it is of the utmost importance to configure an extremely small cache of no more than 0.1 days. Volunteers also need to remember that it's not the end of the world just because a Sixtrack task goes into panic mode and suspends some of their other projects for a while. Panic mode simply borrows some time from other projects, but those projects get paid back and the project shares the volunteer specifies will be honored over the long run. But for that to work, and to ensure that the other projects' tasks don't miss their deadlines, volunteers absolutely MUST CONFIGURE A SMALL CACHE, ESPECIALLY IF THEY HAVE A SLOWER CPU.




©2020 CERN