21) Message boards : News : Server upgrade (Message 37647)
Posted 18 Dec 2018 by dduggan47
Post:
I assume they are the 217 people that have chosen to publish their statistics.


Yup, but the vast majority (I'm guessing but I'm willing to wager!) of the other 8,669 are those who did not know there was a choice to be made. :-)

Not blaming LHC for this. It's my favorite project and I understand you (not of course meaning "you" personally, Toby) gotta do what you gotta do. It seems though that overall BOINC stats may be entering a new era, one in which they are extremely incomplete.

I wonder if non-EU based projects will need to comply with this. I'm currently running 45 projects (of which maybe half a dozen may not be active and I should prune). I went through all of them and it appears that only LHC, LHC-dev, Einstein, and WUProp have the GDPR (aggregation site export) option in their preferences.

I find this regulation strange in that it doesn't seem to prevent aggregation sites like BOINCSTATS or anybody else from accessing the data since it's available here for all of us to see. It just makes it harder, especially across a gazillion projects. Anybody who wants the data for some nefarious reason could get it so there's really no privacy issue.

Oh well. Life goes on but BOINC, without good stats, won't be quite what it has been. PITA. Sigh.

- Dick

Edit: Albert too which figures but it wasn't up earlier.
Also WCG but that one is opt out so it hasn't had a large effect.
22) Message boards : Number crunching : Too Many Total Results (Message 36918)
Posted 28 Sep 2018 by dduggan47
Post:
With LHC tasks the monitoring is itself a problem. They're not like other projects' tasks. Here "verified" doesn't always mean a successful result. With ATLAS, for example, you can spend thousands of hours crunching and have every result verify and think you've done a ton of useful work and then discover that in spite of the tasks verifying they did not return a HITS file and therefore did nothing but waste electricity. So monitor but do remember you sometimes need to look beyond what it says in the Status column in your results list.


I knew that ATLAS could return a worthless "valid" result and (with your help) learned how to check them. I've stopped running ATLAS until I have more time to work on it.

I didn't know that the other virtual projects also could do that. Can you tell me how I can check them more thoroughly? Is that also done through PanDa?

Not sure. It depends on what your "other purposes" are. You haven't mentioned it and nobody has asked. It might make more sense to just forget about any LHC VBox app and just do Sixtrack and other projects' apps that can handle being suspended.

In the end it's your decision, your power bill and your CPU time so it only has to make sense to you.


By "other purposes" I just meant that I use this device during the day for other things, browsing, spreadsheets, etc. I just need to make the performance not too clunky. That's why I'm attacking this as a fine tuning task. (I have another machine that's beefier and I run 6Track on it but I've never been able to get it to run anything virtual. Another project for the mythical day when there's more time.)

As for "sense", I was looking to see if I so clearly was misunderstanding technical aspects that I was making no sense in that respect.

Once again, thank you for taking the time to educate me.

- Dick
23) Message boards : Number crunching : Too Many Total Results (Message 36914)
Posted 28 Sep 2018 by dduggan47
Post:
Thanks, Bronco.

Yup, got it. I didn't respond in sufficient detail. I wasn't planning to turn on "Suspend if computer is in use". I was thinking about the "GPU computing" (and maybe the "computer is on battery", but that's a different issue).

AIUI BOINC doesn't use a GPU, so I figured I could turn that one on with no ill effects, and I'm pretty sure it will help system performance. I figure to try it and monitor it.

If that works I may also try and test the battery suspend. That won't affect performance, but the machine is rarely on battery so while it may stop & resume from time to time, it won't happen frequently, probably less than once / week. On those occasions though it would be better to have BOINC quiesced if possible.

I know that Toby gave me a performance workaround too, changing the "Use at most..." (CPU time I presume), but I'd like to see how high I can keep that and still have the machine functional for other purposes. I wish it were a dedicated box, it just isn't. I've experimented with the GPU use in the past and it seemed to make a difference. The plan is to try it and monitor it to make sure it doesn't cause problems.

That make sense?

Mechanic, I have my settings the same as yours except that the Use at Most CPU is at 90%. I may have to lower that further but my goal is to let it rip as much as possible and that's why I'll experiment with the GPU suspension.
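For anyone who prefers to set these options in a file instead of the Manager, here is a minimal sketch that builds a local preferences override. It assumes the tag names used by BOINC's global_prefs_override.xml as I understand them (cpu_usage_limit, run_if_user_active, run_gpu_if_user_active, run_on_batteries), and it assumes "Use at most 90%" refers to CPU time, not the number of CPUs. Please check both against the BOINC client documentation before relying on it. The function name build_override is just an illustration.

```python
import xml.etree.ElementTree as ET


def build_override(cpu_usage_limit, run_if_user_active,
                   run_gpu_if_user_active, run_on_batteries):
    """Return the text of a preferences override as an XML string.

    Tag names are assumed to match BOINC's global_prefs_override.xml;
    verify them against the client documentation.
    """
    root = ET.Element("global_preferences")
    settings = {
        # Percent of CPU time BOINC may use (assumed meaning of "Use at most").
        "cpu_usage_limit": cpu_usage_limit,
        # 1 = keep CPU tasks running while the computer is in use.
        "run_if_user_active": int(run_if_user_active),
        # 0 = suspend GPU computing while the computer is in use.
        "run_gpu_if_user_active": int(run_gpu_if_user_active),
        # 0 = suspend computing while on battery power.
        "run_on_batteries": int(run_on_batteries),
    }
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


# The plan described above: 90% CPU time, keep crunching while in use,
# but suspend GPU work while in use and everything while on battery.
print(build_override(90, True, False, False))
```

The output would go into the BOINC data directory as global_prefs_override.xml, after which the client has to be told to reread its local preferences.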

Thanks,

- Dick
24) Message boards : Number crunching : Too Many Total Results (Message 36911)
Posted 27 Sep 2018 by dduggan47
Post:
That seems to have done the trick, Toby. I'll experiment with the setting and find which one(s) cause the problem. My best guess is the "computer in use" option.

One more question. Was there any kind of a change in the operation of these tasks somewhere around 9/20? That's when these errors seem to have begun. Of course it's possible I fiddled with something that caused it even though I don't remember doing so. Just curious. The fact that others haven't reported this (that I've seen) probably indicates I'm guilty.

Thanks again for your help.
25) Message boards : Number crunching : Too Many Total Results (Message 36817)
Posted 22 Sep 2018 by dduggan47
Post:
Thank you, Toby. I'll try that.
26) Message boards : Number crunching : Too Many Total Results (Message 36812)
Posted 22 Sep 2018 by dduggan47
Post:
Not sure if the title line is the most relevant piece of information but in the last week about half of my tasks in LHCb, CMS, and Theory are having this problem. (I'm not running ATLAS.)

Here's an example.

https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=101479827

Any ideas are appreciated.

Thanks,

- Dick
27) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36068)
Posted 26 Jul 2018 by dduggan47
Post:
Thanks, AurRx. Glad to hear it's not just me!
28) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36066)
Posted 26 Jul 2018 by dduggan47
Post:
Thanks, Yeti. You're right, I'll do that. I've looked at it before but haven't gone back to it in a while.
29) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36063)
Posted 26 Jul 2018 by dduggan47
Post:
dduggan47 wrote:
What happens is that 1 single core LHC project starts up and nothing else leaving 7 cores sitting on their hands. That's the behavior I don't understand. My little machine is not being fully utilized by BOINC in general or LHC in particular and the latter seems to be the bottleneck for some reason.

The BOINC client isn't good with multicore apps. You have to do a lot of micromanagement to get things running together as you want. It is not a bad idea to run only 1 kind of multicore project, e.g. ATLAS or others.


Yup, I think you hit it. I continued to experiment and found that the behavior I described is not consistent. It consistently prevents more than 2 cores' worth of LHC projects from running, but after suspending and unsuspending projects or tasks often enough BOINC suddenly started letting other tasks run on the idle cores. Just for grins I renamed the app_config.xml file and suddenly it will run the LHC tasks it can.

I'm pretty near the end of the time I'm going to spend on this (unless I or somebody else has a new thought on it). I'm probably going to give an ATLAS task one more shot and if it doesn't work I'll just disable ATLAS. All the others seem to run fine.

Thanks again to all who have taken their own time to try to educate me on this and help me figure it out. It's a great community!

- Dick
30) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36060)
Posted 26 Jul 2018 by dduggan47
Post:
Thanks for the quick response, computezmle. I've been through several cycles of reading and rereading your response, writing and rewriting mine, and experimenting a bit more. Here's where I am.

1) Are these settings redundant?

Not exactly.
The website setting only affects multicore apps, in this case ATLAS and Theory.
It configures the #cores to be used by the BOINC client's reports/calculations, the working set size which your local client needs to estimate if an additional task can be started and the RAM setting for your vbox VM.
In addition it calculates the computer's GFLOPS value.

Some values (avg_ncpus, nthreads, memory_size_mb) sent by the server can be overwritten by an app_config.xml, others not, e.g. the working set size.

It's recommended to keep the website setting in sync with the app_config.xml.


That makes sense and it's what I'd assumed. I do have them in sync.


2) Do they limit the number of CPU's for one task

The setting is for 1 task => 3 2-core tasks use a total of 6 cores


OK, IIRC that means it ought to be running however many tasks it can as long as my total number of cores (8) is not exceeded, right? In the case I described, 2 more 2 core tasks should have started bringing the total to 7. Another single core task could have started if I'd had one waiting.

3) Do they have any effect on what else can run

Yes but it's your BOINC client that keeps track of your total resources (cores, RAM, network access, ...)


I don't understand why they don't start...

Did you set any limits in your client GUI or via app_config.xml?


The only app_config.xml file I have is in the LHC folder (c:\users\all users\BOINC\projects\lhcathome.cern.ch_lhcathome). I initially put it into the BOINC folder by mistake but then moved it and have had the manager reread the config files.

As for the GUI, the preferences are set to 100% of CPUs.

If I suspend LHC then BOINC immediately starts polling other projects ...

This is normal client behaviour and independent from LHC.


Yup. I just included that to show it worked normally if I suspended BOINC. With LHC unsuspended BOINC not only didn't start the additional LHC tasks, it wouldn't ask for tasks from any other project even though I had 5 more cores available for work.

At this point, since I fiddled a bit to see what happens, I've got 8 tasks running from other projects, 6 NFS and 2 WCG. I suspended NFS and WCG and my expectation was that 7 cores' worth of LHC tasks (3 2-core and 1 1-core were waiting) would start up. What happens is that 1 single core LHC task starts up and nothing else, leaving 7 cores sitting on their hands. That's the behavior I don't understand. My little machine is not being fully utilized by BOINC in general or LHC in particular and the latter seems to be the bottleneck for some reason.

I understand that this can happen if, for example, I have a task needing 8 CPUs on top of the priority list and waiting. BOINC will hold off on starting anything else until the high priority task can get what it needs. Same with a 2 core task if 7 of my 8 are already in use. As non-technical as I am, I get that. That's not what's happening here though. I've got idle cores crying for work! Well, actually they're not crying, just idle. I'm doing the cry^^^whining :-)
31) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36058)
Posted 26 Jul 2018 by dduggan47
Post:
Questions re # CPUs:

I've got the number of CPUs set to 2 at the website and in the app_config.xml.

1) Are these settings redundant? If not, what's the difference in the effect?

2) Do they limit the number of CPU's for one task or the total number of CPU's in use for all LHC tasks?

3) Do they have any effect on what else can run (i.e. tasks from other projects)?

Here's the reason I'm asking. Right at this moment I have 3 running tasks (not including non-intensive and GPU), 1 LHC and 2 from another project, all single CPU. I have 4 ready to start tasks. All are LHC 2 CPU tasks, 2 ATLAS and 2 Theory.

What I expected is for 2 of those 2 CPU tasks to be running to bring the total CPUs to 7. I don't understand why they don't start and why they seem to be blocking other projects from running. If I suspend LHC then BOINC immediately starts polling other projects looking for work and starts those tasks up to the max of 8.
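The arithmetic behind that expectation can be written out as a short sketch. This is only a naive "does it fit" count by cores, not BOINC's actual scheduler, which also weighs memory, deadlines, priorities and any per-project limits. The function name tasks_that_fit is made up for the illustration.

```python
def tasks_that_fit(total_cores, running, waiting):
    """Start waiting tasks in queue order while they still fit.

    running and waiting are lists of per-task core counts.
    Returns the core counts of the tasks started and the cores then in use.
    """
    used = sum(running)
    started = []
    for need in waiting:
        if used + need <= total_cores:
            started.append(need)
            used += need
    return started, used


# 8 cores, 3 single-core tasks running, 4 two-core tasks ready to start.
started, used = tasks_that_fit(8, [1, 1, 1], [2, 2, 2, 2])
print(started, used)  # [2, 2] 7
```

By core count alone, two of the 2-core tasks fit and one core is left over, which is the total of 7 described above.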

Thanks,

- Dick
32) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36011)
Posted 23 Jul 2018 by dduggan47
Post:
Oops. Good point.
33) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36009)
Posted 23 Jul 2018 by dduggan47
Post:
1) What did you change to allow 8 CPUs? Did you change it in your website settings? In the app_info.xml? Or both?
2) Did you create the app_info.xml? ...

@bronco
Again: Please, don't use "app_info.xml" in this context. It's "app_config.xml".


Yup, figured out about the name.

I was unsure where to create the file but went back to the other threads and found it: C:\Users\All Users\BOINC\projects\lhcathome.cern.ch_lhcathome. (Windows 10 doesn't make it easy to find that sucker. All Users is considered a system file and by default doesn't show up in Explorer or Command Prompt until you change the default view (although you can get to it and its children via the prompt if you know the exact path). At that point it shows up in Explorer but still not in Command Prompt. I think I'll call Bill Gates.)

So, once I did that it was found by having BOINC read the config files. The file is as shown in another thread:

<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 4800</cmdline>
  </app_version>
  <project_max_concurrent>1</project_max_concurrent>
</app_config>
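A quick way to sanity-check a file like this is to parse it and compare the values that are supposed to agree. The sketch below is only an illustration: the helper name summarize is made up, and it checks just two things, whether --nthreads in the command line matches avg_ncpus, and what project_max_concurrent is set to. As far as I know, project_max_concurrent caps how many tasks from the project run at the same time, so a value of 1 would allow only one LHC task at once.

```python
import re
import xml.etree.ElementTree as ET

# The file content from the post above.
APP_CONFIG = """
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>2.0</avg_ncpus>
    <cmdline>--nthreads 2 --memory_size_mb 4800</cmdline>
  </app_version>
  <project_max_concurrent>1</project_max_concurrent>
</app_config>
"""


def summarize(xml_text):
    """Return per-app thread settings and the project-wide task cap."""
    root = ET.fromstring(xml_text)
    report = []
    for app_version in root.findall("app_version"):
        avg_ncpus = float(app_version.findtext("avg_ncpus"))
        cmdline = app_version.findtext("cmdline") or ""
        match = re.search(r"--nthreads\s+(\d+)", cmdline)
        nthreads = int(match.group(1)) if match else None
        report.append({
            "app": app_version.findtext("app_name"),
            "avg_ncpus": avg_ncpus,
            "nthreads": nthreads,
            # True only when both values are present and equal.
            "in_sync": nthreads is not None and nthreads == avg_ncpus,
        })
    cap = root.findtext("project_max_concurrent")
    return report, (int(cap) if cap is not None else None)


report, cap = summarize(APP_CONFIG)
for entry in report:
    print(entry)
print("project_max_concurrent:", cap)
```

For the file above this reports ATLAS with 2 threads and avg_ncpus 2.0 in sync, and a project-wide cap of 1.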

I've changed the # of CPUs on the website back to 2 which you suggested as the goal.

BTW, I noticed another oddity on the LHC website that the admins there might want to know. When I gave up on this earlier I told it no more ATLAS and forgot I'd done it. Didn't matter, I kept getting tasks. Just allowed it again.

- Dick
34) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 36003)
Posted 23 Jul 2018 by dduggan47
Post:
202778167 seems to have worked. Of course I've thought that before. Panda says "finished" though.

The only change I made was to allow 8 CPU's.

I'm very much over my head with this but I'm trying and I appreciate everyone's patience.

- Dick
35) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 35992)
Posted 22 Jul 2018 by dduggan47
Post:
Thanks, bronco. Sigh.

I let the 2 still running complete just in case but I've changed my preferences to turn off Atlas.

If you or anybody else has any thoughts on what I might do to solve the problem, please pass them along and I'll give it another shot.
36) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 35988)
Posted 21 Jul 2018 by dduggan47
Post:
The change from 50% to 70% of RAM when the CPU is in use had no effect. Still got the HITS error.

My next thought was to modify the app_config.xml. That file doesn't seem to exist on my computer. If I should create it, where should I put it? Even though I worked around it (as you'll see below) I'd like to know for future reference where it is/should be.

Meanwhile I modified the # of CPU's from 2 to 3. Task 201884680 did not have the error! I'm pretty sure that's the first good one I've had. I'd suspended 2 other tasks before they started to see how that one came out and have now released them. If they work then problem definitely solved!

Thanks all, but especially bronco, for the help.
37) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 35976)
Posted 20 Jul 2018 by dduggan47
Post:

You are the earliest volunteer who is still returning results regularly so I thought the honor of being my first guinea pig should go to you. It's not much of an honor but it's the best I can do :)
Glad to see you're still dedicated after all these years.


At my age I'll take honors wherever I can get them. :-)
38) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 35973)
Posted 20 Jul 2018 by dduggan47
Post:
... became aware of this today when I got an email from an admin letting me know.

Which admin and (more interesting as it may be helpful for others also) what was his/her suggestion?


Actually it was Bronco who answered above. His private advice was to come here and ask. :-)


Yes, it was me and I contacted you via PM not email. Also, I am not an admin (maybe). But those are minor details.

As you said in your OP, you wanted to do more. I do too but at the moment I don't have additional computing resources to devote to this great project. What I do have is time and a suspicion that a great number of ATLAS tasks are returning no useful work in spite of the fact that they validate. So I created a script that grabs the host IDs, result IDs and result pages from a range of user IDs and analyses their ATLAS results (if any). It counts CPU and run time for all ATLAS tasks and CPU and run time for ATLAS tasks that don't return a HITS file and saves a list of user IDs returning "no HITters" to disk. I ran the script on user IDs from 67 to 10,000 and discovered that 7% of run time spent on ATLAS tasks is a total waste.
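The tally described above can be sketched as follows. This is not bronco's script, which was not posted. The record layout (AtlasResult), the function name tally and the sample numbers are all made up for illustration, and the step that fetches result pages from the website is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class AtlasResult:
    """One validated ATLAS result (hypothetical record layout)."""
    user_id: int
    run_time_s: float
    cpu_time_s: float
    has_hits: bool  # True if the task returned a HITS file


def tally(results):
    """Sum run/CPU time overall and for results without a HITS file."""
    total_run = sum(r.run_time_s for r in results)
    total_cpu = sum(r.cpu_time_s for r in results)
    wasted_run = sum(r.run_time_s for r in results if not r.has_hits)
    wasted_cpu = sum(r.cpu_time_s for r in results if not r.has_hits)
    return {
        "total_run_s": total_run,
        "total_cpu_s": total_cpu,
        "wasted_run_s": wasted_run,
        "wasted_cpu_s": wasted_cpu,
        # Share of run time spent on results with no HITS file.
        "wasted_run_fraction": wasted_run / total_run if total_run else 0.0,
        # Users who returned at least one result without a HITS file.
        "no_hitters": sorted({r.user_id for r in results if not r.has_hits}),
    }


# Invented sample data, not real project results.
sample = [
    AtlasResult(user_id=101, run_time_s=9000, cpu_time_s=16000, has_hits=True),
    AtlasResult(user_id=101, run_time_s=1000, cpu_time_s=1500, has_hits=False),
    AtlasResult(user_id=202, run_time_s=10000, cpu_time_s=18000, has_hits=True),
]
summary = tally(sample)
print(summary["wasted_run_fraction"], summary["no_hitters"])
```

With the invented sample, 1,000 of 20,000 seconds of run time went to a result without a HITS file, and user 101 lands on the "no HITters" list.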

Then I reasoned that if I could get that 7% down to 1% it would get even more additional work done than me buying another computer to devote to LHC (which isn't going to happen with RAM as expensive as it is and the huge requirements of LHC tasks other than sixtrack). I composed a "form letter" to send to user IDs that are returning ATLAS results sans HITS files. So far I have sent that letter only to dduggan47 for a number of reasons:

1) I wanted to do at least 1 test run of the form letter to help gauge user reaction to it
2) I'm still learning how to automate sending the form letter
3) don't want to create a flood of angry users seeking advice and wondering why the hell this project validates useless results and leaves them with the impression that all is OK when it is not, a trickle of angry users seems easier to manage and perhaps more effective


OK. Just FYI, what I got was an email which contained all the text of your message (not just a link to the board). The email's subject was "[LHC@home] - private message". The from address was "Admin.Lhcathome@cern.ch via cern.onmicrosoft.com". In the text it said "From: bronco (ID 569117)".

As you do more of these I suspect you'll find others interpreting that as an email from an admin who goes by bronco.

Anyway, your test worked and you did not create an angry user. I'm happy to be your guinea pig. :-) Using the information you and others have provided I'll either fix the problem or skip ATLAS.
39) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 35970)
Posted 20 Jul 2018 by dduggan47
Post:
... became aware of this today when I got an email from an admin letting me know.

Which admin and (more interesting as it may be helpful for others also) what was his/her suggestion?


Actually it was Bronco who answered above. His private advice was to come here and ask. :-)
40) Message boards : ATLAS application : Non-zero return code from EVNTtoHITS (65) (Error code 65) (Message 35969)
Posted 20 Jul 2018 by dduggan47
Post:
Thanks, gyllic. I had actually run across that in my searching and I'll add it to my list of things to try.



