1) Message boards : ATLAS application : Bad WUs? (Message 45820)
Posted 8 hours ago by Profile Yeti
Post:
maeax wrote:
PCs with one CPU (VirtualBox 6.1.12) have had no problems so far.
All the faulty ones are using 2 CPUs (VirtualBox 6.1.30).
So the question seems to be: is the problem connected to the VBox version or to the number of CPUs used?

For me it happens on VirtualBox 6.1.16 AND 6.1.30; they formerly ran fine for days (6.1.30) or months (6.1.16).

And I used the same number of cores in the past and the same number of simultaneously running WUs.
2) Message boards : ATLAS application : Bad WUs? (Message 45814)
Posted 9 hours ago by Profile Yeti
Post:
I haven't seen it yet on native ATLAS.
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10697859&offset=0&show_names=0&state=4&appid=

It seems as if it damages VirtualBox.

I have seen two different problems:

A) VMs running endlessly with less than 1% CPU usage
B) VMs get suspended after 10/20/30/40 seconds and are "unmanageable". This occurs across all my systems and different VirtualBox versions.

Today I have had to abort 56 tasks so far.
3) Message boards : ATLAS application : Bad WUs? (Message 45811)
Posted 13 hours ago by Profile Yeti
Post:
Okay, time for me to take a break from ATLAS.
4) Message boards : ATLAS application : Bad WUs? (Message 45810)
Posted 14 hours ago by Profile Yeti
Post:
This morning I had to cancel more than 15 WUs that were hanging around.

Sorry, but it sucks!
5) Message boards : ATLAS application : Bad WUs? (Message 45808)
Posted 1 day ago by Profile Yeti
Post:
Just now I got the next one.
So this was the fourth one since last night. I am afraid more will follow :-(

I'm sure more will follow!
6) Message boards : ATLAS application : Bad WUs? (Message 45806)
Posted 1 day ago by Profile Yeti
Post:
The rate of these failures has risen since yesterday to more than 10 for me :-(
7) Message boards : ATLAS application : ATLAS long simulation 1.00 (Message 45058)
Posted 16 Jun 2021 by Profile Yeti
Post:
Hi all,

I have paused the submission of long tasks for the moment, since there are very little hosts running them and the large cluster running them previously is no longer running BOINC. But we may bring the long tasks back in the future if there is demand for them. Thanks to everyone who helped testing and running these tasks.

David

David,

if you bring them back, please make them available for users running normal Windows/Linux clients.

Thanks

Yeti
8) Message boards : Number crunching : VM Applications Errors (Message 44823)
Posted 26 Apr 2021 by Profile Yeti
Post:
LHC@Home is not a plug and play project like other BOINC-Projects are.

You can easily run LHC@Home like a plug and play project if you run SixTrack only.
You can also run it like a plug and play project if you run one of ATLAS / Theory / CMS exclusively and keep this setting: "Use at most 100 % of CPU time" (VMs don't like this kind of throttling).

If you want to run all kinds of applications LHC@Home offers, you will have to do some micro-managing with your client; BOINC will not always be able to give your client what you want.
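As an illustration of such micro-managing, here is a minimal app_config.xml sketch that limits how many VM tasks run at once. The app names and numbers below are assumptions; verify the names against your client_state.xml and adjust the limits to your hardware:

```xml
<!-- Hypothetical app_config.xml, placed in the LHC@Home project
     directory. App names are assumed; check client_state.xml. -->
<app_config>
  <app>
    <name>ATLAS</name>
    <!-- run at most one ATLAS VM at a time -->
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>Theory</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
```

After saving the file, use "Read config files" in the BOINC Manager so the limits take effect without restarting the client.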
9) Message boards : Number crunching : GPU advertised for LHC, but they don't do it? (Message 44678)
Posted 8 Apr 2021 by Profile Yeti
Post:
I'm crunching these WUs with my little RX 550 in 20-25 minutes.

My RTX 2080 finishes them in less than 2 minutes, even running more than one at the same time.
10) Message boards : Number crunching : VM Applications Errors (Message 44430)
Posted 3 Mar 2021 by Profile Yeti
Post:
Thanks for your reply. That's a very large post you linked; is there a particular part I can get away with reading? I don't want to spend hours on this.
Hm, if you really want to crunch ATLAS, Theory or CMS, you really need to go through the list point by point, as I already mentioned there:

Please check this list and be sure to really check all details, step by step; all of them are important.
...
11) Message boards : Number crunching : VM Applications Errors (Message 44428)
Posted 2 Mar 2021 by Profile Yeti
Post:
Perhaps you will find information in this checklist.
12) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 43922)
Posted 15 Dec 2020 by Profile Yeti
Post:
TCP_TUNNEL appears because of the https connection:
squid TCP_TUNNEL:
xx.yyy.xxx.yyy 3128 - - [13/Dec/2020:18:29:33 +0100] "CONNECT lhcathome.cern.ch:443 HTTP/1.1" 200 58982 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT

WCG shows the same info:
xx.yyy.zzz.xxx 3128 - - [15/Dec/2020:11:56:27 +0100] "CONNECT www.worldcommunitygrid.org:443 HTTP/1.1" 200 5980 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT
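For reference, this tunnelling behaviour matches the stock squid.conf defaults; the following is a minimal sketch of the relevant standard directives, not a full config:

```
# Standard squid.conf excerpt: HTTPS is relayed via CONNECT,
# so it cannot be cached, only tunnelled through.
acl SSL_ports port 443
acl CONNECT method CONNECT
http_access deny CONNECT !SSL_ports
```

Only plain-http traffic profits from the cache; the CONNECT entries in the log are simply passed through.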
13) Message boards : Number crunching : Running Benchmark (Message 43905)
Posted 14 Dec 2020 by Profile Yeti
Post:
@Magic:

Go to your favorite project (Einstein, you wrote earlier) and change anything; you can even "change" a setting to its current value. What is important is that you really hit Save afterwards.

This will make that project responsible for your settings; then let all your clients update this project and everything should be as you like it again.

@All:

If I understood TACC right, it is not a normal BOINC project but more like a BOINC manager. They want to give users the chance to set up BOINC and TACC and then forget it. TACC will coordinate what the clients crunch, and more.
14) Message boards : Number crunching : Not getting any tasks, though many are available (Message 43856)
Posted 11 Dec 2020 by Profile Yeti
Post:
Shut down the PC so that it switches off.

Leave it off for 1 minute!

Switch on and try again.

If this doesn't help, I would completely uninstall BOINC, remove all BOINC folders, then reboot the machine and try a clean install.
15) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 43820)
Posted 10 Dec 2020 by Profile Yeti
Post:
I could cache something like 64GB's worth of things in ram, I was thinking to get some Optane DIMMS, then I could take my host upto 640GB of memory.
Toby,

sorry, this really makes no sense. I was on the same track as you, and computezmle told me not to do so.

We kept his suggestion:
# You don't believe this is enough?
# For sure, it is!
cache_mem 256 MB
maximum_object_size_in_memory 24 KB
memory_replacement_policy heap GDSF
My 8,500,000 hits come mostly from this "small" cache segment in memory.
16) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 43816)
Posted 10 Dec 2020 by Profile Yeti
Post:
Here are my experiences from switching to Squid:

Setting up Squid with the help of computezmle: easy

switching clients to use proxy: tricky

What has happened?

At first all my clients but one were working fine and using Squid. The one that didn't really work seemed to be okay, but all its ATLAS WUs failed within 20 minutes. Finally I found that I had to set up the proxy settings on that client with the full domain name, not only the machine name.

Okay, I wanted to be professional and changed all the other clients to use the full domain name for the proxy as well. This was a bad idea, because now all the clients that had been working fine could no longer upload their results.

I had to flush the DNS cache on all clients, and since then everything works fine. Maybe a reboot would also have solved it.

Perhaps this helps someone.

Oh, we checked what Squid is doing for my clients: in the last 3 weeks it has served 8,500,000 http(s) requests from its RAM cache.
17) Message boards : ATLAS application : How is Work-Distribution calculated ? (Message 43706)
Posted 25 Nov 2020 by Profile Yeti
Post:
Hi Yeti, nice to have you back :)
Yeah, feels good to be back again
There is a limitation on the server side for ATLAS and Theory to send out max 2 tasks per CPU. I have asked the admins to increase this to 4 for ATLAS. I would rather not remove the limits completely since many hosts will end up with tasks they will not be able to process before the deadline.

Hm, 4 is better than 2, but it is not really optimal.

At the moment, "Max # of CPUs" is used for three things:

    1* sets the number of cores a WU should use when there is no override by app_config
    2* gives the base for the "Working Set Size"
    3* is used to calculate how many WUs a client gets (multiplied by 2 now, soon by 4)

I'm sure that if you don't change number 3 to a more realistic figure, it will cause problems for smaller / older clients.

Examples:

A) Lando is an old 8-core box; at the moment it gets 10 WUs, in future it will get 20 WUs. Sorry, but that is way too much.

B) Manni is my current flagship and has 24 cores; it gets 10 WUs, in future it will get 20 WUs.

Couldn't you take the number of real cores into your calculation?

What about this or a similar formula: MaxWUs = Int( RealCores / "Max # CPU") * Factor

Example calculations:

Manni (24 cores): Int( RealCores / "Max # CPU") * Factor
Manni (24 cores): Int( 24 / 5) * 5 = 20

Lando ( 8 cores): Int( RealCores / "Max # CPU") * Factor
Lando ( 8 cores): Int( 8 / 5) * 5 = 5

These results are much more realistic than with your old calculation!

The Factor could be a fixed number (5 in the example, perhaps 6) or could be taken from "Max # of jobs".

This leads to a much more realistic local work balance.
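For clarity, here is a tiny Python sketch of the proposed formula; the host names and the example Factor of 5 are taken from the examples above, and this is only an illustration, not project code:

```python
def max_wus(real_cores: int, max_cpus: int, factor: int = 5) -> int:
    """Proposed quota: Int(RealCores / "Max # CPU") * Factor."""
    return (real_cores // max_cpus) * factor

# Example hosts from above, with "Max # of CPUs" = 5:
print(max_wus(24, 5))  # Manni, 24 cores -> 20
print(max_wus(8, 5))   # Lando,  8 cores -> 5
```

Note that integer division keeps the quota proportional to the real core count, which is the whole point of the proposal.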

18) Message boards : ATLAS application : How is Work-Distribution calculated ? (Message 43679)
Posted 22 Nov 2020 by Profile Yeti
Post:
So you could experiment with number of CPUs in your preferences. This affects the amount of memory Boinc thinks each task is using (actual memory used can be set with app_config).
Nope, I can't play with the number of CPUs. If I raise this figure, the working set size of each workunit rises to 10,200 MB. With 5 CPUs the working set size is 7,500 MB.

The memory setting in app_config only controls the memory of the virtual machine.

The BOINC client reserves the memory given by the working set size, even if the virtual machine needs less memory.
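For anyone wondering what that app_config memory setting looks like, here is a sketch for a 5-core setup. The app_name, plan_class and values are assumptions; verify them against your own client_state.xml. The vboxwrapper's --memory_size_mb switch sets only the VM's RAM, not what BOINC reserves:

```xml
<!-- Hypothetical app_config.xml fragment; check app_name and
     plan_class against client_state.xml before using. -->
<app_config>
  <app_version>
    <app_name>ATLAS</app_name>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <avg_ncpus>5</avg_ncpus>
    <!-- RAM given to the VirtualBox VM only -->
    <cmdline>--memory_size_mb 7500</cmdline>
  </app_version>
</app_config>
```

BOINC's own memory accounting still uses the server-side working set size, as described above.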
19) Message boards : ATLAS application : How is Work-Distribution calculated ? (Message 43669)
Posted 22 Nov 2020 by Profile Yeti
Post:
Hi,

I'm a little bit confused about the work distribution of ATLAS work.

All my clients get 10 WUs, regardless of how powerful or slow the individual workstation is.

So my slowest PC has enough work for up to 2 days.

My fastest PC has work for at most 6 hours.

These are my LHC-specific preferences:

[screenshot of preferences not included]

As long as a box has 10 WUs locally, the server says "No Atlas work available". If it has fewer workunits, it gets exactly the difference to 10.

What can I do to get more work on my power machines?
20) Message boards : ATLAS application : Confused (Message 43665)
Posted 21 Nov 2020 by Profile Yeti
Post:
Just saw this thread.

The scheduler was designed at a time when only single-core WUs existed, and for these it works very well.

The scheduler really has problems balancing multi-core WUs; if you like to run these, it may be necessary to help the scheduler. At LHC you have the option to say "give me only 1 workunit". This is set up here: https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project

Choose 'Max # jobs' and set it to one or two or whatever you like.




©2021 CERN