21) Message boards : ATLAS application : Atlas Native Transient HTTP Errors Uploading Resultfile (Message 47006)
Posted 12 Jul 2022 by Yeti
Post:
Your Theory task reported a success.
Yes, but it had only run for 3 to 5 minutes and I was not sure whether that is okay.


Which Squid did you use?
4.14


The one you used in the past or a new one (on the Linux VM)?
The "old" one, which I have been running since you helped me set it up years ago.
22) Message boards : ATLAS application : Atlas Native Transient HTTP Errors Uploading Resultfile (Message 47004)
Posted 12 Jul 2022 by Yeti
Post:
Hi,
after quite some time away from LHC, I have now tried to run ATLAS native inside a VirtualBox VM with Ubuntu 20.04.

Everything seemed to go fine: I could see the Athena tasks running, and the WU finished and tried to upload the results.

All files but one were uploaded successfully:

Manni VL_CI01_31501

Started upload of tLNLDm26CV1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmF51LDm3rzjam_0_r1453389463_ATLAS_result
Started upload of tuXMDmju8U1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmxm1LDmNZfCSo_0_r1401751060_ATLAS_result
Project communication failed: attempting access to reference site
Temporarily failed upload of tLNLDm26CV1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmF51LDm3rzjam_0_r1453389463_ATLAS_result: transient HTTP error
Backing off 03:38:09 on upload of tLNLDm26CV1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmF51LDm3rzjam_0_r1453389463_ATLAS_result
Internet access OK - project servers may be temporarily down.
Temporarily failed upload of tuXMDmju8U1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmxm1LDmNZfCSo_0_r1401751060_ATLAS_result: transient HTTP error
Backing off 03:58:24 on upload of tuXMDmju8U1nfZGDcpSWOuwoABFKDmABFKDmTzMbDmxm1LDmNZfCSo_0_r1401751060_ATLAS_result

So, who has the problem? The upload server, my local Squid, or my firewall not letting something through?
23) Message boards : Number crunching : Forum: What has happened to Function [List=1] (Message 46919)
Posted 22 Jun 2022 by Yeti
Post:
Hello Admin(s),

it seems that the forum feature "list=1" is broken.

The help function says:



But instead it shows:

  1. Element 1
  2. Element 2



Please check and fix it; my checklist looks terrible without this feature.

Yeti

24) Message boards : ATLAS application : Bad WUs? (Message 45820)
Posted 8 Dec 2021 by Yeti
Post:
maeax wrote:
PC with one CPU (Virtualbox 6.1.12) have no problems so long.
All with faulty are using 2 CPU's (Virtualbox 6.1.30).
so the question seems to be: is the problem connected to the VBox version or to the number of CPUs used ???

For me it happens on VirtualBox 6.1.16 AND 6.1.30; both previously ran fine for days (6.1.30) or months (6.1.16).

And I used the same number of cores and the same number of simultaneously running WUs in the past.
25) Message boards : ATLAS application : Bad WUs? (Message 45814)
Posted 8 Dec 2021 by Yeti
Post:
I haven't seen it yet on native ATLAS.
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10697859&offset=0&show_names=0&state=4&appid=

It seems to damage VirtualBox.

I have seen two different problems:

A) VMs running endlessly with less than 1% CPU usage
B) VMs get suspended after 10/20/30/40 seconds and are "unmanageable". This affects all my systems and different VirtualBox versions.

So far today I have had to abort 56 tasks.
26) Message boards : ATLAS application : Bad WUs? (Message 45811)
Posted 8 Dec 2021 by Yeti
Post:
Okay, time for me to take a break from ATLAS.
27) Message boards : ATLAS application : Bad WUs? (Message 45810)
Posted 8 Dec 2021 by Yeti
Post:
This morning I had to cancel more than 15 WUs that were hanging.

Sorry, but it sucks !
28) Message boards : ATLAS application : Bad WUs? (Message 45808)
Posted 7 Dec 2021 by Yeti
Post:
just now, I got the next one.
So this was the fourth one since last night. I am afraid more will follow :-(

I'm sure more will follow!
29) Message boards : ATLAS application : Bad WUs? (Message 45806)
Posted 7 Dec 2021 by Yeti
Post:
The rate of this failure has risen to more than 10 for me since yesterday :-(
30) Message boards : ATLAS application : ATLAS long simulation 1.00 (Message 45058)
Posted 16 Jun 2021 by Yeti
Post:
Hi all,

I have paused the submission of long tasks for the moment, since there are very little hosts running them and the large cluster running them previously is no longer running BOINC. But we may bring the long tasks back in the future if there is demand for them. Thanks to everyone who helped testing and running these tasks.

David

David,

if you bring them back, please make them available for users running normal Windows/Linux clients.

Thanks

Yeti
31) Message boards : Number crunching : VM Applications Errors (Message 44823)
Posted 26 Apr 2021 by Yeti
Post:
LHC@Home is not a plug and play project like other BOINC-Projects are.

You can easily run LHC@Home like a plug and play project if you run SixTrack only.
You can easily run LHC@Home like a plug and play project if you run exactly one of ATLAS / Theory / CMS exclusively and if you keep this setting: "Use at most 100 % of CPU time" (VMs don't like this kind of throttling).

If you want to run all kinds of applications LHC@Home offers, you will have to micro-manage your client; BOINC will not always be able to give your client what you want.
32) Message boards : Number crunching : GPU advertised for LHC, but they don't do it? (Message 44678)
Posted 8 Apr 2021 by Yeti
Post:
I'm crunching these wus with my little Rx550 in 20/25 minutes.

My RTX2080 finishes them in less than 2 minutes, even with more than one running at the same time.
33) Message boards : Number crunching : VM Applications Errors (Message 44430)
Posted 3 Mar 2021 by Yeti
Post:
Thanks for your reply, that's a very large post of yours you linked, is their a particular part I can get away with reading? I don't want to spend hours on this.
Hm, if you really want to crunch ATLAS, Theory or CMS, you really need to go through the list point by point, as I already mentioned there:

Please, check this list and be sure to check really all Details, step by step, all are important.
...
34) Message boards : Number crunching : VM Applications Errors (Message 44428)
Posted 2 Mar 2021 by Yeti
Post:
Perhaps you will find some information in this checklist
35) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 43922)
Posted 15 Dec 2020 by Yeti
Post:
TCP_TUNNEL appears because of the HTTPS connection
squid TCP_Tunnel:
xx.yyy.xxx.yyy 3128 - - [13/Dec/2020:18:29:33 +0100] "CONNECT lhcathome.cern.ch:443 HTTP/1.1" 200 58982 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT

WCG shows the same info:
xx.yyy.zzz.xxx 3128 - - [15/Dec/2020:11:56:27 +0100] "CONNECT www.worldcommunitygrid.org:443 HTTP/1.1" 200 5980 "-" "BOINC client (x86_64-pc-linux-gnu 7.16.6)" TCP_TUNNEL:HIER_DIRECT
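
Log lines like these can be checked with a short script. The following is only a minimal Python sketch, assuming the field layout of the two lines quoted here (it looks like a custom logformat, so other Squid setups will need a different pattern). The names `LINE_RE`, `parse_squid_line` and `is_https_tunnel` are made up for illustration and are not part of Squid.

```python
import re

# Field layout assumed from the two log lines quoted in this post;
# a different Squid logformat setting will need a different pattern.
LINE_RE = re.compile(
    r'^(?P<client>\S+) (?P<port>\d+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) "[^"]*" "(?P<agent>[^"]*)" '
    r'(?P<result>[A-Z_]+):(?P<hier>\S+)$'
)


def parse_squid_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def is_https_tunnel(fields):
    """True when the request was a CONNECT that Squid logged as TCP_TUNNEL."""
    return fields["method"] == "CONNECT" and fields["result"] == "TCP_TUNNEL"
```

For a CONNECT request to port 443, Squid only relays the encrypted stream (unless it is set up to intercept TLS), which is why such requests show up as TCP_TUNNEL and not as cache hits or misses.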
36) Message boards : Number crunching : Running Benchmark (Message 43905)
Posted 14 Dec 2020 by Yeti
Post:
@Magic:

Go to your favorite project (Einstein, as you wrote) and change any preference; you can even re-enter the same value. The important part is that you really click save afterwards.

This makes that project responsible for your settings. Then let all your clients update this project, and everything should be as you like again.

@All:

If I understood TACC correctly, it is not a normal BOINC project, but more like a BOINC manager. They want to give users the chance to set up BOINC and TACC and then forget about it. TACC will coordinate what the clients crunch, and more.
37) Message boards : Number crunching : Not getting any tasks, though many are available (Message 43856)
Posted 11 Dec 2020 by Yeti
Post:
Shut down the PC so that it switches off completely.

Leave it off for 1 minute!

Switch it on and try again.

If this doesn't help, I would uninstall BOINC completely, remove all BOINC folders, then reboot the machine and try a clean install.
38) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 43820)
Posted 10 Dec 2020 by Yeti
Post:
I could cache something like 64GB's worth of things in ram, I was thinking to get some Optane DIMMS, then I could take my host upto 640GB of memory.
Toby,

sorry, this really makes no sense. I was on the same track as you, and computezmle told me not to do so.

We stuck with his suggestion:
# You don't believe this is enough?
# For sure, it is!
cache_mem 256 MB
maximum_object_size_in_memory 24 KB
memory_replacement_policy heap GDSF
My 8,500,000 hits come mostly from this "small" cache segment in memory.
39) Message boards : Number crunching : Setting up a local Squid to work with LHC@home - Comments and Questions (Message 43816)
Posted 10 Dec 2020 by Yeti
Post:
Here are my experiences from switching to Squid:

Setting up Squid with the help of computezmle: easy

Switching clients to use the proxy: tricky

What happened?

At first all my clients but one were working fine and using Squid. The one that didn't really work seemed to be okay, but all ATLAS WUs failed within 20 minutes. Finally I found that I had to set up the proxy settings on that client with the proxy's full domain name, not only the machine name.

Okay, I wanted to be professional and changed all other clients to use the full domain name for the proxy. This was a bad idea, because now all the clients that had been working fine couldn't upload their results anymore.

I had to flush the DNS cache on all clients, and since then everything has been working fine. Maybe a reboot would have solved it as well.
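
A quick way to see whether a client resolves both spellings of the proxy name to the same address is a few lines of Python. This is only a sketch under assumptions: the function names and the host names `squid` and `squid.example.lan` are made up for illustration, and it looks at IPv4 resolution only.

```python
import socket


def resolve(name, resolver=socket.gethostbyname):
    """Return the IPv4 address for name, or None if it does not resolve."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def compare_proxy_names(short_name, full_name, resolver=socket.gethostbyname):
    """Resolve both spellings of the proxy host and report whether they agree."""
    short_ip = resolve(short_name, resolver)
    full_ip = resolve(full_name, resolver)
    return {
        "short": short_ip,
        "full": full_ip,
        "consistent": short_ip is not None and short_ip == full_ip,
    }
```

Calling `compare_proxy_names("squid", "squid.example.lan")` on a client, once before and once after flushing its DNS cache, would show whether the client had a stale or missing entry for one of the two names. The `resolver` parameter exists only so the check can be tried without a real network.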

Perhaps this helps someone.

Oh, and we checked what Squid is doing for my clients: in the last 3 weeks it has served 8,500,000 HTTP(S) requests from its RAM cache.
40) Message boards : ATLAS application : How is Work-Distribution calculated ? (Message 43706)
Posted 25 Nov 2020 by Yeti
Post:
Hi Yeti, nice to have you back :)
Yeah, feels good to be back again
There is a limitation on the server side for ATLAS and Theory to send out max 2 tasks per CPU. I have asked the admins to increase this to 4 for ATLAS. I would rather not remove the limits completely since many hosts will end up with tasks they will not be able to process before the deadline.

Hm, 4 is better than 2, but it is not really the optimum.

At the moment, "Max # of CPUs" is used for three things:

  1. It sets the number of cores a WU should use when there is no override by app_config
  2. It gives the base for the "Working Set Size"
  3. It is used to calculate how many WUs a client gets (multiplied by 2 now, and by 4 in the near future)



I'm sure that if you don't change number 3 to a more realistic number, this will cause problems for smaller / older clients.

Examples:

A) Lando is an old 8-core box; at the moment it gets 10 WUs, in future it will get 20 WUs. Sorry, but that is way too many.

B) Manni is my current flagship and has 24 cores; it gets 10 WUs, in future it will get 20 WUs.

Couldn't you take the number of real cores into your calculation?

What about this or a similar formula: MaxWUs = Int( RealCores / "Max # CPU") * Faktor

Example-Calculations:

Manni (24 Core): Int( RealCores / "Max # CPU") * Faktor
Manni (24 Core): Int( 24 / 5) * 5 = 20

Lando ( 8 Core) : Int( RealCores / "Max # CPU") * Faktor
Lando ( 8 Core) : Int( 8 / 5) * 5 = 5

These results are much more realistic than with your old calculation!

The Faktor could be a fixed number ( 5 in example, perhaps 6) or could be taken from "Max # of jobs"

This leads to a much more realistic local work balance.
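
The proposed formula is easy to try out with other values. Here is a small Python sketch of it; the function name `max_wus` and its parameter names are made up for illustration, and this is not how the server actually calculates anything.

```python
def max_wus(real_cores, max_cpus, faktor):
    """MaxWUs = Int(RealCores / "Max # CPU") * Faktor, as proposed in this post."""
    if max_cpus <= 0:
        raise ValueError("max_cpus must be positive")
    # Integer division plays the role of Int() for positive values.
    return (real_cores // max_cpus) * faktor


# The two example hosts, with "Max # CPU" = 5 and Faktor = 5.
for host, cores in (("Manni", 24), ("Lando", 8)):
    print(host, max_wus(cores, max_cpus=5, faktor=5))
# prints:
# Manni 20
# Lando 5
```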


