1) Message boards : ATLAS application : 2000 Events Threadripper 3995WX (Message 49619)
Posted 23 Feb 2024 by CloverField
Post:
Any reason why the tasks suddenly jumped up to 6 hours? They used to take about 40 minutes to 2 hours in the past.
2) Message boards : News : Seasons greetings (Message 49065)
Posted 23 Dec 2023 by CloverField
Post:
Merry Christmas / happy holidays, everyone!
3) Message boards : Theory Application : Stuck WU: Waiting for the delivery of SIGUSR1 (Message 48118)
Posted 19 May 2023 by CloverField
Post:
Got about 4 of these last night.
4) Message boards : Theory Application : Stuck WU: Waiting for the delivery of SIGUSR1 (Message 47787)
Posted 25 Feb 2023 by CloverField
Post:
These continue to happen; I get about 5-10 a week. Is there any way we could get some retry logic in the startup, like ATLAS and CMS have, so I don't have to make checking for stuck tasks part of my morning routine?
5) Message boards : Theory Application : Stuck WU: Waiting for the delivery of SIGUSR1 (Message 47759)
Posted 8 Feb 2023 by CloverField
Post:
2023-02-07 15:05:56 (46556): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
.
.
.
2023-02-07 15:06:02 (46556): Setting network throttle for VM. (5120KB)

It looks like you tweak your network bandwidth settings for the VMs (or BOINC as a whole).
This makes no sense since it applies only to outgoing traffic (from the VM perspective), but it may affect the connection timing.

You may leave those settings unlimited or at default values.


I'll try changing those, but I ended up turning them on because I would get latency spikes on my network when tasks uploaded.
6) Message boards : Theory Application : Stuck WU: Waiting for the delivery of SIGUSR1 (Message 47757)
Posted 8 Feb 2023 by CloverField
Post:
Yup, on the same PC.
7) Message boards : Theory Application : Stuck WU: Waiting for the delivery of SIGUSR1 (Message 47755)
Posted 8 Feb 2023 by CloverField
Post:
Have checked one Theory-Task from you.
Seeing entries from Boinc-slot No. 29, 19 and 31 for the same Theory task.

So somehow BOINC is starting the same task 3 times?
8) Message boards : Theory Application : Stuck WU: Waiting for the delivery of SIGUSR1 (Message 47752)
Posted 7 Feb 2023 by CloverField
Post:
Just got some more; this time 4 Theory tasks started at the same time. Could Squid be caching the response for that endpoint, causing the Theory tasks to hang?
9) Message boards : Theory Application : Stuck WU: Waiting for the delivery of SIGUSR1 (Message 47749)
Posted 5 Feb 2023 by CloverField
Post:
I woke up to 4 more stuck this morning. The only thing I can see in common is that they all seem to have started at roughly the same time (~10-15 spread). Could this be a network issue because too many requests are going to this SIGUSR1 thing at once?
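To check whether stuck tasks really do start in bursts, one option is to group their start times. The following is only a rough sketch: it assumes the "YYYY-MM-DD HH:MM:SS" timestamp prefix seen in the vboxwrapper logs, the 15-second window is a guess (the post gives no unit for the spread), and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

def parse_log_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")

def group_by_start_time(start_times, window_seconds=15):
    """Group start times whose gap to the previous start is within the window."""
    window = timedelta(seconds=window_seconds)
    groups = []
    for ts in sorted(start_times):
        if groups and ts - groups[-1][-1] <= window:
            groups[-1].append(ts)   # close enough: same burst
        else:
            groups.append([ts])     # gap too large: new burst
    return groups

# Made-up start times, not taken from any real task.
sample = [
    "2023-02-05 01:00:00 (hypothetical)",
    "2023-02-05 01:00:07 (hypothetical)",
    "2023-02-05 01:00:12 (hypothetical)",
    "2023-02-05 03:30:00 (hypothetical)",
]
bursts = group_by_start_time([parse_log_timestamp(s) for s in sample])
for burst in bursts:
    print(len(burst), "task(s) starting from", burst[0])
```

If most stuck tasks land in groups of several, that would support the idea that simultaneous starts are involved; it would not by itself show why.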
10) Message boards : Theory Application : Stuck WU: Waiting for the delivery of SIGUSR1 (Message 47748)
Posted 4 Feb 2023 by CloverField
Post:
I know this is necroposting, but I woke up this morning to 4 Theory tasks that had been stuck for 8 hours with this error. Is there anything that can be done to prevent this? I manually aborted them and the next set of tasks started with no issues. I am running Squid as well, but it happens with or without the proxy.
11) Message boards : ATLAS application : ATLAS vbox v2.03 (Message 47575)
Posted 5 Dec 2022 by CloverField
Post:
These have been running smoothly for me. The improved start up time is really nice in combination with squid.
12) Message boards : ATLAS application : Bad WUs? (Message 45857)
Posted 13 Dec 2021 by CloverField
Post:
26202 is a problem wrapper per the link https://boinc.berkeley.edu/trac/wiki/VboxApps#Premadevboxwrapperexecutables as it uses the COM interface.
26203 is reporting in the logs as '26202' and really should be reporting its own version number.

The bad WUs in this forum thread took care of themselves. It looked to be more an error of CVMFS within the VM hanging and the processing never starting.


So are we good to restart ATLAS multicore again? Or will we need to get this new vbox wrapper to get ATLAS working?
13) Message boards : ATLAS application : Atlas virtual box requirements (Message 45328)
Posted 11 Sep 2021 by CloverField
Post:
So I did some testing and was able to get versions 6.1.12 and 5.2.44 working correctly, with no more computation failures from ATLAS.
I ended up sticking with 6.1.12 because that is the version that comes with the latest version of BOINC.
Thanks for all the help, everyone.
14) Message boards : ATLAS application : Atlas virtual box requirements (Message 45314)
Posted 8 Sep 2021 by CloverField
Post:
No, I do not; it has been off since this computer was built.
Unless updating VirtualBox somehow turned it on?
15) Message boards : ATLAS application : Atlas virtual box requirements (Message 45312)
Posted 8 Sep 2021 by CloverField
Post:
Since you are on Win10, VirtualBox 5.2.44 will be better. I don't think it makes much difference on Linux.
https://www.virtualbox.org/wiki/Download_Old_Builds_5_2

Thanks for the info, I'll downgrade my vbox version. Any reason for that particular version?
16) Message boards : ATLAS application : Atlas virtual box requirements (Message 45310)
Posted 8 Sep 2021 by CloverField
Post:
So I just updated VirtualBox to 6.1.26 and I've been getting a lot of computation failures. Is there a recommended VirtualBox version for ATLAS?
Looking at the failed tasks, I noticed they seem to be related to the guest additions package. I also just updated that to the latest version; is there a recommended version for that as well?
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
2021-09-08 08:42:14 (4032): Detected: vboxwrapper 26197
2021-09-08 08:42:14 (4032): Detected: BOINC client v7.7
2021-09-08 08:42:21 (4032): Error in guest additions for VM: -2147024891
Command:
VBoxManage -q list systemproperties
Output:
VBoxManage.exe: error: Failed to create the VirtualBox object!
VBoxManage.exe: error: The object is not ready
VBoxManage.exe: error: Details: code E_ACCESSDENIED (0x80070005), component VirtualBoxClientWrap, interface IVirtualBoxClient

2021-09-08 08:42:21 (4032): Detected: VirtualBox VboxManage Interface (Version: 6.1.26)
2021-09-08 08:42:27 (4032): Error in host info for VM: -2147024891
Command:
VBoxManage -q list hostinfo 
Output:
VBoxManage.exe: error: Failed to create the VirtualBox object!
VBoxManage.exe: error: The object is not ready
VBoxManage.exe: error: Details: code E_ACCESSDENIED (0x80070005), component VirtualBoxClientWrap, interface IVirtualBoxClient

2021-09-08 08:42:27 (4032): WARNING: Communication with VM Hypervisor failed.
2021-09-08 08:42:27 (4032): ERROR: VBoxManage list hostinfo failed
08:42:27 (4032): called boinc_finish(1)

</stderr_txt>
]]>
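As a side note on reading this log: the negative number in "Error in guest additions for VM: -2147024891" and the hex code 0x80070005 (E_ACCESSDENIED) printed by VBoxManage are the same 32-bit value, shown once as a signed integer and once as unsigned hex. A minimal sketch of the conversion (the helper name `to_hresult` is made up for illustration):

```python
def to_hresult(code):
    """Reinterpret a signed 32-bit integer as an unsigned hex string."""
    unsigned = code & 0xFFFFFFFF  # two's-complement wrap into 0..2**32-1
    return "0x{:08X}".format(unsigned)

print(to_hresult(-2147024891))  # prints 0x80070005
```

So both error lines point at the same access-denied condition rather than two separate failures.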
17) Message boards : Theory Application : Theory Task doing nothing (Message 42930)
Posted 29 Jun 2020 by CloverField
Post:
Found the source of the issue; it looks like a Squid permissions issue. There are lots of entries in the log file saying permission denied. I just need to wait for some CMS tasks to finish and then I'll redo my Squid cache.
18) Message boards : ATLAS application : Squid proxies may need restart (Message 42786)
Posted 2 Jun 2020 by CloverField
Post:
This is also in regards to your post in the Theory thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5431&postid=42775


You may first check your access.log and cache.log.
Do you notice error messages that correspond to your issues?

If no, squid is most likely running fine and the issues are caused by something else.

If yes, you should clear the cache and restart fresh.

You may also insert the following line in your squid.conf and do a "squid -k reconfigure".
shutdown_lifetime 3 seconds

This avoids the 60 seconds default delay when you shutdown/restart squid but I'm not 100% sure if changing this timeout requires a squid -k restart. At least Squid will be prepared for the next restart.


The logs look good to me. And after the Squid restart everything seems to be fine.
I just got some ATLAS tasks though, so I assume they will kill at least one Theory task.
If I get another stuck one, I'll nuke the cache and also do a project reset to see if that solves the issue.
19) Message boards : Theory Application : Theory Task doing nothing (Message 42781)
Posted 2 Jun 2020 by CloverField
Post:
Ok got another one that was just stuck there with the same message.
This time it was not due to task switching.

Could it be due to the squid cache that I set up earlier?

Hopefully this will update to something more helpful than "aborted by user".
https://lhcathome.cern.ch/lhcathome/result.php?resultid=275990643


Just restarted squid for ATLAS, I'll see if this fixes the theory issues as well.
20) Message boards : ATLAS application : Squid proxies may need restart (Message 42777)
Posted 2 Jun 2020 by CloverField
Post:
Hi all,

This message is only relevant if you run your own squid proxy server for ATLAS tasks.

After the CERN database outage last week, a problem was seen with the cached information on squid proxy servers all over the ATLAS Grid which can cause tasks to fail. The solution to the problem is to restart the squid service, so if you are running your own squid please restart it in order to avoid potential problems.

The ATLAS-managed squid servers which tasks use by default were restarted earlier today, so if you saw strange failures in tasks between Thursday last week and now this might have been the reason.


By restart, do you just mean squid -k restart, or deleting the cache and starting fresh?



©2024 CERN