1) Message boards : ATLAS application : Bad WUs? (Message 45857)
Posted 13 Dec 2021 by CloverField
Post:
26202 is a problem wrapper per the link https://boinc.berkeley.edu/trac/wiki/VboxApps#Premadevboxwrapperexecutables as it uses the COM interface.
26203 is reporting in the logs as '26202' and really should be reporting its own version number.

The bad WUs in this forum thread took care of themselves. It looked to be more an error of CVMFS within the VM hanging and the processing never starting.


So are we good to restart ATLAS multicore again? Or will we need to get this new vbox wrapper to get ATLAS working?
2) Message boards : ATLAS application : Atlas virtual box requirements (Message 45328)
Posted 11 Sep 2021 by CloverField
Post:
So I did some testing and was able to get versions 6.1.12 and 5.2.44 working correctly, with no more computation failures from ATLAS.
I ended up sticking with 6.1.12 because that is the version that comes with the latest version of BOINC.
Thanks for all the help everyone.
3) Message boards : ATLAS application : Atlas virtual box requirements (Message 45314)
Posted 8 Sep 2021 by CloverField
Post:
No, I do not; it has been off since this computer was built.
Unless updating VirtualBox somehow turned it on?
4) Message boards : ATLAS application : Atlas virtual box requirements (Message 45312)
Posted 8 Sep 2021 by CloverField
Post:
Since you are on Win10, VirtualBox 5.2.44 will be better. I don't think it makes much difference on Linux.
https://www.virtualbox.org/wiki/Download_Old_Builds_5_2

Thanks for the info, I'll downgrade my vbox version. Any reason for that particular version?
5) Message boards : ATLAS application : Atlas virtual box requirements (Message 45310)
Posted 8 Sep 2021 by CloverField
Post:
So I just updated VirtualBox to 6.1.26 and I've been getting a lot of computation failures. Is there a recommended VirtualBox version for ATLAS?
Looking at the failed tasks, I noticed they seem to be related to the guest additions package. I also just updated that to the latest version; is there a recommended version for that as well?
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
2021-09-08 08:42:14 (4032): Detected: vboxwrapper 26197
2021-09-08 08:42:14 (4032): Detected: BOINC client v7.7
2021-09-08 08:42:21 (4032): Error in guest additions for VM: -2147024891
Command:
VBoxManage -q list systemproperties
Output:
VBoxManage.exe: error: Failed to create the VirtualBox object!
VBoxManage.exe: error: The object is not ready
VBoxManage.exe: error: Details: code E_ACCESSDENIED (0x80070005), component VirtualBoxClientWrap, interface IVirtualBoxClient

2021-09-08 08:42:21 (4032): Detected: VirtualBox VboxManage Interface (Version: 6.1.26)
2021-09-08 08:42:27 (4032): Error in host info for VM: -2147024891
Command:
VBoxManage -q list hostinfo 
Output:
VBoxManage.exe: error: Failed to create the VirtualBox object!
VBoxManage.exe: error: The object is not ready
VBoxManage.exe: error: Details: code E_ACCESSDENIED (0x80070005), component VirtualBoxClientWrap, interface IVirtualBoxClient

2021-09-08 08:42:27 (4032): WARNING: Communication with VM Hypervisor failed.
2021-09-08 08:42:27 (4032): ERROR: VBoxManage list hostinfo failed
08:42:27 (4032): called boinc_finish(1)

</stderr_txt>
]]>
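A side note on the numbers in the log above: the status -2147024891 printed by vboxwrapper and the 0x80070005 (E_ACCESSDENIED) printed by VBoxManage are the same 32-bit value, shown once as a signed integer and once as unsigned hex. A minimal Python sketch of the conversion (the helper name is made up for illustration):

```python
def hresult_to_hex(code: int) -> str:
    """Render a signed 32-bit status code as unsigned hexadecimal."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit.
    return f"0x{code & 0xFFFFFFFF:08X}"


print(hresult_to_hex(-2147024891))  # 0x80070005
```

So both error lines point at the same access-denied failure rather than two separate problems.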
6) Message boards : Theory Application : Theory Task doing nothing (Message 42930)
Posted 29 Jun 2020 by CloverField
Post:
Found the source of the issue; it looks like a squid permissions issue. There are lots of entries in the log file saying permission denied. I just need to wait for some CMS tasks to finish and then I'll redo my squid cache.
7) Message boards : ATLAS application : Squid proxies may need restart (Message 42786)
Posted 2 Jun 2020 by CloverField
Post:
This is also in regards to your post in the Theory thread:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5431&postid=42775


You may first check your access.log and cache.log.
Do you notice error messages that correspond to your issues?

If no, squid is most likely running fine and the issues are caused by something else.

If yes, you should clear the cache and restart fresh.

You may also insert the following line in your squid.conf and do a "squid -k reconfigure".
shutdown_lifetime 3 seconds

This avoids the 60 seconds default delay when you shutdown/restart squid but I'm not 100% sure if changing this timeout requires a squid -k restart. At least Squid will be prepared for the next restart.


The logs look good to me, and after the squid restart everything seems to be fine.
I just got some ATLAS tasks though, so I assume they will kill at least one Theory task.
If I get another stuck one, I'll nuke the cache and also do a project reset to see if that solves the issue.
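
For anyone who wants to do the access.log check suggested above without reading the file by eye, here is a rough Python sketch. It assumes Squid's default native log format, where the fourth whitespace-separated field is the result code and HTTP status (for example TCP_MISS/200). The function names and sample lines are made up for illustration and are not taken from a real log.

```python
from collections import Counter


def summarize_access_log(lines):
    """Count result codes (4th field) in native-format access.log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or "/" not in fields[3]:
            continue  # skip lines that do not look like native-format entries
        counts[fields[3]] += 1
    return counts


def looks_bad(result_field):
    """Flag denied requests and HTTP 4xx/5xx statuses."""
    code, _, status = result_field.partition("/")
    return "DENIED" in code or (status.isdigit() and int(status) >= 400)


# Made-up sample lines, for illustration only.
sample = [
    "1591084800.123     45 192.168.1.10 TCP_MISS/200 1234 GET http://example.invalid/a - HIER_DIRECT/192.0.2.1 text/plain",
    "1591084801.456      2 192.168.1.10 TCP_DENIED/403 3900 GET http://example.invalid/b - HIER_NONE/- text/html",
]
counts = summarize_access_log(sample)
bad = {field: n for field, n in counts.items() if looks_bad(field)}
print(bad)  # {'TCP_DENIED/403': 1}
```

To use it on a real file, pass the open file object instead of the sample list. If the result is empty, squid is probably not the culprit.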
8) Message boards : Theory Application : Theory Task doing nothing (Message 42781)
Posted 2 Jun 2020 by CloverField
Post:
Ok got another one that was just stuck there with the same message.
This time it was not due to task switching.

Could it be due to the squid cache that I set up earlier?

Hopefully this will update to something more helpful than aborted by user.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=275990643


Just restarted squid for ATLAS; I'll see if this fixes the Theory issues as well.
9) Message boards : ATLAS application : Squid proxies may need restart (Message 42777)
Posted 2 Jun 2020 by CloverField
Post:
Hi all,

This message is only relevant if you run your own squid proxy server for ATLAS tasks.

After the CERN database outage last week, a problem was seen with the cached information on squid proxy servers all over the ATLAS Grid which can cause tasks to fail. The solution to the problem is to restart the squid service, so if you are running your own squid please restart it in order to avoid potential problems.

The ATLAS-managed squid servers which tasks use by default were restarted earlier today, so if you saw strange failures in tasks between Thursday last week and now this might have been the reason.


By restart, do you just mean squid -k restart,
or deleting the cache and starting fresh?
10) Message boards : Theory Application : Theory Task doing nothing (Message 42775)
Posted 2 Jun 2020 by CloverField
Post:
Ok got another one that was just stuck there with the same message.
This time it was not due to task switching.

Could it be due to the squid cache that I set up earlier?

Hopefully this will update to something more helpful than aborted by user.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=275990643
11) Message boards : Number crunching : How does task switching actually work? (Message 42765)
Posted 2 Jun 2020 by CloverField
Post:
The main issue with that for me is that ATLAS loves to give me 8-core tasks, which then kick 8 other jobs to the side and usually break them.
What I want is for the scheduler to say: hey, an ATLAS 8-core task is ready, let 8 more tasks finish and then slot the ATLAS task into the free space. I could limit ATLAS to one-core tasks,
but that kind of defeats the point of a Threadripper, no?


This is a major flaw in the workings of the Boinc scheduler. You could try asking them to sort it, but they're a strange bunch. And you'll have to visit them in Github as they don't listen to anyone in the forums.

Although, why are your single core tasks breaking? The only problem I end up with is stuff not meeting deadlines. The Boinc scheduler is rubbish at that, it leaves things to the last minute, and if you happen to have the computer off or playing a game etc, you're a bit late sending them back.


It seems to be due to network I/O. It only happens if the task switches within the first ten minutes or so, or however long it takes to configure itself. When ATLAS forces the task swap, they are waiting to get something, and when they come back online they are still in that waiting state and just sit there forever.

In the case of Theory, they either do something like this or they are completely unresponsive and you can't reach them at all through the VM console.

12) Message boards : Number crunching : How does task switching actually work? (Message 42761)
Posted 2 Jun 2020 by CloverField
Post:
The main issue with that for me is that ATLAS loves to give me 8-core tasks, which then kick 8 other jobs to the side and usually break them.
What I want is for the scheduler to say: hey, an ATLAS 8-core task is ready, let 8 more tasks finish and then slot the ATLAS task into the free space. I could limit ATLAS to one-core tasks,
but that kind of defeats the point of a Threadripper, no?
13) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42738)
Posted 1 Jun 2020 by CloverField
Post:
Should a news post be made for the solution to this issue so everyone gets a notice in their BOINC client?
14) Message boards : Sixtrack Application : Internet access OK - project servers may be temporarily down. (Message 42712)
Posted 30 May 2020 by CloverField
Post:
The last 5 hours I have not been able to send any of the over 100 finished Sixtracks from here ( PDT)
Can't get new tasks either but since I planned ahead I still have 645 running.
Couldn't even send in one Theory to -dev and it can't be because of too many tasks at the same time.
Of course right now we have lots of Sixtracks running and supposed to be many more waiting right now.
I guess I will just watch and see if they finally let me update.


Are you on Windows?
If so, this is the actual issue.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5441
15) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42708)
Posted 30 May 2020 by CloverField
Post:
A lot of people are about to find out about this the hard way;
it turns out a lot of people were using this cert provider.

https://twitter.com/sleevi_/status/1266647545675210753
16) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42702)
Posted 30 May 2020 by CloverField
Post:
Seems to be fixed with the workaround on
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14006&postid=96882

LHC & Rosetta both seem to work. Other projects still work.


Can confirm that this works as well.

Hopefully the BOINC team will be able to get a new build out with the new certs as well before everything breaks.
17) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42690)
Posted 30 May 2020 by CloverField
Post:
Add NumberFields@home as another project affected.

Unfortunately, opening ca-bundle.crt in Windows only shows the details for the first of the 133 certificates in the bundle. I've been through them all, and - although a few of them have expired - none expired this morning.

Although the COMODO certificate authenticating this website, and the InCommon certificate authenticating the NumberFields and Rosetta websites, all seem to be in order, I've seen a suggestion on the web that certificates may be rejected as expired in some cases when a newer certificate is issued (even if the old one appears still to have time left to run before expiry).


Just noticed this in Opera browser on Windows 10:
This discussion is fine, but this thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5387
Which has images, specifically http://cms-results.web.cern.ch/cms-results/public-results/publications/SMP-15-003/CMS-SMP-15-003_Figure_006-a.png
Shows: https://www.dropbox.com/s/6qjbvllcsgslvrt/unsecure.jpg?dl=0


I can see the images just fine; however, I am getting a big "not secure" icon in the top left of Chrome.
18) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42677)
Posted 30 May 2020 by CloverField
Post:
I've got the same date in there as well.
19) Message boards : Number crunching : How does task switching actually work? (Message 42671)
Posted 30 May 2020 by CloverField
Post:
Yeah, I plan to build an ATLAS-only box at some point in the future when I retire this computer.
That seems like the easiest way to fix the issue, as I'm not sure if the ATLAS team could adjust the deadlines.
I don't want to mess up their science just so I don't have to check on my computer once a day lol.
20) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42670)
Posted 30 May 2020 by CloverField
Post:
I also see the same, it could be the BOINC certificate expired?


But my other three projects (Universe, Milkyway, Einstein) are ok. Only Rosetta and LHC failed.

How do these certificates work? Explain like I'm five (T.M. Reddit)


It's basically a file with a cryptographic key in it that says: hey, you can trust me from xx/xx/xxxx to xx/xx/xxxx.
If the current date falls outside that range, you can no longer trust that connection, and in this day and age most things reject it as insecure.

Edit:

Here is a much better, non-five-year-old explanation.
https://www.entrustdatacard.com/pages/ssl
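
To make the "trust me from date to date" part concrete, here is a tiny Python sketch of just the date-range check. The dates are hypothetical placeholders, not the real dates from any certificate discussed in this thread. A real client also verifies the signature chain up to a trusted CA, which this sketch leaves out.

```python
from datetime import datetime, timezone


def is_within_validity(not_before, not_after, now=None):
    """Return True if 'now' falls inside the certificate's validity window."""
    now = now or datetime.now(timezone.utc)
    return not_before <= now <= not_after


# Hypothetical validity window, for illustration only.
not_before = datetime(2019, 1, 1, tzinfo=timezone.utc)
not_after = datetime(2020, 5, 30, tzinfo=timezone.utc)

day_before = datetime(2020, 5, 29, tzinfo=timezone.utc)
day_after = datetime(2020, 6, 1, tzinfo=timezone.utc)
print(is_within_validity(not_before, not_after, day_before))  # True
print(is_within_validity(not_before, not_after, day_after))   # False
```

If any certificate in the chain fails this check, the whole connection gets rejected, which is why one expired CA certificate can break several projects at once.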



