21) Message boards : Theory Application : Theory Task doing nothing (Message 42775)
Posted 2 Jun 2020 by CloverField
Post:
Ok got another one that was just stuck there with the same message.
This time it was not due to task switching.

Could it be due to the squid cache that I set up earlier?

Hopefully this will update to something more helpful than "aborted by user".
https://lhcathome.cern.ch/lhcathome/result.php?resultid=275990643
22) Message boards : Number crunching : How does task switching actually work? (Message 42765)
Posted 2 Jun 2020 by CloverField
Post:
The main issue with that for me is that ATLAS loves to give me 8-core tasks, which then kick 8 other jobs to the side and usually break them.
What I want my tasks to do is say: hey, an ATLAS 8-core is ready, let 8 more tasks finish and then slot the ATLAS task into the free space. I could limit ATLAS to single-core tasks,
but that kind of defeats the point of a Threadripper, no?


This is a major flaw in the workings of the BOINC scheduler. You could try asking them to sort it, but they're a strange bunch. And you'll have to visit them on GitHub as they don't listen to anyone in the forums.

Although, why are your single core tasks breaking? The only problem I end up with is stuff not meeting deadlines. The BOINC scheduler is rubbish at that, it leaves things to the last minute, and if you happen to have the computer off or playing a game etc, you're a bit late sending them back.


It seems to be due to network I/O. It only happens if the task switches in roughly the first ten minutes, or however long the task takes to configure itself. When ATLAS forces the task swap, the other tasks are waiting to fetch something, and when they come back online they are still in that waiting state and just sit there forever.

In the case of Theory, they either do something like this or they are completely unresponsive and you can't reach them at all through the VM console.

23) Message boards : Number crunching : How does task switching actually work? (Message 42761)
Posted 2 Jun 2020 by CloverField
Post:
The main issue with that for me is that ATLAS loves to give me 8-core tasks, which then kick 8 other jobs to the side and usually break them.
What I want my tasks to do is say: hey, an ATLAS 8-core is ready, let 8 more tasks finish and then slot the ATLAS task into the free space. I could limit ATLAS to single-core tasks,
but that kind of defeats the point of a Threadripper, no?
24) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42738)
Posted 1 Jun 2020 by CloverField
Post:
Should a news post be made for the solution to this issue so everyone gets a notice in their BOINC client?
25) Message boards : Sixtrack Application : Internet access OK - project servers may be temporarily down. (Message 42712)
Posted 30 May 2020 by CloverField
Post:
The last 5 hours I have not been able to send any of the over 100 finished Sixtracks from here ( PDT)
Can't get new tasks either but since I planned ahead I still have 645 running.
Couldn't even send in one Theory to -dev and it can't be because of too many tasks at the same time.
Of course right now we have lots of Sixtracks running and supposed to be many more waiting right now.
I guess I will just watch and see if they finally let me update.


Are you on Windows?
If so, this is the actual issue.
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5441
26) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42708)
Posted 30 May 2020 by CloverField
Post:
A lot of people are about to find out about this the hard way.
Turns out a lot of people were using this cert provider.

https://twitter.com/sleevi_/status/1266647545675210753
27) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42702)
Posted 30 May 2020 by CloverField
Post:
Seems to be fixed with the workaround on
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14006&postid=96882

LHC & Rosetta both seem to work. Other projects still work.


Can confirm that this works as well.

Hopefully the BOINC team will be able to get a new build out with the new certs as well before everything breaks.
28) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42690)
Posted 30 May 2020 by CloverField
Post:
Add NumberFields@home as another project affected.

Unfortunately, opening ca-bundle.crt in Windows only shows the details for the first of the 133 certificates in the bundle. I've been through them all, and - although a few of them have expired - none expired this morning.

Although the COMODO certificate authenticating this website, and the InCommon certificate authenticating the NumberFields and Rosetta websites, all seem to be in order, I've seen a suggestion on the web that certificates may be rejected as expired in some cases when a newer certificate is issued (even if the old one appears still to have time left to run before expiry).


Just noticed this in Opera browser on Windows 10:
This discussion is fine, but this thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5387
Which has images, specifically http://cms-results.web.cern.ch/cms-results/public-results/publications/SMP-15-003/CMS-SMP-15-003_Figure_006-a.png
Shows: https://www.dropbox.com/s/6qjbvllcsgslvrt/unsecure.jpg?dl=0


I can see the images just fine; however, I am getting a big "Not secure" icon in the top left of Chrome.
29) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42677)
Posted 30 May 2020 by CloverField
Post:
I've got the same date in there as well.
30) Message boards : Number crunching : How does task switching actually work? (Message 42671)
Posted 30 May 2020 by CloverField
Post:
Yeah, I plan to build an ATLAS-only box at some point in the future when I retire this comp.
That seems like the easiest way to fix the issue, as I'm not sure whether the ATLAS team could adjust the deadlines.
I don't want to mess up their science just so I don't have to check on my comp once a day lol.
31) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42670)
Posted 30 May 2020 by CloverField
Post:
I also see the same, it could be the BOINC certificate expired?


But my other three projects (Universe, Milkyway, Einstein) are ok. Only Rosetta and LHC failed.

How do these certificates work? Explain like I'm five (T.M. Reddit)


It's basically a file with a cryptographic key in it that says: hey, you can trust me from xx/xx/xxxx to xx/xx/xxxx.
If the current date falls outside that range you can no longer trust that connection, and in this day and age most things reject it as insecure.

Edit:

Here is a much better non-five-year-old explanation.
https://www.entrustdatacard.com/pages/ssl
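
The "trust me from date A to date B" idea can be sketched in a few lines of Python. This is only a rough illustration: the dates are made up, `is_within_validity` and `chain_is_valid` are hypothetical helper names, and a real client checks much more than dates (signatures, hostnames, revocation). The one real-world detail it tries to show is that every certificate in the chain has to be inside its validity window, so a single expired link is enough to break the connection.

```python
from datetime import datetime, timezone


def is_within_validity(not_before, not_after, now):
    """Return True if `now` falls inside one certificate's validity window."""
    return not_before <= now <= not_after


def chain_is_valid(chain, now):
    """Return True only if every (not_before, not_after) pair in the chain is valid."""
    return all(is_within_validity(nb, na, now) for nb, na in chain)


def utc(year, month, day):
    """Small helper to build timezone-aware UTC dates."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical validity windows, chosen only for illustration.
site_cert = (utc(2020, 1, 1), utc(2021, 1, 1))
old_root = (utc(2000, 1, 1), utc(2020, 5, 30))
chain = [site_cert, old_root]

print(chain_is_valid(chain, utc(2020, 5, 29)))  # True: both windows still open
print(chain_is_valid(chain, utc(2020, 5, 31)))  # False: the old root has run out
```

Note that in the second case the site's own certificate still has months left; the chain fails only because of the older certificate above it.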
32) Message boards : Number crunching : How does task switching actually work? (Message 42666)
Posted 30 May 2020 by CloverField
Post:
I don't like this task switching, it seems unnecessary. I changed "switch between applications" in Boinc to 100000 minutes, (i.e. never). Which means once you start something, finish it!

So I tried messing around with that as well, and as far as I can tell it only applies when you are running multiple projects. Since I'm only running LHC@home, it lets the tasks run to completion.
The actual "issue" seems to be running all the LHC@home projects at once. Since the ATLAS tasks have a much earlier deadline than any of the other tasks, ATLAS likes to jump in the instant another task finishes, and since ATLAS tasks are multicore it will suspend the other jobs. The other VirtualBox tasks really don't like this, and it was causing me to have tons of errored/stuck jobs that I had to abort manually.
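
To make the behaviour I'm describing concrete, here is a toy model in Python. It is not BOINC's real scheduler, just my guess at the effect: a newcomer with an earlier deadline suspends running tasks with later deadlines until it has enough cores. The `Task` class, the `admit` function, and the deadline numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cores: int
    deadline_days: float  # days until the task is due (hypothetical values)


def admit(running, newcomer, total_cores):
    """Toy earliest-deadline-first admission.

    Suspends running tasks whose deadline is later than the newcomer's,
    latest deadline first, until the newcomer fits.
    Returns (still_running, suspended).
    """
    free = total_cores - sum(t.cores for t in running)
    candidates = sorted(
        (t for t in running if t.deadline_days > newcomer.deadline_days),
        key=lambda t: t.deadline_days,
        reverse=True,
    )
    suspended = []
    for task in candidates:
        if free >= newcomer.cores:
            break
        suspended.append(task)
        free += task.cores
    if free < newcomer.cores:
        # The newcomer cannot fit even after suspensions, so nothing changes.
        return list(running), []
    suspended_ids = {id(t) for t in suspended}
    still_running = [t for t in running if id(t) not in suspended_ids]
    still_running.append(newcomer)
    return still_running, suspended


# 32 single-core tasks fill the machine, then an 8-core ATLAS task arrives.
running = [Task(f"Theory-{i}", cores=1, deadline_days=10.0) for i in range(32)]
atlas = Task("ATLAS", cores=8, deadline_days=2.0)

still_running, suspended = admit(running, atlas, total_cores=32)
print(len(suspended))                        # 8
print(sum(t.cores for t in still_running))   # 32
```

What I would prefer is the opposite policy: hold the ATLAS task back until 8 cores free up on their own, so nothing gets suspended mid-run.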
33) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 42664)
Posted 30 May 2020 by CloverField
Post:
I am also getting this.
I think LHC@home's web certs might have expired.
:C
34) Message boards : Theory Application : Theory Task doing nothing (Message 42641)
Posted 28 May 2020 by CloverField
Post:
You have had successful tasks for ATLAS, CMS and Theory in the last few days.

If you let only Sixtrack and ONE VM task (ATLAS, CMS or Theory) run, with all other VM tasks suspended,
does that task run normally and finish correctly?

There are many sixtrack for the other 31 CPU's atm.


Yeah, this would also work. I've just kinda been more focused on trying to do as much work as fast as possible lol.

Allowing each VM-based task to run one instance and then filling the rest with Sixtrack would probably be the best way going forward.
That, or I build a new computer and dedicate it to ATLAS only, as it seems to be the problem child with its short deadlines.
35) Message boards : Theory Application : Theory Task doing nothing (Message 42625)
Posted 26 May 2020 by CloverField
Post:
All the starts and stops in that last Theory task are actually from when BOINC goes to fetch work. What usually happens is that it gets a bunch of ATLAS tasks back,
and since those have an earlier due date it stops whatever is currently running and switches to ATLAS. This happens multiple times a day and
ends up killing my tasks. I think I might be able to fix this by changing the "keep an additional x days of work" setting from .25 to 1; hopefully this keeps enough
of a buffer to prevent it from starting and stopping tasks all the time.
That's the hammer on the nail.
Since the last Theory update the fictive estimated runtime went from 100 hours to 10 days.
It would be the best solution if Laurence would fix this, but for the time being you may change it yourself by editing the Theory_2019_10_01.xml in LHC's project folder.
Change the job_duration value from 864000 to 360000.


I already made that change on your advice over in number crunching.
At the time I thought the issue was only limited to CMS tasks.

However, it seems that getting 1 day of work and then reducing the buffer to .25 days has fixed the issue, as it effectively stops BOINC from getting new ATLAS tasks.
I could probably get the same result with the no new work button.
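
For anyone wondering where the two numbers come from, here is the arithmetic as a quick Python check. It assumes job_duration is given in seconds, which is what makes the values line up with the "100 hours" and "10 days" mentioned above.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

old_job_duration = 864000  # value being replaced in Theory_2019_10_01.xml
new_job_duration = 360000  # suggested replacement

print(old_job_duration / SECONDS_PER_DAY)   # 10.0 (days)
print(new_job_duration / SECONDS_PER_HOUR)  # 100.0 (hours)
```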
36) Message boards : Theory Application : Theory Task doing nothing (Message 42623)
Posted 26 May 2020 by CloverField
Post:
It's usually a minor problem to run many tasks concurrently but it can become a problem if they change their status.
This happens if you start/restart your BOINC client or even at shutdown when lots of data has to be saved to disk.
Modern computers with lots of cores are more affected as they run more tasks concurrently.

Nobody can really tell what's the best combination on your computer. You'll have to try it out.


So this computer is my main server box; all it does is LHC@home and, every so often, stream a movie to my TV.
As such it's configured to run BOINC 100% of the time and does not suspend when the computer is in use. All the starts and stops
in that last Theory task are actually from when BOINC goes to fetch work. What usually happens is that it gets a bunch of ATLAS tasks back,
and since those have an earlier due date it stops whatever is currently running and switches to ATLAS. This happens multiple times a day and
ends up killing my tasks. I think I might be able to fix this by changing the "keep an additional x days of work" setting from .25 to 1; hopefully this keeps enough
of a buffer to prevent it from starting and stopping tasks all the time.
37) Message boards : Theory Application : Theory Task doing nothing (Message 42621)
Posted 26 May 2020 by CloverField
Post:
This should work, right?

<app_config>
 <app>
  <name>Theory</name>
  <max_concurrent>28</max_concurrent>
 </app>
 <app>
  <name>ATLAS</name>
  <max_concurrent>2</max_concurrent>
 </app>
</app_config>
38) Message boards : Theory Application : Theory Task doing nothing (Message 42619)
Posted 26 May 2020 by CloverField
Post:
2020-05-26 08:08:11 (19788): Error in stop VM for VM: -108
Command:
VBoxManage -q controlvm "boinc_83115c7c7bfa4ba2" savestate
Output:
VBoxManage.exe: error: Machine 'boinc_83115c7c7bfa4ba2' is not currently running

I'm betting this is the problem. It looks like it got interrupted by a bunch of new ATLAS tasks starting up.
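
The error above says the VM was not running when the savestate command was issued. One way to check a VM's state by hand is `VBoxManage showvminfo <name> --machinereadable`, which prints a `VMState="..."` line. Below is a small Python sketch that parses that line; `parse_vm_state` and `vm_is_running` are my own helper names, and the sample text is hand-written for the demo, not real output from my machine.

```python
import subprocess


def parse_vm_state(machinereadable_output):
    """Extract the VMState value from `showvminfo --machinereadable` output."""
    for line in machinereadable_output.splitlines():
        if line.startswith("VMState="):
            return line.split("=", 1)[1].strip().strip('"')
    return None


def vm_is_running(vm_name):
    """Ask VBoxManage for the VM state; requires VBoxManage on the PATH."""
    result = subprocess.run(
        ["VBoxManage", "showvminfo", vm_name, "--machinereadable"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_vm_state(result.stdout) == "running"


# Hand-written sample, only to show the parsing step.
sample = 'name="boinc_83115c7c7bfa4ba2"\nVMState="poweroff"\n'
print(parse_vm_state(sample))  # poweroff
```

Calling `vm_is_running("boinc_83115c7c7bfa4ba2")` before a savestate would show whether the VM had already stopped by the time the command ran.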
39) Message boards : Theory Application : Theory Task doing nothing (Message 42618)
Posted 26 May 2020 by CloverField
Post:
700 Sixtrack tasks and 5 with errors are shown. This is OK.
You have 64 GB of RAM and need to keep control of your PC when you mix ATLAS and Theory.
Theory is less demanding on RAM than ATLAS. You have 8 CPUs for ATLAS.
It is useful to control ATLAS with an app_config.xml, using fewer than 8 CPUs or running fewer ATLAS tasks at once,
because ATLAS needs its RAM to be managed carefully.
There is a lot of help on how to do this in the ATLAS folder of LHC@home.



Got another one.

I'm pretty sure it's not my RAM either,
as I have more than enough.


Even when running all the ATLAS tasks I usually still have around 20 GB free.

Managed to get the task ID for this one. Will abort it and then check the error output.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=275253084
40) Message boards : Theory Application : Theory Task doing nothing (Message 42616)
Posted 26 May 2020 by CloverField
Post:
Ran only Sixtrack for the day and everything was fine. Now switching back to all projects; will report if this continues to be an issue with Theory.




©2024 CERN