21)
Message boards :
Theory Application :
Theory Task doing nothing
(Message 42775)
Posted 2 Jun 2020 by CloverField Post: OK, got another one that was just stuck there with the same message. This time it was not due to task switching. Could it be due to the squid cache that I set up earlier? Hopefully this will update to something more helpful than "aborted by user". https://lhcathome.cern.ch/lhcathome/result.php?resultid=275990643 |
22)
Message boards :
Number crunching :
How does task switching actually work?
(Message 42765)
Posted 2 Jun 2020 by CloverField Post: The main issue with that for me is ATLAS loves to give me 8 core tasks. Which then kick 8 other jobs to the side, and usually break them. It seems to be due to network I/O. It only happens if the task switch comes in the first ten minutes or so, or however long a task takes to configure itself. When ATLAS forces the task swap, the other tasks are waiting to get something, and when they come back online they are still in that waiting state and just sit there forever. In the case of Theory, they either do something like this or they are completely unresponsive and you can't reach them at all through the VM console. |
23)
Message boards :
Number crunching :
How does task switching actually work?
(Message 42761)
Posted 2 Jun 2020 by CloverField Post: The main issue with that for me is ATLAS loves to give me 8 core tasks. Which then kick 8 other jobs to the side, and usually break them. What I want my tasks to do is say "hey, an ATLAS 8 core is ready", let 8 more tasks finish, and then slot the ATLAS task into the free space. I could limit ATLAS to one-core tasks, but that kind of defeats the point of a Threadripper, no? |
24)
Message boards :
Number crunching :
Peer certificate cannot be authenticated with given CA certificates
(Message 42738)
Posted 1 Jun 2020 by CloverField Post: Should a news post be made for the solution to this issue so everyone gets a notice in their BOINC client? |
25)
Message boards :
Sixtrack Application :
Internet access OK - project servers may be temporarily down.
(Message 42712)
Posted 30 May 2020 by CloverField Post: The last 5 hours I have not been able to send any of the over 100 finished Sixtracks from here ( PDT) Are you on Windows? If so, this is the actual issue. https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5441 |
26)
Message boards :
Number crunching :
Peer certificate cannot be authenticated with given CA certificates
(Message 42708)
Posted 30 May 2020 by CloverField Post: A lot of people are about to find out about this the hard way; it turns out a lot of people were using this cert provider. https://twitter.com/sleevi_/status/1266647545675210753 |
27)
Message boards :
Number crunching :
Peer certificate cannot be authenticated with given CA certificates
(Message 42702)
Posted 30 May 2020 by CloverField Post: Seems to be fixed with the workaround on Can confirm that this works as well. Hopefully the BOINC team will be able to get a new build out with the new certs as well before everything breaks. |
28)
Message boards :
Number crunching :
Peer certificate cannot be authenticated with given CA certificates
(Message 42690)
Posted 30 May 2020 by CloverField Post: Add NumberFields@home as another project affected. I can see the images just fine; however, I am getting a big "Not secure" icon in the top left of Chrome. |
29)
Message boards :
Number crunching :
Peer certificate cannot be authenticated with given CA certificates
(Message 42677)
Posted 30 May 2020 by CloverField Post: I've got the same date in there as well. |
30)
Message boards :
Number crunching :
How does task switching actually work?
(Message 42671)
Posted 30 May 2020 by CloverField Post: Yeah, I plan to build an ATLAS-only box at some point in the future when I retire this comp. That seems like the easiest way to fix the issue, as I'm not sure if the ATLAS team could adjust the deadlines. I don't want to mess up their science just so I don't have to check on my comp once a day lol. |
31)
Message boards :
Number crunching :
Peer certificate cannot be authenticated with given CA certificates
(Message 42670)
Posted 30 May 2020 by CloverField Post: I also see the same, it could be the BOINC certificate expired? A certificate is basically a file with a cryptographic key in it that says "hey, you can trust me from xx/xx/xxxx to xx/xx/xxxx". If the current date falls outside that range, you can no longer trust the connection, and in this day and age most things reject that as insecure. Edit: Here is a much better explanation that is not pitched at a five-year-old. https://www.entrustdatacard.com/pages/ssl |
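The validity-window check described in this post can be sketched in a few lines of Python. This is only an illustration of the date comparison, not how BOINC or any browser validates a full certificate chain. The function name `is_within_validity` and both dates are made up for the example; `ssl.cert_time_to_seconds` is the standard-library helper that parses the date format certificates use.

```python
import ssl

# Illustrative validity window (hypothetical dates, not taken from any real certificate).
NOT_BEFORE = "May 30 10:48:38 2000 GMT"
NOT_AFTER = "May 30 10:48:38 2020 GMT"


def is_within_validity(not_before: str, not_after: str, now: float) -> bool:
    """Return True if `now` (seconds since the epoch) lies inside the window."""
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= now <= end


# A moment inside the window and a moment just after it has closed.
inside = ssl.cert_time_to_seconds("Jan 15 00:00:00 2020 GMT")
after = ssl.cert_time_to_seconds("Jun 15 00:00:00 2020 GMT")

print(is_within_validity(NOT_BEFORE, NOT_AFTER, inside))  # True
print(is_within_validity(NOT_BEFORE, NOT_AFTER, after))   # False
```

A real client also checks every certificate in the chain up to a trusted root, so one expired certificate anywhere in the chain is enough to make the connection fail.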
32)
Message boards :
Number crunching :
How does task switching actually work?
(Message 42666)
Posted 30 May 2020 by CloverField Post: I don't like this task switching, it seems unnecessary. I changed "switch between applications" in Boinc to 100000 minutes, (i.e. never). Which means once you start something, finish it! So I tried messing around with that as well, and as far as I can tell that only applies when you are running multiple projects. Since I'm only running LHC@home, it lets the tasks run to completion. The actual "issue" seems to be running all the LHC@home projects at once. Since the ATLAS tasks have a much earlier deadline than any of the other projects, ATLAS likes to jump in the instant another task finishes, and since ATLAS tasks are multicore it will suspend the other jobs. The other VirtualBox projects really don't like this, and it was causing me to have tons of errored/stuck jobs that I would have to abort manually. |
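The effect described in this post can be illustrated with a toy model in Python. It is a deliberately simplified earliest-deadline-first picker, not BOINC's actual scheduler, and the core count, task names, and deadlines below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cores: int
    deadline_days: float  # days left until the deadline (hypothetical values)


def pick_running(tasks, total_cores):
    """Greedy earliest-deadline-first: hand out cores in deadline order."""
    chosen, free = [], total_cores
    for task in sorted(tasks, key=lambda t: t.deadline_days):
        if task.cores <= free:
            chosen.append(task.name)
            free -= task.cores
    return chosen


def preempted(running_before, tasks, total_cores):
    """Names that were running before but lose their slot after rescheduling."""
    running_now = set(pick_running(tasks, total_cores))
    return [name for name in running_before if name not in running_now]


# 32 single-core Theory tasks fill a hypothetical 32-core machine.
theory = [Task(f"theory_{i}", cores=1, deadline_days=10.0) for i in range(32)]
before = pick_running(theory, total_cores=32)

# An 8-core ATLAS task with a much earlier deadline arrives.
atlas = Task("atlas_0", cores=8, deadline_days=2.0)
suspended = preempted(before, theory + [atlas], total_cores=32)

print(len(suspended))  # 8
```

In this model the single multicore task with the earliest deadline pushes eight running single-core tasks aside at once, which matches the pattern of suspended VM tasks described in the post.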
33)
Message boards :
Number crunching :
Peer certificate cannot be authenticated with given CA certificates
(Message 42664)
Posted 30 May 2020 by CloverField Post: I am also getting this. I think LHC@home's web certs might have expired. :C |
34)
Message boards :
Theory Application :
Theory Task doing nothing
(Message 42641)
Posted 28 May 2020 by CloverField Post: You have successful tasks for ATLAS, CMS and Theory in the last days. Yeah, this would also work. I've just kinda been more focused on trying to do as much work as fast as possible lol. Allowing each VM-based task to run one instance and then filling the rest with Sixtrack would probably be the best way going forward. That, or I build a new computer and dedicate it to ATLAS only, as it seems to be the problem child with its quick deadline dates. |
35)
Message boards :
Theory Application :
Theory Task doing nothing
(Message 42625)
Posted 26 May 2020 by CloverField Post: "All the start stops" That's the hammer on the nail. I already made that change on your advice over in Number crunching. At the time I thought the issue was limited to CMS tasks. However, it seems that getting 1 day of work and then reducing the buffer to 0.25 days has fixed the issue, as it effectively stops BOINC from getting new ATLAS tasks. I could probably get the same result with the "no new work" button. |
36)
Message boards :
Theory Application :
Theory Task doing nothing
(Message 42623)
Posted 26 May 2020 by CloverField Post: It's usually a minor problem to run many tasks concurrently but it can become a problem if they change their status. So this computer is my main server box; all it does is LHC@home and every so often stream a movie to my TV. As such, it's configured to run BOINC 100% and does not suspend when the computer is in use. All the start/stops in that last Theory task are actually from when BOINC goes to fetch work. What usually happens there is it will get a bunch of ATLAS tasks back, and since those have an earlier due date it will stop whatever is currently running and switch back to ATLAS. This happens multiple times a day and ends up killing my tasks. I think I might be able to fix this by changing the "keep an additional x days of work" setting from 0.25 to 1. Hopefully this keeps enough of a buffer to prevent it from starting and stopping tasks all the time. |
37)
Message boards :
Theory Application :
Theory Task doing nothing
(Message 42621)
Posted 26 May 2020 by CloverField Post: This should work right?
<app_config>
  <app>
    <name>Theory</name>
    <max_concurrent>28</max_concurrent>
  </app>
  <app>
    <name>ATLAS</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config> |
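For anyone who wants to try different limits without hand-editing the file, here is a minimal Python sketch that generates the same app_config.xml layout from a dictionary. The helper name `build_app_config` is made up for this example, and the limits are simply the ones from the post. Writing the result into the project directory and telling the client to re-read its config files is left out.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits):
    """Build an <app_config> document from {app_name: max_concurrent}."""
    root = ET.Element("app_config")
    for name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # ET.indent only exists on Python 3.9+; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


# Limits taken from the post above.
xml_text = build_app_config({"Theory": 28, "ATLAS": 2})
print(xml_text)
```

Generating the file this way mainly guards against unbalanced tags when the limits are changed often.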
38)
Message boards :
Theory Application :
Theory Task doing nothing
(Message 42619)
Posted 26 May 2020 by CloverField Post:
2020-05-26 08:08:11 (19788): Error in stop VM for VM: -108
Command: VBoxManage -q controlvm "boinc_83115c7c7bfa4ba2" savestate
Output: VBoxManage.exe: error: Machine 'boinc_83115c7c7bfa4ba2' is not currently running
I'm betting this is the problem; it looks like it got interrupted by a bunch of new ATLAS tasks starting up. |
39)
Message boards :
Theory Application :
Theory Task doing nothing
(Message 42618)
Posted 26 May 2020 by CloverField Post: 700 sixtrack and 5 with Error are shown. This is ok. Got another one. I'm pretty sure it's not my RAM either, as I have more than enough. Even when running all the ATLAS tasks I usually still have around 20 GB free. Managed to get the task ID for this one. Will abort it and then check the error output. https://lhcathome.cern.ch/lhcathome/result.php?resultid=275253084 |
40)
Message boards :
Theory Application :
Theory Task doing nothing
(Message 42616)
Posted 26 May 2020 by CloverField Post: Ran only Sixtrack for the day and everything was fine. Now switching back to all projects; will report if this continues to be an issue with Theory. |
©2024 CERN