Theory Task doing nothing
Author | Message |
---|---|
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
I've gotten about four Theory tasks today that seem to be doing nothing. Showing the VM console reveals this: top shows that nothing is running. |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
I now have two more in my currently running tasks doing the exact same thing. |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
Woke up to 4 more doing that this morning, along with some ATLAS tasks doing nothing. Are there network problems at CERN? |
Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
There must be something wrong with your computer: you have a sixtrack task with x86 (32-bit), and it did not finish: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=139948215 1. Check your OS. 2. Let only sixtrack run (prefs). Edit: Sorry, there is an x86 version running in sixtrack: Microsoft Windows (98 or later) running on an Intel x86-compatible CPU |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
"There must be something wrong with your Computer:" 1. Running 64-bit Windows. 2. I will switch to sixtrack only in a moment. I don't think that task is blocking network connections; off the top of my head, sixtrack doesn't talk to the internet. I was also able to find the task running away happily. |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
Ran only sixtrack for the day and everything was fine. Now switching back to all projects; will report if this continues to be an issue with Theory. |
Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
700 sixtrack tasks and 5 with errors are shown. This is OK. You have 64 GB of RAM and need to manage your PC carefully when you mix ATLAS and Theory. Theory is not as demanding on RAM as ATLAS. You have 8 CPUs for ATLAS. It is useful to control ATLAS with an app_config.xml, using fewer than 8 CPUs or fewer concurrent ATLAS tasks, because ATLAS needs careful RAM management. The ATLAS folder of LHCathome has a lot of help on how to set this up. |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
"700 sixtrack and 5 with Error are shown. This is ok." Got another one. I'm pretty sure it's not my RAM either, as I have more than enough. Even when running all the ATLAS tasks I usually still have around 20 GB free. Managed to get the task ID for this one. Will abort it and then check the error output. https://lhcathome.cern.ch/lhcathome/result.php?resultid=275253084 |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
2020-05-26 08:08:11 (19788): Error in stop VM for VM: -108 Command: VBoxManage -q controlvm "boinc_83115c7c7bfa4ba2" savestate Output: VBoxManage.exe: error: Machine 'boinc_83115c7c7bfa4ba2' is not currently running I'm betting this is the problem; it looks like it got interrupted by a bunch of new ATLAS tasks starting up. |
Joined: 15 Jun 08 Posts: 2413 Credit: 226,473,093 RAC: 131,985 |
Indeed. Starting, pausing and restarting too many vbox tasks concurrently can result in an overloaded disk IO. You may try to limit at least the number of concurrent ATLAS starts as each of them copies a few GB. |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
This should work, right? <app_config> <app> <name>Theory</name> <max_concurrent>28</max_concurrent> </app> <app> <name>ATLAS</name> <max_concurrent>2</max_concurrent> </app> </app_config> |
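The config in the post above can also be generated and checked programmatically. The following is only a minimal sketch: the helper name `build_app_config` is hypothetical, and it assumes the resulting text is saved as app_config.xml in the LHC@home project directory and that the app names match the ones the project uses.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits):
    """Return app_config.xml text with one <app> entry per application.

    `limits` maps an app name to its max_concurrent value.
    """
    root = ET.Element("app_config")
    for app_name, max_tasks in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(max_tasks)
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Same values as in the post: up to 28 Theory and 2 ATLAS tasks at once.
    print(build_app_config({"Theory": 28, "ATLAS": 2}))
```

Building the file from a dictionary makes it easy to try different combinations, which matters here because the right limits depend on the individual machine.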
Joined: 15 Jun 08 Posts: 2413 Credit: 226,473,093 RAC: 131,985 |
It's usually a minor problem to run many tasks concurrently but it can become a problem if they change their status. This happens if you start/restart your BOINC client or even at shutdown when lots of data has to be saved to disk. Modern computers with lots of cores are more affected as they run more tasks concurrently. Nobody can really tell what's the best combination on your computer. You'll have to try it out. |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
"It's usually a minor problem to run many tasks concurrently but it can become a problem if they change their status." So this computer is my main server box; all it does is LHC@home and, every so often, stream a movie to my TV. As such it's configured to run BOINC 100% and does not suspend when the computer is in use. All the starts and stops in that last Theory task are actually from when BOINC goes to fetch work. What usually happens is that it gets a bunch of ATLAS tasks back, and since those have an earlier due date it stops whatever is currently running and switches to ATLAS. This happens multiple times a day and ends up killing my tasks. I think I might be able to fix this by changing the "keep an additional x days of work" setting from .25 to 1; hopefully this keeps enough of a buffer to prevent it from starting and stopping tasks all the time. |
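The buffer change described above can also be made locally instead of through the web preferences. This is a rough sketch only: the helper name `build_prefs_override` is hypothetical, and it assumes the "additional days" setting corresponds to the `work_buf_additional_days` element of a global_prefs_override.xml file in the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(additional_days, min_days=None):
    """Return global_prefs_override.xml text for the work buffer settings.

    Only the values that are passed in are written; everything else is
    left to the regular preferences.
    """
    root = ET.Element("global_preferences")
    if min_days is not None:
        ET.SubElement(root, "work_buf_min_days").text = f"{min_days:g}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The post raises the additional buffer from 0.25 days to 1 day.
    print(build_prefs_override(additional_days=1.0))
```

The client only picks up such a file when it starts or is told to re-read its local preferences, so the effect on work fetch is not immediate.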
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363 |
"All the start stops" That's the hammer on the nail. Since the last Theory update the fictitious estimated runtime went from 100 hours to 10 days. It would be best if Laurence fixed this, but for the time being you can change it yourself by editing Theory_2019_10_01.xml in LHC's project folder: change the job_duration value from 864000 to 360000. |
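The manual edit suggested above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `set_job_duration` is hypothetical, the sample text is a made-up fragment rather than the real contents of Theory_2019_10_01.xml, and the only thing taken from the post is that the file contains a job_duration value in seconds (864000 s = 10 days, 360000 s = 100 hours).

```python
import re


def set_job_duration(xml_text, new_seconds):
    """Replace the value inside <job_duration>...</job_duration>.

    Works on the raw text so the rest of the file is left untouched.
    Raises ValueError unless exactly one job_duration element is found.
    """
    pattern = re.compile(r"(<job_duration>)\s*\d+(?:\.\d+)?\s*(</job_duration>)")
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{new_seconds}{m.group(2)}", xml_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one job_duration element, found {count}")
    return new_text


if __name__ == "__main__":
    # Hypothetical fragment for illustration only.
    sample = "<vbox_job>\n  <job_duration>864000</job_duration>\n</vbox_job>"
    print(set_job_duration(sample, 360000))
```

Keep a backup copy of the original file before overwriting it, since the project may replace it again with a later update.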
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
"All the start stops" "That's the hammer on the nail." I already made that change on your advice over in Number Crunching. At the time I thought the issue was limited to CMS tasks. However, it seems that getting 1 day of work and then reducing the buffer to .25 days has fixed the issue, as it effectively stops BOINC from getting new ATLAS tasks. I could probably get the same result with the "no new work" button. |
Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770 |
You have had successful tasks for ATLAS, CMS and Theory in the last few days. If you let only sixtrack and ONE VM task (ATLAS, CMS or Theory) run, with all other VM tasks suspended, does this task run normally and finish correctly? There are many sixtrack tasks for the other 31 CPUs at the moment. |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
"You have successfull Tasks for ATLAS, CMS and Theory in the last days." Yeah, this would also work. I've just kinda been more focused on trying to do as much work as fast as possible lol. Allowing each VM-based task to run one instance and then filling the rest with sixtrack would probably be the best way going forward. That, or I build a new computer and dedicate it to ATLAS only, as it seems to be the problem child with its short deadlines. |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
OK, got another one that was just stuck there with the same message. This time it was not due to task switching. Could it be due to the squid cache that I set up earlier? Hopefully this will update to something more helpful than "aborted by user". https://lhcathome.cern.ch/lhcathome/result.php?resultid=275990643 |
Joined: 17 Oct 06 Posts: 74 Credit: 52,496,832 RAC: 39,541 |
"Ok got another one that was just stuck there with the same message." Just restarted squid for ATLAS; I'll see if this fixes the Theory issues as well. |
Joined: 15 Jun 08 Posts: 2413 Credit: 226,473,093 RAC: 131,985 |
If it happens again you may consider a project reset to ensure you get a fresh theory vdi. |
©2024 CERN