ATLAS application: Just more of the same failures
Author | Message |
---|---|
Joined: 28 Dec 08 Posts: 318 Credit: 4,246,007 RAC: 1,159 |
I doubt this is caused by LHC/ATLAS. Yes, I know my wingman ran it OK; that is what is frustrating. However, the cleaning was much needed: lots of leftovers and other problems were identified and solved. I put app_config.xml back in play. For some reason my system likes it, so if it works, leave it. Memory at 6600 MB seems to solve the stalling problem. I'll let ATLAS run alone for 24 hrs and then slowly add back CMS and Theory. I guess I need to do a deep clean at least every 2 weeks; I have been a bit lazy about this lately, being off work and all. Thanks for all the help. I will come back to this thread for reference if something goes wrong again. Keep your fingers and toes crossed that all this solves the problem. I would love to get a success rate that exceeds the failure rate for once. |
Joined: 28 Dec 08 Posts: 318 Credit: 4,246,007 RAC: 1,159 |
29 Feb 2020, 9:42:08 UTC, Completed and validated, run time 7,839.57 s, CPU time 27,598.06 s, credit 229.54, ATLAS Simulation v2.00 (vbox64_mt_mcore_atlas), windows_x86_64

First error at that time:
- Provider
  [ Name] Microsoft-Windows-DistributedCOM
  [ Guid] {1B562E86-B7AA-4131-BADC-B6F3A001407E}
  [ EventSourceName] DCOM
- EventID 10016
  [ Qualifiers] 0
Version 0
Level 3
Task 0
Opcode 0
Keywords 0x8080000000000000
- TimeCreated
  [ SystemTime] 2020-01-13T08:42:23.019385200Z
EventRecordID 44033
- Correlation
  [ ActivityID] {2a518cdf-ef20-4ff1-9fce-ec923f337462}
- Execution
  [ ProcessID] 1224
  [ ThreadID] 11608
Channel System
Computer DESKTOP-LFM92VN
- Security
  [ UserID] S-1-5-19
- EventData
  param1 application-specific
  param2 Local
  param3 Activation
  param4 {6B3B8D23-FA8D-40B9-8DBD-B950333E2C52}
  param5 {4839DDB7-58C2-48F5-8283-E1D1807D0D7D}
  param6 NT AUTHORITY
  param7 LOCAL SERVICE
  param8 S-1-5-19
  param9 LocalHost (Using LRPC)
  param10 Unavailable
  param11 Unavailable

But DistributedCOM has been blowing up a lot for a long time. It appears to start complaining after a Windows Update. Event ID 10016 is the common theme in all the errors, starting back on the 12th. These show as Warnings. There are also a few DCOM ReaderNotificationClient errors on the 12th and 13th; those show as actual errors. A randomly chosen warning shows this:

Log Name: System
Source: Microsoft-Windows-DistributedCOM
Date: 14/01/2020 12:50:52
Event ID: 10016
Task Category: None
Level: Warning
Keywords: Classic
User: LOCAL SERVICE
Computer: DESKTOP-LFM92VN
Description:
The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID {6B3B8D23-FA8D-40B9-8DBD-B950333E2C52} and APPID {4839DDB7-58C2-48F5-8283-E1D1807D0D7D} to the user NT AUTHORITY\LOCAL SERVICE SID (S-1-5-19) from address LocalHost (Using LRPC) running in the application container Unavailable SID (Unavailable). This security permission can be modified using the Component Services administrative tool.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-DistributedCOM" Guid="{1B562E86-B7AA-4131-BADC-B6F3A001407E}" EventSourceName="DCOM" />
    <EventID Qualifiers="0">10016</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8080000000000000</Keywords>
    <TimeCreated SystemTime="2020-01-14T11:50:52.138834300Z" />
    <EventRecordID>45508</EventRecordID>
    <Correlation ActivityID="{a2588d91-dfe8-4366-a385-eea323282b05}" />
    <Execution ProcessID="1224" ThreadID="15920" />
    <Channel>System</Channel>
    <Computer>DESKTOP-LFM92VN</Computer>
    <Security UserID="S-1-5-19" />
  </System>
  <EventData>
    <Data Name="param1">application-specific</Data>
    <Data Name="param2">Local</Data>
    <Data Name="param3">Activation</Data>
    <Data Name="param4">{6B3B8D23-FA8D-40B9-8DBD-B950333E2C52}</Data>
    <Data Name="param5">{4839DDB7-58C2-48F5-8283-E1D1807D0D7D}</Data>
    <Data Name="param6">NT AUTHORITY</Data>
    <Data Name="param7">LOCAL SERVICE</Data>
    <Data Name="param8">S-1-5-19</Data>
    <Data Name="param9">LocalHost (Using LRPC)</Data>
    <Data Name="param10">Unavailable</Data>
    <Data Name="param11">Unavailable</Data>
  </EventData>
</Event>

This was the 19th. Today, at 18:00+:

Log Name: System
Source: Microsoft-Windows-DistributedCOM
Date: 29/02/2020 18:08:23
Event ID: 10016
Task Category: None
Level: Warning
Keywords: Classic
User: DESKTOP-LFM92VN\Greg
Computer: DESKTOP-LFM92VN
Description:
The application-specific permission settings do not grant Local Activation permission for the COM Server application with CLSID {2593F8B9-4EAF-457C-B68A-50F6B8EA6B54} and APPID {15C20B67-12E7-4BB6-92BB-7AFF07997402} to the user DESKTOP-LFM92VN\Greg SID (S-1-5-21-630949258-3761359405-375428836-1001) from address LocalHost (Using LRPC) running in the application container Unavailable SID (Unavailable). This security permission can be modified using the Component Services administrative tool.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-DistributedCOM" Guid="{1B562E86-B7AA-4131-BADC-B6F3A001407E}" EventSourceName="DCOM" />
    <EventID Qualifiers="0">10016</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8080000000000000</Keywords>
    <TimeCreated SystemTime="2020-02-29T17:08:23.919648000Z" />
    <EventRecordID>71748</EventRecordID>
    <Correlation ActivityID="{8c4a383f-ac9a-47a2-aa6b-aa12d52a6bb2}" />
    <Execution ProcessID="1144" ThreadID="1184" />
    <Channel>System</Channel>
    <Computer>DESKTOP-LFM92VN</Computer>
    <Security UserID="S-1-5-21-630949258-3761359405-375428836-1001" />
  </System>
  <EventData>
    <Data Name="param1">application-specific</Data>
    <Data Name="param2">Local</Data>
    <Data Name="param3">Activation</Data>
    <Data Name="param4">{2593F8B9-4EAF-457C-B68A-50F6B8EA6B54}</Data>
    <Data Name="param5">{15C20B67-12E7-4BB6-92BB-7AFF07997402}</Data>
    <Data Name="param6">DESKTOP-LFM92VN</Data>
    <Data Name="param7">Greg</Data>
    <Data Name="param8">S-1-5-21-630949258-3761359405-375428836-1001</Data>
    <Data Name="param9">LocalHost (Using LRPC)</Data>
    <Data Name="param10">Unavailable</Data>
    <Data Name="param11">Unavailable</Data>
  </EventData>
</Event>

But all of this was before I deep-cleaned my system. |
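Instead of scrolling through the whole System log for these DCOM warnings, you can create a custom view in Windows Event Viewer (Create Custom View, XML tab, "Edit query manually") with a query along the lines of the one below. It is only a sketch. It reuses the provider name and event ID from the entries quoted above and shows nothing beyond those events.

```xml
<QueryList>
  <Query Id="0" Path="System">
    <!-- Only DistributedCOM events with ID 10016 from the System log -->
    <Select Path="System">*[System[Provider[@Name='Microsoft-Windows-DistributedCOM'] and (EventID=10016)]]</Select>
  </Query>
</QueryList>
```

Filtering like this makes it easier to see whether the 10016 warnings actually line up in time with the failed ATLAS tasks or are just background noise.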
Joined: 28 Dec 08 Posts: 318 Credit: 4,246,007 RAC: 1,159 |
Seriously! 1 hour and 52 mins or so and it blows up with a memory error? Come on!
'lse, errorID=HostMemoryLow message="Unable to allocate and lock memory. The virtual machine will be paused. Please close applications to free up memory or close the VM
So, last attempt: I used Revo Uninstaller to get rid of VirtualBox 6.0.18 and all related registry keys, reinstalled the latest VirtualBox, and put back app_config.xml. The current task is advancing in its early stages at 0.2500% every 2 seconds and using 30% CPU. As for the earlier discussion about heat, that's taken care of on the CPU side by a really good radiator cooling system in a pull configuration. This is a new gaming case with 2 huge intake fans plus 1 standard fan on the bottom blowing in cool air, and 1 exhaust fan plus an open back for blowing out hot air. Power supply: I have a 650 W digital power supply, of which I use only 460 W. Max 190 W CPU, max 165 W GPU, and the system takes 116 W max. PSU temperature is only 53 °C. To me this is really just a freaking software problem between ATLAS and VBox. I really have no idea what more to do. Windows popped up a security prompt asking whether it was OK to let VBox headless do its thing. I said yes, of course, allowing it all the access it needs. I really hope this solves the issues. If it does, I am not messing with any more configuration. If app_config.xml makes the system happy, so be it. |
Joined: 2 May 07 Posts: 2136 Credit: 160,333,159 RAC: 31,248 |
If you have an ASUS mainboard, AI Suite 3 is a good tool to check and tune your system. CPU-Z is also useful to see whether your memory has a problem. It's better to run Theory first and, if that is OK, then test CMS or ATLAS. I have the same hardware twice, with no problems for ATLAS. |
Joined: 28 Dec 08 Posts: 318 Credit: 4,246,007 RAC: 1,159 |
Things are OK now. It's stable. So: the latest VirtualBox and Extension Pack, then an app_config.xml with the following: task download 1, run 1, CPU 4, memory 6600 MB. I am not changing this. Thanks for the support. I am not using an ASUS board; I am using MSI. I rarely use any of the extra stuff from MSI, as I have no need for it, but I will run the program to see if there are any other critical updates needed. I have been running MSI for years, and after a little tweaking they are not a problem. It's not a mobo issue though, or memory. This has been just a software problem with BOINC and VBOX fighting. I had to find the right combination of versions and settings to get them to stop fighting. |
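For reference, here is a rough sketch of how the "run 1, CPU 4, memory 6600" settings might look in app_config.xml. It is not the poster's actual file. The plan class `vbox64_mt_mcore_atlas` comes from the task listing earlier in this thread. The app name `ATLAS` and the vboxwrapper options `--nthreads` / `--memory_size_mb` are assumptions based on common LHC@home usage. The "task download 1" part is not covered: app_config.xml limits how many tasks run, and the download limit is assumed to be set through the project's web preferences instead.

```xml
<app_config>
  <!-- Hypothetical sketch; app name and wrapper options are assumptions -->
  <app>
    <name>ATLAS</name>
    <!-- "run 1": at most one ATLAS task running at a time -->
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <!-- plan class as shown in the task listing in this thread -->
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <!-- "cpu 4": BOINC budgets 4 CPU cores for each task -->
    <avg_ncpus>4</avg_ncpus>
    <!-- "memory 6600": VM size in MB; 6600 = 3000 + 900 x 4 (assumed sizing rule) -->
    <cmdline>--nthreads 4 --memory_size_mb 6600</cmdline>
  </app_version>
</app_config>
```

The file would go in the LHC@home project folder under the BOINC data directory. BOINC Manager picks it up via Options, Read config files. Tasks that are already running keep their existing VM settings.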
Joined: 15 Jun 08 Posts: 2439 Credit: 229,965,991 RAC: 135,552 |
"This has been just a software problem with BOINC and VBOX fighting." As I already wrote: BOINC, ATLAS and VBox don't fight on hundreds of other computers, so why should they on yours alone? It's more likely a homemade issue: if not caused by the hardware, then maybe by too much registry tweaking or by using the wrong monitoring tools. ATLAS tasks are starting fine; that's what the logs show. Then, when they reach a point where they need more RAM, VBox can't allocate it because it is locked by another process. This does not affect all of your tasks, as a couple of valid results show, but those valid results also show that ATLAS runs fine when it gets all the resources it needs. The RAM-locking process has to be identified to solve the problem. |
Joined: 2 May 07 Posts: 2136 Credit: 160,333,159 RAC: 31,248 |
Can it be that the combination with TWO Nvidia graphics cards is a problem? Do you run Nvidia tasks under BOINC too? |
Joined: 28 Dec 08 Posts: 318 Credit: 4,246,007 RAC: 1,159 |
"This has been just a software problem with BOINC and VBOX fighting." OK, but how does one sort out the RAM-locking issue? That is beyond my understanding. And if things start needing more than 24 GB, then I'll drop a project. There are two I am dedicated to because they come from where I used to live, so Rosetta and Einstein I will keep, but stuff like Asteroids and perhaps MilkyWay can go if needed. Or I'll just get some larger sticks of RAM later if that solves the issue. I have two old sticks that I keep rolling over from build to build to save money. If needed, this would be the last upgrade I do on the system for a while. I think there was too much leftover crap from Windows and all the installs and uninstalls of BOINC and VBox. There were over 700 issues that needed to be fixed by WISE. Now that the system has been deep-cleaned, it's working properly again. That's all I know. And you do have to deep clean your system every so often, so I will look at deep cleaning again in 4-6 weeks. And as I said, the cleaner has been set to exclude the BOINC folder during regular light maintenance cleaning. |
Joined: 28 Dec 08 Posts: 318 Credit: 4,246,007 RAC: 1,159 |
"Can it be that the combination with TWO Nvidia graphics cards is a problem?" Why would that be a problem? I have been running Nvidia GPUs side by side for years. I had a 1050 and a 960 before I got the 1080, and the 1080 has been running just fine for a long time now. And I personally don't see how GPUs, working on their own cores, would affect just one process that doesn't use the GPU while not interfering with other CPU + VM jobs. Only ATLAS is coughing, not any of the other projects that use VMs. I think it is more like the other guy says, something to do with memory locking, but that is a topic I am not familiar with, so I will need more information. That was an issue for a while, and so was low memory for some reason. But the forced change to a max of 6600 MB dropped my RAM load by 25%. Now I max out in the high 70s to low 80s percent of my 24 GB of RAM. As far as memory usage goes, right now, after ATLAS, only 2 Rosetta tasks come close: one maxes out in the 1200 MB range and another only in the 450-470 MB range. And since I made the adjustments in app_config.xml, left it in place, and got the right versions of VBox and BOINC, things are chugging along nicely with 2 NVIDIA cards running. I am not about to go messing around with things now that they work. It's hands off for ATLAS. It's happy, I'm happy, so just leave it be. |
Joined: 15 Jun 08 Posts: 2439 Credit: 229,965,991 RAC: 135,552 |
"I have two old sticks that I keep rolling over from build to build to save money." You have 24 GB of RAM, right? How many RAM sticks do you have, and how are they populated on your motherboard? How many RAM slots are used and how many are free? Do all the RAM sticks currently in use have the same size and the same specs? Are all of them from the same manufacturer? |
Joined: 2 May 07 Posts: 2136 Credit: 160,333,159 RAC: 31,248 |
Yes, this combination of hardware and GPUs can be something special. What about the Nvidia GPUs blocking RAM for their own work? |
Joined: 28 Dec 08 Posts: 318 Credit: 4,246,007 RAC: 1,159 |
Yes, well, they would only take what is needed to run the project. But none of my GPU projects consume as much RAM as ATLAS does. Last time I looked, Rosetta was drawing more RAM than the GPUs were. Right now Moo Wrapper and PrimeGrid are running, but their combined usage is just 117 MB. They are more processor-heavy than RAM-heavy. PrimeGrid is a prime-number search, so there are no real graphics to speak of. Moo is the same: heavy equations, but no graphics. |
Joined: 28 Dec 08 Posts: 318 Credit: 4,246,007 RAC: 1,159 |
"I have two old sticks that I keep rolling over from build to build to save money."
RAMMon report, with every detail you would want to know: https://drive.google.com/open?id=1pWh86MdxvSPCKK0-xuFM9LnME9_64cfp
- 2 x Patriot, 4096 MB each (the old sticks)
- 1 x Kingston and 1 x CORSAIR, 8192 MB each
- All PC4-17000
For any other details, just look at the report. |
©2024 CERN