Message boards :
ATLAS application :
ATLAS multi-core
Author | Message |
---|---|
Joined: 24 Oct 04 Posts: 1114 Credit: 49,504,665 RAC: 3,862 |
Well, I decided to try some here on my best computer (I have done thousands of the different ATLAS tasks in alpha and beta). The first 2 here worked OK in about 2 hrs each (8-core tasks), and I have done lots of them in alpha, but as usual my 3rd one decided to take 8hrs to get to 99.983% and just dragging its Atlas feet in the dark matter. It is most likely about to get aborted so I can try another one. (I also decided to update VB to the newest version, which is just one version newer than the one I already had.) Volunteer Mad Scientist For Life |
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0 |
"8hrs to get to 99.983% and just dragging its Atlas feet"
Don't abort those. They'll almost certainly complete and verify safely. Many ATLAS tasks will run for over 24 hours and a very few for over 2 days. I've done a lot of experimenting with configs, and with all the WU core counts from 1 to 8, over the last 3 months, and here are some of the upper limits of WUs I've gotten. BTW, 4 cores are optimal from my data, as was reported in an earlier post on optimal core counts.
Each entry lists: task ID; work unit ID; sent; reported; status; run time (s); CPU time (s); credit; application.
8 cores (117511.21 seconds): 158518582; 76136035; 2 Oct 2017, 21:59:55 UTC; 6 Oct 2017, 22:13:00 UTC; Completed and validated; 117511.21; 601975; 353.08; ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64
4 cores (197182.2 seconds, high credit): 168867864; 81727868; 9 Dec 2017, 7:38:08 UTC; 12 Dec 2017, 12:20:34 UTC; Completed and validated; 197182.2; 232063.5; 5492.62; ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64
4 cores (217693.71 seconds, low credit): 168640849; 81602634; 4 Dec 2017, 11:34:46 UTC; 10 Dec 2017, 18:21:53 UTC; Completed and validated; 217693.71; 197709.2; 857.05; ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64
4 cores record holder (330805.33 seconds or over 3 days): 168867893; 81727709; 9 Dec 2017, 7:38:08 UTC; 17 Dec 2017, 7:17:18 UTC; Completed and validated; 330805.33; 561754.2; 4551.5; ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64
1 core (126222.36 seconds): 158790822; 76294126; 8 Oct 2017, 11:36:17 UTC; 12 Oct 2017, 16:44:54 UTC; Completed and validated; 126222.36; 126713.2; 144.44; ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64
2 cores (136466.87 seconds): 168748377; 81677587; 6 Dec 2017, 11:40:09 UTC; 8 Dec 2017, 2:17:48 UTC; Completed and validated; 136466.87; 152899.8; 3332.59; ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64 |
Joined: 18 Dec 15 Posts: 1686 Credit: 100,491,159 RAC: 104,616 |
I guess nobody will ever be able to explain the big difference in credit points. |
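To put numbers on the credit spread in the results quoted earlier in this thread, here is a minimal Python sketch that converts each task's run time to days and its credit to credit per CPU-hour. The figures are copied from that post; the function and field names are made up for illustration and are not part of BOINC or LHC@home.

```python
# Figures copied from the task list posted earlier in this thread:
# (label, run time in seconds, CPU time in seconds, credit).
# All names below are illustrative assumptions, not a BOINC API.
TASKS = [
    ("8 cores", 117511.21, 601975.0, 353.08),
    ("4 cores, high credit", 197182.2, 232063.5, 5492.62),
    ("4 cores, low credit", 217693.71, 197709.2, 857.05),
    ("4 cores, record holder", 330805.33, 561754.2, 4551.5),
    ("1 core", 126222.36, 126713.2, 144.44),
    ("2 cores", 136466.87, 152899.8, 3332.59),
]

SECONDS_PER_HOUR = 3600.0
SECONDS_PER_DAY = 86400.0


def summarize(tasks):
    """Return one dict per task with run time in days and credit per CPU-hour."""
    rows = []
    for label, run_s, cpu_s, credit in tasks:
        rows.append({
            "label": label,
            "run_days": run_s / SECONDS_PER_DAY,
            "credit_per_cpu_hour": credit / (cpu_s / SECONDS_PER_HOUR),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(TASKS):
        print(f"{row['label']:<24} {row['run_days']:5.2f} days "
              f"{row['credit_per_cpu_hour']:7.2f} credit/CPU-hour")
```

On these six tasks the credit per CPU-hour differs by more than an order of magnitude between the best and worst case, which is the spread being remarked on here.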
Joined: 24 Oct 04 Posts: 1114 Credit: 49,504,665 RAC: 3,862 |
"8hrs to get to 99.983% and just dragging its Atlas feet" Oh, I have done hundreds of ATLAS multicore tasks in 1-core, 2-core, 3-core, 4-core, 6-core, and 8-core configurations since they started, since I do the alpha tests before they get here. But many times they will sit there at 100% and keep running for as long as you let them, never actually finishing and getting sent back. Unless you abort them, they become a Computation error after many hours, like this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=177850942 I have a 4-core ATLAS alpha task running right now that has been at 100% progress for close to 11 hours. Another thing they do, as I mentioned in my previous post, is run at normal speed up to 99.900%, and then after that each .001% takes as long as an hour. I have seen many of those (not here at LHC), and you can check your CPU stats and it will be doing nothing... yet they will still say they are running in the BOINC Manager. But in the same batch 4 finished Valid: https://lhcathome.cern.ch/lhcathome/results.php?userid=5472&offset=0&show_names=0&state=0&appid=14 And checking the VB log, it is running and looks like it will continue as long as I let it run, and it just stays at 100% progress and running. (The alpha tests have been having the problem for over a month, as has LHC-dev.) All the other CERN project multi-cores have been running OK at -dev for a long time too. |
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0 |
Are the VMs aborted or sitting in a reset state in your VBox Manager? I haven't seen that happen in ATLAS, but there was a 2% error rate in Theory where the BOINC Manager would show the task running at 100% progress while VBox Manager would show the WU in a reset state. ATLAS jobs are generally finishing in about 60,000 seconds, but 3 of the last 15 (20%) took 130k, 168k and 185k seconds to complete. The one that ran 259,213 seconds didn't validate, so maybe they have set an upper limit of 3 days since that record setter some months back. |
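For anyone who would rather check the VM state from a script than from the VBox Manager window, here is a minimal Python sketch. It assumes `VBoxManage` is on the PATH (on Windows it often is not, so the full path may be needed) and relies on `VBoxManage showvminfo <name> --machinereadable` printing `key="value"` lines that include a `VMState` entry. The VM name is a placeholder you would take from `VBoxManage list vms`, and the function names are made up for illustration.

```python
import subprocess


def parse_vm_state(machinereadable_text):
    """Return the VMState value from --machinereadable output, or None if absent."""
    for line in machinereadable_text.splitlines():
        key, sep, value = line.partition("=")
        # Match the key exactly so that VMStateChangeTime is not picked up.
        if sep and key.strip() == "VMState":
            return value.strip().strip('"')
    return None


def query_vm_state(vm_name, vboxmanage="VBoxManage"):
    """Ask VirtualBox for the state of one VM (vm_name is a placeholder)."""
    result = subprocess.run(
        [vboxmanage, "showvminfo", vm_name, "--machinereadable"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vm_state(result.stdout)
```

A task that BOINC Manager shows as running while `query_vm_state` returns something other than `running` would match the mismatch described above.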
Joined: 5 Nov 15 Posts: 144 Credit: 6,301,268 RAC: 0 |
OK, I checked server Olmec and it had an ATLAS task that timed out. It had a run time of 8+ days. I opened the VM and it was still running, responsive to keystrokes and using CPU in the process manager, but it's not going to validate. |
©2024 CERN