Message boards : ATLAS application : ATLAS multi-core

Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,504,665
RAC: 3,862
Message 34383 - Posted: 15 Feb 2018, 4:11:45 UTC

Well I decided to try some here on my best computer (I have done thousands of the different Atlas in alpha and beta)

The first 2 here worked OK in about 2hrs each (8-core tasks), and I have done lots of them in alpha. But as usual my 3rd one decided to take 8hrs to get to 99.983% and just dragging its Atlas feet in the dark matter, and it is most likely about to get aborted so I can try another one. (I also decided to update VB to the newest version, which is just one version newer than the one I had already.)


Volunteer Mad Scientist For Life
ID: 34383
marmot
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 34433 - Posted: 21 Feb 2018, 1:00:47 UTC - in response to Message 34383.  

8hrs to get to 99.983% and just dragging its Atlas feet


Don't abort those. They'll almost certainly complete and verify safely. Many ATLAS will run for over 24 hours and a very few for over 2 days.

Done a lot of experimenting with configs, and all the WU core counts from 1 to 8, over the last 3 months and here are some of the upper limits of WU I've gotten:
BTW, 4 cores are optimal from my data as was reported in an earlier post on optimal core counts.

8 cores (117511.21 seconds):
158518582	76136035	2 Oct 2017, 21:59:55 UTC	6 Oct 2017, 22:13:00 UTC	Completed and validated	117511.21	601975	353.08	ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64


4 cores (197182.2 seconds, high credit):
168867864	81727868	9 Dec 2017, 7:38:08 UTC	12 Dec 2017, 12:20:34 UTC	Completed and validated	197182.2	232063.5	5492.62	ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64


4 cores (217693.71 seconds, low credit):
168640849	81602634	4 Dec 2017, 11:34:46 UTC	10 Dec 2017, 18:21:53 UTC	Completed and validated	217693.71	197709.2	857.05	ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64


4 cores record holder (330805.33 seconds or over 3 days):
168867893	81727709	9 Dec 2017, 7:38:08 UTC	17 Dec 2017, 7:17:18 UTC	Completed and validated	330805.33	561754.2	4551.5	ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64


1 core (126222.36 seconds):
158790822	76294126	8 Oct 2017, 11:36:17 UTC	12 Oct 2017, 16:44:54 UTC	Completed and validated	126222.36	126713.2	144.44	ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64


2 cores (136466.87 seconds):
168748377	81677587	6 Dec 2017, 11:40:09 UTC	8 Dec 2017, 2:17:48 UTC	Completed and validated	136466.87	152899.8	3332.59	ATLAS Simulation v1.01 (vbox64_mt_mcore_atlas) windows_x86_64
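
For comparison, the run times and credits above can be put on a common scale. This is only a rough sketch: it assumes the sixth field of each result row is run time in seconds and the eighth is granted credit (which matches the labels above), and the helper names are made up for illustration.

```python
# (label, run time in seconds, credit), copied from the result rows above.
# Assumption: field 6 of each row is run time and field 8 is granted credit.
RESULTS = [
    ("8 cores", 117511.21, 353.08),
    ("4 cores, high credit", 197182.2, 5492.62),
    ("4 cores, low credit", 217693.71, 857.05),
    ("4 cores, record holder", 330805.33, 4551.5),
    ("1 core", 126222.36, 144.44),
    ("2 cores", 136466.87, 3332.59),
]

SECONDS_PER_HOUR = 3600.0
SECONDS_PER_DAY = 86400.0


def run_days(run_seconds):
    """Convert a run time in seconds to days."""
    return run_seconds / SECONDS_PER_DAY


def credit_per_hour(run_seconds, credit):
    """Granted credit per hour of run time."""
    return credit / (run_seconds / SECONDS_PER_HOUR)


def summarize(results):
    """Return one formatted line per result."""
    return [
        f"{label}: {run_days(run_s):.2f} days, "
        f"{credit_per_hour(run_s, credit):.1f} credit/hour"
        for label, run_s, credit in results
    ]


for line in summarize(RESULTS):
    print(line)
```

The record holder comes out at roughly 3.8 days, and credit per hour differs by well over a factor of ten between rows, which is the spread the "high credit" and "low credit" labels point at.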
ID: 34433
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,491,159
RAC: 104,616
Message 34437 - Posted: 21 Feb 2018, 6:15:18 UTC

I guess nobody will ever be able to explain the big difference in credit points.
ID: 34437
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,504,665
RAC: 3,862
Message 34450 - Posted: 22 Feb 2018, 1:38:25 UTC - in response to Message 34433.  

8hrs to get to 99.983% and just dragging its Atlas feet


Don't abort those. They'll almost certainly complete and verify safely. Many ATLAS will run for over 24 hours and a very few for over 2 days.

Done a lot of experimenting with configs, and all the WU core counts from 1 to 8, over the last 3 months and here are some of the upper limits of WU I've gotten:
BTW, 4 cores are optimal from my data as was reported in an earlier post on optimal core counts.


Oh, I have done hundreds of Atlas multicore tasks in 1-core, 2-core, 3-core, 4-core, 6-core, and 8-core since they started, because I do the Alpha tests before they get here.

But many times they will sit there at 100% and keep running for as long as you let them, never actually finishing and getting sent back. Unless you abort them, they become a Computation error after many hours, like this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=177850942

I have a 4-core Atlas alpha that has been sitting at 100% progress for close to 11 hours right now. Another thing they do, as I mentioned in my previous post, is run at normal speed up to 99.900%, and then after that each .001% takes as long as an hour.

I have seen many of those (not here at LHC) and you can check your CPU stats and it will be doing nothing.......yet they will still say they are running on the Boinc Manager.

But in the same batch, 4 finished Valid: https://lhcathome.cern.ch/lhcathome/results.php?userid=5472&offset=0&show_names=0&state=0&appid=14

And checking the VB log it is running and looks like it will continue as long as I let it run and just stays at 100% progress and running.

(The alpha tests have been having the problem for over a month, as has LHC-dev.)

All the other Cern project multi-cores have been running ok at -dev for a long time too.
ID: 34450
marmot
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 34534 - Posted: 4 Mar 2018, 9:49:37 UTC - in response to Message 34450.  


I have seen many of those (not here at LHC) and you can check your CPU stats and it will be doing nothing.......yet they will still say they are running on the Boinc Manager.

And checking the VB log it is running and looks like it will continue as long as I let it run and just stays at 100% progress and running.


Are the VMs aborted or sitting in a reset state in your VBox Manager?

I haven't seen that happen in ATLAS, but in Theory there was a 2% error rate where the BOINC manager would show the task running at 100% progress while VBox Manager showed the WU in a reset state.
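
One way to check this from a script rather than the VBox Manager window is sketched below. It assumes `VBoxManage` is on the PATH and relies on `VBoxManage list vms` and `VBoxManage showvminfo <name> --machinereadable`, whose output includes a `VMState="..."` line. The function names are hypothetical, and how a stuck ATLAS or Theory VM actually reports itself is not something this thread establishes.

```python
import re
import subprocess

# `VBoxManage list vms` prints lines like: "vm name" {uuid}
VM_LINE = re.compile(r'^"(?P<name>.+)" \{(?P<uuid>[0-9a-fA-F-]+)\}$')
# `showvminfo --machinereadable` prints key="value" lines, including VMState.
STATE_LINE = re.compile(r'^VMState="(?P<state>[^"]*)"$')


def parse_vm_names(list_output):
    """Extract VM names from the text printed by `VBoxManage list vms`."""
    names = []
    for line in list_output.splitlines():
        match = VM_LINE.match(line.strip())
        if match:
            names.append(match.group("name"))
    return names


def parse_vm_state(info_output):
    """Return the VMState value from machine-readable VM info, or None."""
    for line in info_output.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            return match.group("state")
    return None


def vm_states():
    """Map each registered VM name to its reported state (needs VBoxManage)."""
    listing = subprocess.run(
        ["VBoxManage", "list", "vms"],
        capture_output=True, text=True, check=True,
    ).stdout
    states = {}
    for name in parse_vm_names(listing):
        info = subprocess.run(
            ["VBoxManage", "showvminfo", name, "--machinereadable"],
            capture_output=True, text=True, check=True,
        ).stdout
        states[name] = parse_vm_state(info)
    return states
```

Calling `vm_states()` on the host would return a dict of VM name to state. Anything other than `running` for a task that BOINC still shows as running would be worth a closer look.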

ATLAS jobs are generally finishing in about 60,000 seconds, but 3 of the last 15 (20%) took 130k, 168k and 185k seconds till completion.

The one that ran 259,213 seconds didn't validate, so maybe they have set an upper limit of 3 days since that record setter some months back.
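
The 3-day guess is easy to sanity-check with the numbers in this post. The limit itself is an assumption here, not something the project has confirmed.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical cap: exactly 3 days of run time (an assumption, not confirmed).
suspected_limit_s = 3 * SECONDS_PER_DAY

# Run time of the task that did not validate, as quoted in this post.
failed_run_s = 259_213
overshoot_s = failed_run_s - suspected_limit_s

# Share of recent jobs that ran far longer than the usual ~60,000 seconds.
slow_runs = 3
recent_runs = 15
slow_fraction = slow_runs / recent_runs

print(f"3 days = {suspected_limit_s} s")
print(f"failed run exceeded that by {overshoot_s} s")
print(f"slow share of recent jobs = {slow_fraction:.0%}")
```

The failed run stopped only a few seconds past the 3-day mark, which is consistent with a cutoff there but does not prove one.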
ID: 34534
marmot
Joined: 5 Nov 15
Posts: 144
Credit: 6,301,268
RAC: 0
Message 34535 - Posted: 4 Mar 2018, 10:50:06 UTC - in response to Message 34534.  

OK, checked server Olmec and it had an ATLAS that timed out.
Had a run time of 8+ days.
Opened the VM and it was still running, responsive to keystrokes and using CPU in the process manager, but it's not going to validate.
ID: 34535



©2024 CERN