Message boards : Number crunching : LHC multi-core WU take a very long time.

David Lambert

Joined: 2 Dec 05
Posts: 11
Credit: 2,607,594
RAC: 0
Message 35841 - Posted: 9 Jul 2018, 23:49:17 UTC
Last modified: 9 Jul 2018, 23:50:31 UTC

Whenever an ATLAS multi-core WU runs, it takes a very long time to process. The current WU elapsed time is 2d 09:44:43 and shows 01:10:11 estimated time remaining. Properties for the WU show that it is processing 1.800% per hour. However, the properties for the WU also show that it has used 5d 18:26:06 CPU time. This makes no sense to me. Can someone explain the discrepancy?
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1176
Credit: 54,887,670
RAC: 5,761
Message 35842 - Posted: 9 Jul 2018, 23:59:30 UTC - in response to Message 35841.  

We run multi-core so that a single task can use several processors at once. The CPU time you see is the time for all the cores added together, so the CPU time will be longer than the task's elapsed running time.
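
As a rough worked example, the two times from the first post can be compared directly. This is only a sketch: `to_seconds` is a helper written for this illustration (it is not part of BOINC), and the figure of 4 CPUs is taken from the task's Properties posted later in this thread.

```python
def to_seconds(duration: str) -> int:
    """Convert a duration such as '2d 09:44:43' or '01:10:11' to seconds."""
    days = 0
    clock = duration.strip()
    if "d" in clock:
        # Split off the optional day count in front of HH:MM:SS.
        day_part, clock = clock.split("d", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(p) for p in clock.strip().split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


elapsed = to_seconds("2d 09:44:43")   # wall-clock time of the task
cpu = to_seconds("5d 18:26:06")       # CPU time summed over all cores
cores = 4                             # "Resources 4 CPUs" in the Properties

avg_busy = cpu / elapsed              # average number of cores kept busy

print(f"elapsed = {elapsed} s, CPU = {cpu} s")
print(f"average cores busy = {avg_busy:.2f} of {cores}")
print(f"utilisation = {avg_busy / cores:.0%}")
```

With these numbers the task kept about 2.4 of its 4 cores busy on average, so a CPU time more than twice the elapsed time is what you would expect rather than a discrepancy.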

Try some Theory multi-core tasks, since they take less RAM and are usually easier to run no matter how many cores you use.
David Lambert

Joined: 2 Dec 05
Posts: 11
Credit: 2,607,594
RAC: 0
Message 35843 - Posted: 10 Jul 2018, 0:01:04 UTC - in response to Message 35841.  

This is the text from the 'Properties' for the WU.

Application: ATLAS Simulation 1.01 (vbox64_mt_mcore_atlas)
Name: 851KDmpmbvsnyYickojUe11pABFKDmABFKDmaTkZDmABFKDmv6Pabn
State: Running
Received: 2018/07/07 02:07:47
Report deadline: 2018/07/14 02:07:46
Resources: 4 CPUs
Estimated computation size: 43,200 GFLOPs
CPU time: 5d 18:26:06
CPU time since checkpoint: 00:05:15
Elapsed time: 2d 09:44:43
Estimated time remaining: 01:10:11
Fraction done: 98.014%
Virtual memory size: 139.04 MB
Working set size: 6.05 GB
Directory: slots/4
Process ID: 3952
Progress rate: 1.800% per hour
Executable: vboxwrapper_26196_windows_x86_64.exe
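
The remaining-time figure above can be roughly cross-checked from the fraction done and the progress rate. This is a sketch under the assumption that the task keeps progressing at the reported rate. BOINC's own estimator may work differently, and `remaining_seconds` and `fmt_hms` are names made up for this illustration.

```python
def remaining_seconds(fraction_done_pct: float, rate_pct_per_hour: float) -> int:
    """Seconds left if progress continues at a constant rate (assumption)."""
    remaining_pct = 100.0 - fraction_done_pct
    return round(remaining_pct / rate_pct_per_hour * 3600)


def fmt_hms(total_seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Values copied from the Properties listing above.
left = remaining_seconds(98.014, 1.800)
print(fmt_hms(left))
```

This gives roughly 01:06:12, close to the 01:10:11 shown as the estimated time remaining.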
David Lambert

Joined: 2 Dec 05
Posts: 11
Credit: 2,607,594
RAC: 0
Message 35844 - Posted: 10 Jul 2018, 0:02:59 UTC - in response to Message 35842.  
Last modified: 10 Jul 2018, 0:08:36 UTC

So, that would mean disabling all other projects in my preferences, correct? I can't remember the last time I saw something other than ATLAS run even though I allow all projects.
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1176
Credit: 54,887,670
RAC: 5,761
Message 35847 - Posted: 10 Jul 2018, 7:01:40 UTC - in response to Message 35844.  
Last modified: 10 Jul 2018, 7:04:58 UTC

Yes. Go to your account page here and open *Preferences for this project*. Scroll to the *Location* you have set for the particular PC you want to change, check the box next to *Theory Simulation*, and make sure all the others are unchecked.

Below that, under *Edit Preferences*, you can pick how many tasks you want to load and have ready to run. You can pick a specific number or, as some members with 8 or more cores do, set *Max # jobs* to *No limit*. Below that you can set how many cores each multi-core task runs on. 2 is the best setting, although I tested every setting from 1 through 8 cores per task while the testing was being done.

After you have the settings you like, click *Update Preferences*, and then in your BOINC Manager tell it to Allow New Tasks and Update.

When you want to check what is actually happening while a task is running, go to your VB Manager (VirtualBox), right-click a running task, and click Log. Scroll through it to get some idea of what is happening. After a while you will know how it should be running and what it says when there are problems.

Also, as a task starts, you can click on it in your BOINC Manager and then click *Show VM Console* on the left of the page. A black box will open, and you can watch the VM start up and go through its setup steps before it actually starts running a task.

At the bottom of that box you will see *HTCondor Ping* followed by *0*, which means it will then actually start running the "Jobs" (it can take a few minutes after starting before it gets that far).

Once you do this several times you will get used to this and it will get easier and easier.

The *Properties* that you looked at on the left of the BOINC Manager just shows estimates and other things that you never need to check.

And if you want to look at Valid or Error tasks, go to your account page here and click Tasks (View). On the far left, next to a finished task, is the link to what is called the *Stderr*, and there you will see exactly what that task did from start to finish.

Good luck
