Message boards : Theory Application : Theory not utilizing all cores

Profile Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,137
RAC: 3,956
Message 36603 - Posted: 1 Sep 2018, 20:54:42 UTC - in response to Message 36596.  

Magic, could you run an 8-core Theory once again on one of your i7-3770's to check whether it's running 8 jobs within the VM or only a max of 4?

I will try that again as soon as one of them is ready for new work.
But I just remembered something I tested a couple of months ago, after we started running multi-core Theory here.

On these 8-core PCs I was running three X2 Theory tasks and three single-core LHCb tasks at the same time, which means that in some way I had 9 tasks running on an 8-core machine (sort of), so the multi-core Theory tasks must still have allowed me to run those three LHCb tasks alongside them.

I stopped running them that way only because I would always have to start the three X2 Theory tasks first and then the three LHCb tasks.

Here is a snapshot I took of this back in the first week of July.
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36604 - Posted: 1 Sep 2018, 23:25:17 UTC

@ Luigi R
Interesting bash script and graph from you.
Your troubles may just be a Linux thing. I have https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10557960 on Linux (Ubuntu), on which I can test an 8-core Theory. It's showing only 6 cores now because I have HT turned off for an experiment on how HT affects performance on ATLAS native tasks. When the current tasks complete I'll set it up for an 8-core Theory and we'll see what the graph from my host looks like.
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,180,005
RAC: 0
Message 36607 - Posted: 2 Sep 2018, 6:35:48 UTC - in response to Message 36604.  
Last modified: 2 Sep 2018, 6:39:12 UTC

OK, that task was long. It was reported a while ago. I have an almost complete graph.

Graph

Task
https://lhcathome.cern.ch/lhcathome/result.php?resultid=206334370

Overnight, CPU usage was very good.
8 threads were used.


@ Luigi R
Interesting bash script and graph from you.
Your troubles may just be a Linux thing. I have https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10557960 on Linux (Ubuntu), on which I can test an 8-core Theory. It's showing only 6 cores now because I have HT turned off for an experiment on how HT affects performance on ATLAS native tasks. When the current tasks complete I'll set it up for an 8-core Theory and we'll see what the graph from my host looks like.

Let us know. ;)
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 36608 - Posted: 2 Sep 2018, 8:21:23 UTC - in response to Message 36607.  

Overnight, CPU usage was very good.
8 threads were used.

But not all the time. There were several gaps between the finish of one job and the start of the next.
CPU efficiency: 4.5 out of 8 available threads.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 36609 - Posted: 3 Sep 2018, 5:41:37 UTC

I retested the 8-core VM (https://lhcathome.cern.ch/lhcathome/result.php?resultid=206346600) and this time the result was much better.
All 8 cores/VM-slots were used. The first 2 jobs were still running when the next 2 started after 20 minutes.
Between the initial setup phase, where the jobs are started pairwise at intervals of about 20 minutes, and the final phase, where the jobs finish one after another without the freed cores being reused, the efficiency of the 3 VBoxHeadless processes was 83.9%:
Average CPU	MHz used now	Total CPU cycles

83.90444591 %	22359.82	1.05231269844286E+15
0.000013645 %	0		171140926
0.000009997 %	0		125386186


Average total:	83.904469555044 %
Average MHz:	22895.85

Current load:	81.9401390906526 %

Also, after a job finished, a new job started shortly afterwards in the freed slot.
E.g.:
09:42:16 +0200 2018-09-02 [INFO] New Job Starting in slot1
13:04:17 +0200 2018-09-02 [INFO] Job finished in slot1 with 0.
13:04:42 +0200 2018-09-02 [INFO] New Job Starting in slot1
17:02:37 +0200 2018-09-02 [INFO] Job finished in slot1 with 0.
17:03:00 +0200 2018-09-02 [INFO] New Job Starting in slot1
18:03:57 +0200 2018-09-02 [INFO] Job finished in slot1 with 0.
18:04:25 +0200 2018-09-02 [INFO] New Job Starting in slot1
20:16:31 +0200 2018-09-02 [INFO] Job finished in slot1 with 0.
20:16:56 +0200 2018-09-02 [INFO] New Job Starting in slot1
21:15:26 +0200 2018-09-02 [INFO] Job finished in slot1 with 0.
21:15:40 +0200 2018-09-02 [INFO] New Job Starting in slot1
00:02:27 +0200 2018-09-03 [INFO] Job finished in slot1 with 0.

Towards the end, CPU cycles are wasted because of unused cores.
22:16:49 +0200 2018-09-02 [INFO] Job finished in slot5 with 0.
22:34:31 +0200 2018-09-02 [INFO] Job finished in slot4 with 0.
23:36:01 +0200 2018-09-02 [INFO] Job finished in slot2 with 0.
00:00:05 +0200 2018-09-03 [INFO] Job finished in slot8 with 0.
00:02:27 +0200 2018-09-03 [INFO] Job finished in slot1 with 0.
00:05:55 +0200 2018-09-03 [INFO] Job finished in slot6 with 0.
01:32:02 +0200 2018-09-03 [INFO] Job finished in slot7 with 0.
02:06:56 +0200 2018-09-03 [INFO] Job finished in slot3 with 0.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,960,854
RAC: 136,903
Message 36610 - Posted: 3 Sep 2018, 6:24:31 UTC - in response to Message 36609.  

The link leads to a 4-core Theory VM.
The timestamps in your examples also don't match the linked log.

https://lhcathome.cern.ch/lhcathome/result.php?resultid=206346600
2018-09-02 09:32:45 (4976): Setting CPU Count for VM. (4)
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 36611 - Posted: 3 Sep 2018, 7:11:37 UTC - in response to Message 36610.  

The link leads to a 4-core Theory VM.
The timestamps in your examples also don't match the linked log.

Directly after the start, I changed the VM settings to 8 cores and restarted the VM:
2018-09-02 09:33:57 (4976): Stopping VM.
2018-09-02 09:43:05 (2412): Detected: vboxwrapper 26197
2018-09-02 09:43:05 (2412): Detected: BOINC client v7.7

The times in the VM and on the host will differ after several hours of runtime. I took the times from the VM logs and not from stderr.txt.
As you can see, slots 1 up to 8 were used, and the CPU time versus elapsed time shows more CPU use than 4 cores can deliver:
Elapsed 16 hours 38 minutes 34 seconds
CPU time 4 days 5 hours 2 minutes 15 seconds
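
(As a quick check: 4 days 5 hours is roughly 101 hours of CPU time over roughly 16.6 hours elapsed, which means about 6.1 cores busy on average, well beyond what 4 cores could deliver.)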
Profile Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,137
RAC: 3,956
Message 36612 - Posted: 3 Sep 2018, 7:38:42 UTC

computezrmle, I am starting one of the 8-core tasks on one of my i7-3770's in a couple of seconds.

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,960,854
RAC: 136,903
Message 36613 - Posted: 3 Sep 2018, 7:50:00 UTC - in response to Message 36611.  

Thank you, it's clearer now.

I agree 100% with the conclusions.
The cycle thieves are:
1. The staggered startup (20 min delay)
2. The jobs' runtime differences at the end of the WU

BTW:
Time drifts should not occur, as every VM starts its own NTP service.
You may want to check whether your firewall allows traffic from your VMs to external UDP port 123 and back.
It would be faster and more elegant to redirect the NTP requests to an already existing time source inside your LAN.
Some firewalls/routers provide functions to do those redirects.
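
For illustration, a minimal sketch of how such a redirect might look on a Linux-based router; the LAN interface name, subnet and local time source address below are placeholders, not taken from any real setup:

# Redirect any NTP request leaving the LAN to a local time source instead.
# br-lan, 192.168.1.0/24 and 192.168.1.1 are example values; adjust to your network.
iptables -t nat -A PREROUTING -i br-lan -s 192.168.1.0/24 \
         -p udp --dport 123 -j DNAT --to-destination 192.168.1.1:123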
Profile Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,137
RAC: 3,956
Message 36615 - Posted: 3 Sep 2018, 8:17:47 UTC - in response to Message 36613.  

While I am watching the log of the 8-core task I started, I took a quick snapshot of the 3-core task I have already been running on the PC next to it.
These 3 jobs started within 22 minutes.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,960,854
RAC: 136,903
Message 36616 - Posted: 3 Sep 2018, 8:34:27 UTC - in response to Message 36615.  

It's an LHCb, not a Theory.
Would be nice to see the timestamps.
Is there a delay between the first 2 and the 3rd?
Profile Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,137
RAC: 3,956
Message 36617 - Posted: 3 Sep 2018, 8:41:42 UTC - in response to Message 36616.  
Last modified: 3 Sep 2018, 8:50:10 UTC

Yes, I figured you would know what that was, so I just wanted to show you how they are doing while I watch the 8-core here.
That LHCb started its jobs in slots 1, 3 and 2 within 22 minutes.

Here is the 8-core after one hour, with the times shown.



As you can see, it was 11 minutes before the first job started.

(It's 2 am, so I'd better go fake sleep and I'll check back later, Stefan.)
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 36618 - Posted: 3 Sep 2018, 9:51:32 UTC - in response to Message 36617.  
Last modified: 3 Sep 2018, 15:01:41 UTC

@MAGIC:
It is interesting that the 3rd job did not start after about 20 minutes, maybe because of the quick finishes of jobs 1 and 2.
It seems that the staggered startup gets confused by that, which I already mentioned before.
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36620 - Posted: 3 Sep 2018, 16:20:51 UTC - in response to Message 36618.  

@MAGIC:
It is interesting that the 3rd job did not start after about 20 minutes, maybe because of the quick finishes of jobs 1 and 2.

Note that job 2 finished with 1 (an error) rather than 0. If there is confusion, then maybe it's due to Condor pondering the cause of the error and what it should do next.

It seems that the staggered startup gets confused by that, which I already mentioned before.

But does the confusion cause it to fail to start jobs in all 8 slots? We can't see that from Magic's 1-hour log excerpt.

I have an 8-core VBox Theory running here: https://lhcathome.cern.ch/lhcathome/result.php?resultid=206367015. The log shows a quick finish of job 2, but it finished with 0 rather than 1. Jobs started in a timely fashion in all 8 slots despite the early finish of job 2. You'll see it all in the stderr output after task completion.
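
If you want to pull just those job lines out while a task is still running, something like the following should work; the path is only a guess for a default Linux BOINC install, so adjust the data directory and slot number to your own setup.

# List the job start/finish events logged so far for the task running in slot 0.
# /var/lib/boinc-client and slots/0 are placeholder values.
grep -E "New Job Starting|Job finished" /var/lib/boinc-client/slots/0/stderr.txt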
Profile Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1114
Credit: 49,503,137
RAC: 3,956
Message 36627 - Posted: 3 Sep 2018, 21:38:31 UTC - in response to Message 36620.  

That 8-core task I tried late last night turned out to run for only 1 hour 10 min 51 sec and came back Valid (I saw it this morning):
https://lhcathome.cern.ch/lhcathome/result.php?resultid=206368984

I just got home, and checking my tasks, it has been a strange day here today with my Theory tasks.

Many Valids that also ran only just over an hour (2-core multis).

And, as usual, a page full of Invalids that are those 27-minute runs ending with the typical:
VM Completion File Detected.
VM Completion Message: Condor exited after 985s without running a job.


Mixed in with several of the *normal*-looking Valids (18 hours).
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36629 - Posted: 4 Sep 2018, 1:09:06 UTC

Finished the 8-core Theory I started, https://lhcathome.cern.ch/lhcathome/forum_index.php. It began with several quick finishes, which I don't understand. I've never seen a Theory sub-job finish in 20 minutes, so something very strange is going on there; maybe those quick finishes aren't the usual pythia/herwig/sherpa jobs. In spite of those quick finishes, the task eventually revved up to a full 8 slots.
The disappointing bit is here:
2018-09-03 11:23:42 (2717): Guest Log: [INFO] Job finished in slot2 with 1.
2018-09-03 11:43:37 (2717): Guest Log: [INFO] Job finished in slot5 with 0.
2018-09-03 11:45:02 (2717): Guest Log: [INFO] Job finished in slot7 with 0.
2018-09-03 11:45:07 (2717): Guest Log: [INFO] Job finished in slot4 with 0.
2018-09-03 11:45:56 (2717): Guest Log: [INFO] Job finished in slot1 with 0.
2018-09-03 11:52:56 (2717): Guest Log: [INFO] Job finished in slot6 with 0.
2018-09-03 12:16:14 (2717): Guest Log: [INFO] Job finished in slot8 with 0.

where slot2 finishes with an error. Soon after, every slot except slot3 also finishes, and the task drags on for another 5 hours doing next to nothing (probably a uselessly looping sherpa) until it bumps up against the 18-hour limit at:
2018-09-03 17:24:51 (2717): Removing virtual disk drive from VirtualBox.


@Luigi_R
I ran your logging script but didn't check its output until the task finished. It seems we have top configured differently. The 9th field in the output from my top is either S, Z, I or R, not %CPU. I should have changed the "awk '{print $9}'" to "awk '{print $10}'". Oh well.
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,180,005
RAC: 0
Message 36633 - Posted: 4 Sep 2018, 10:18:08 UTC - in response to Message 36629.  
Last modified: 4 Sep 2018, 11:14:24 UTC

@Luigi_R
I ran your logging script but didn't check its output until the task finished. It seems we have top configured differently. The 9th field in the output from my top is either S, Z, I or R, not %CPU. I should have changed the "awk '{print $9}'" to "awk '{print $10}'". Oh well.

I'm sorry I didn't mention that. :(
I had the same problem. Only once (when I tested that script) did I need the 9th field to get things working; usually I use the 10th field.

Meanwhile I tried disabling HT, so I have 4 cores now.
The better of two tasks is still disappointing: click.
I need to do further testing.
Luigi R.
Joined: 7 Feb 14
Posts: 99
Credit: 5,180,005
RAC: 0
Message 36634 - Posted: 4 Sep 2018, 17:38:06 UTC - in response to Message 36629.  

@Luiigi_R
I ran your logging script but didn't check its output until the task finished. It seems we have top configured differently. The 9th field in the output from my top is either S, Z, I or R, not %CPU. I should have changed the "awk '{print $9}' " to "awk '{print $10}' ". Oh well.

It looks like there is another issue, due to the window width.

If your window is small, your process name could show up as:

  • (nothing)
  • "VBoxHe+"
  • "VBoxHea+"
  • "VBoxHead+"
  • "VBoxHeadl+"
  • "VBoxHeadle+"
  • "VBoxHeadless"



I can't explain why sometimes "awk '{print $9}'" works too.
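
A field-position-independent alternative would be to skip top altogether and select the process by name, for example something like this (the log file name and the 60-second interval are just example choices):

#!/bin/bash
# Append a timestamped line with pid, %CPU and elapsed time for every
# VBoxHeadless process once a minute. Selecting by name avoids the
# truncated-column problem; note that ps reports the lifetime-average
# %CPU rather than top's per-interval value.
while true; do
    ps -C VBoxHeadless -o pid=,%cpu=,etime= | \
        awk -v ts="$(date '+%F %T')" '{print ts, $0}' >> vbox_cpu.log
    sleep 60
done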

Profile MechaToaster
Joined: 17 Aug 17
Posts: 15
Credit: 179,253
RAC: 0
Message 36752 - Posted: 18 Sep 2018, 9:53:09 UTC - in response to Message 36594.  

It's not what I'm looking for. I don't want 8 tasks, I want only 1 task that uses 8 cores!
...

You're right. The 8-core Theory VM is only using a max of 4 cores. I tested the 8-core VM and saw this:
08/31/18 22:37:36 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
slot type 0: Cpus: 1.000000, Memory: 375, Swap: 25.00%, Disk: 25.00%
slot type 0: Cpus: 1.000000, Memory: 375, Swap: 25.00%, Disk: 25.00%
slot type 0: Cpus: 1.000000, Memory: 375, Swap: 25.00%, Disk: 25.00%
slot type 0: Cpus: 1.000000, Memory: 375, Swap: 25.00%, Disk: 25.00%
08/31/18 22:37:36 slot1: New machine resource allocated
08/31/18 22:37:36 Setting up slot pairings
08/31/18 22:37:36 slot2: New machine resource allocated
08/31/18 22:37:36 Setting up slot pairings
08/31/18 22:37:36 slot3: New machine resource allocated
08/31/18 22:37:36 Setting up slot pairings
08/31/18 22:37:36 slot4: New machine resource allocated
08/31/18 22:37:36 Setting up slot pairings
08/31/18 22:37:36 CronJobList: Adding job 'multicore'

Maybe only physical cores are counted (4 in my case). This has to be confirmed by someone with more physical cores.
Anyway, the most efficient way to run Theory when you have enough RAM (you do) is the single-core VM.

I'm having a similar issue on a Ryzen 5 1600X (6 cores, 12 threads). I had LHC@home configured to use 4 cores per task; I thought it would count threads as cores, but it will not let me run more than 1 Theory task on 4 cores at a time, despite having enough RAM available. If I try to run another Theory task, it stops and the status updates to "postponed: b".
Pavel Hanak

Joined: 5 Mar 06
Posts: 13
Credit: 30,870,563
RAC: 83
Message 36786 - Posted: 20 Sep 2018, 11:06:40 UTC
Last modified: 20 Sep 2018, 12:06:13 UTC

Hi all, one of my machines has a 16c/32t AMD TR1950X and I've also noticed that the new 263.70 multicore app severely under-utilizes the CPU. What's even worse, all the tasks I've watched failed at about the halfway point of the computation. So naturally I want to try to limit the number of cores as suggested here, but limiting them globally via the web interface seems rather cumbersome to me. In fact, I fine-tune them according to the core and memory capacity of each of my machines. Isn't there a way to limit them for Theory via an app_config.xml file, as can be done for ATLAS?

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=35921
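
If the same mechanism applies to Theory as to ATLAS, I'd expect the file to look roughly like the sketch below. The path assumes a default Linux BOINC install, and the plan_class shown is only a placeholder; the app_name and plan_class have to match the Theory multicore entries in your own client_state.xml, so verify them there first.

# Guesswork sketch, not a confirmed recipe: write an app_config.xml limiting Theory to 4 cores.
cat > /var/lib/boinc-client/projects/lhcathome.cern.ch_lhcathome/app_config.xml <<'EOF'
<app_config>
  <app_version>
    <app_name>Theory</app_name>
    <!-- placeholder: copy the exact plan_class from client_state.xml -->
    <plan_class>vbox64_mt_mcore</plan_class>
    <avg_ncpus>4</avg_ncpus>
  </app_version>
</app_config>
EOF
# Afterwards use "Options -> Read config files" in the BOINC Manager, or restart the client.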

Anyway, at the moment I've "fixed" it by manually aborting all 263.70 tasks, in the hope that the announced 263.80 version will work better.