Message boards : Number crunching : CPUs left unused???

Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,905,247
RAC: 60,096
Message 38951 - Posted: 24 May 2019, 16:05:46 UTC
Last modified: 24 May 2019, 16:06:11 UTC

For example, an E5-2699v4 has 22c/44t and is running 4 GPU WUs. That leaves 40 threads for LHC WUs. I do not use VirtualBox, so I get WUs for SixTrack, native Theory & native ATLAS. I'm not running any other CPU project. Yet it's only running 23 WUs and leaving 17 threads idle. I also have 1 CPU max set in preferences. It has 32 GB RAM and is using 17.5 GB now. I have not created an app_config file and my cc_config.xml has <ncpus>-1</ncpus>.
Any idea why so many resources are not being used???
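
The thread count in the post above can be tallied as a quick sanity check. This is only a sketch: it assumes each GPU WU reserves one full CPU thread and each running LHC WU uses exactly one thread (in line with the 1 CPU max preference), and the function name is made up for illustration.

```python
def idle_threads(total_threads: int, gpu_wus: int, running_cpu_wus: int) -> int:
    """Threads left idle, assuming one thread per GPU WU and one per CPU WU."""
    available_for_cpu_work = total_threads - gpu_wus
    return available_for_cpu_work - running_cpu_wus


# E5-2699v4: 44 threads, 4 GPU WUs, 23 LHC WUs running
print(idle_threads(44, 4, 23))  # 17
```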
ID: 38951
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2411
Credit: 226,401,842
RAC: 131,719
Message 38954 - Posted: 24 May 2019, 16:24:58 UTC - in response to Message 38951.  

Your BOINC client may be set to use no more than 55-60% of the available RAM.
Check this setting in your BOINC manager.
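
As a rough check of that suggestion against the numbers in the first post (32 GB RAM, 17.5 GB in use), the memory budget can be worked out as below. This is a sketch that assumes the limit is a plain percentage of total RAM; how the client actually accounts for memory may differ, and the function name is hypothetical.

```python
def ram_budget_gb(total_gb: float, percent: float) -> float:
    """RAM the client may use if the limit is a simple percentage of total RAM."""
    return total_gb * percent / 100.0


for pct in (55, 60):
    print(f"{pct}% of 32 GB = {ram_budget_gb(32, pct):.1f} GB")
# 55% of 32 GB = 17.6 GB
# 60% of 32 GB = 19.2 GB
```

A 55% limit would give about 17.6 GB, which is close to the 17.5 GB reported in use, so the setting is worth checking.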
ID: 38954
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,905,247
RAC: 60,096
Message 38956 - Posted: 24 May 2019, 16:31:17 UTC - in response to Message 38954.  
Last modified: 24 May 2019, 16:32:19 UTC

Memory is set to 95, 95 & 95% with 100% CPU. Plenty of space left on the SSD.

Might there be an L3 Cache limitation??? An E5-2699v4 has 55 MB.
ID: 38956
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2411
Credit: 226,401,842
RAC: 131,719
Message 38957 - Posted: 24 May 2019, 16:42:35 UTC - in response to Message 38956.  

"L3 Cache limitation"

Nice joke! ;-D

"Plenty of space left on the SSD"

Did you also check how much disk space your BOINC client is allowed to use?

Do you have unstarted tasks waiting in your buffer?
If not you may have hit a project limit (max #tasks).
In this case you can only solve the issue if you set up additional BOINC clients on that machine.
ID: 38957
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,905,247
RAC: 60,096
Message 38958 - Posted: 24 May 2019, 16:56:18 UTC - in response to Message 38957.  
Last modified: 24 May 2019, 17:00:09 UTC

"L3 Cache limitation. Nice joke! ;-D"
Wish I knew some computer jokes. The MIP project on WCG admitted to improperly programming its L3 cache usage. It is limited to 5 MB per WU; above that, CPU performance drops by over 60% for all work. Not that it might change the number of WUs the server downloads.

"Did you also check how much disk space your BOINC client is allowed to use?"
That computer is using 4.79 GB for LHC, with 57.5 GB available to BOINC.

"Do you have unstarted tasks waiting in your buffer?"
No. I increased my buffer from 0.2/0.2 to 0.5/0.5 and still have no waiting WUs.

"If not you may have hit a project limit (max #tasks). In this case you can only solve the issue if you set up additional BOINC clients on that machine."
Bizarre! Why would they set a project limit? Don't they want the work to get done???
I'm not willing to set up additional BOINC clients, so I guess I'll move to another project.
I just allowed new work for another CPU project and it immediately filled the 17 idle CPU threads.

ID: 38958
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,491,903
RAC: 2,069
Message 38959 - Posted: 24 May 2019, 17:32:10 UTC

Did you set

Max # jobs: No limit
Max # CPUs: No limit

in your preferences?
ID: 38959
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,905,247
RAC: 60,096
Message 38960 - Posted: 24 May 2019, 17:40:09 UTC - in response to Message 38959.  

"Did you set Max # jobs: No limit and Max # CPUs: No limit in your preferences?"
Max # jobs: Yes, no limit.
Max # CPUs: 1. Yeti says LHC runs less efficiently with multiple CPUs, and that they're not CPUs working on the same WU but on different WUs.

ID: 38960
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,491,903
RAC: 2,069
Message 38961 - Posted: 24 May 2019, 17:56:36 UTC

There is something wrong with the server setting: Max # CPUs 1.
If you set it to No limit, I think your buffer will fill up with SixTrack tasks.

See if that works, but disable ATLAS and Theory native for the time being.
If your problem is solved (getting all cores busy), you could use an app_config.xml to manage the cores for Theory and ATLAS.
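
For illustration, an app_config.xml with per-application limits could be generated with a short script like the one below. This is a minimal sketch, not a recommended configuration: the `<app>`, `<name>` and `<max_concurrent>` elements are standard BOINC app_config options, but the app names ("Theory", "ATLAS") and the limits are placeholders and must be replaced with the names and values that apply to the project and host.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits: dict[str, int]) -> str:
    """Return app_config.xml text with one <app> entry per application.

    `limits` maps an application name (placeholder values here) to the
    maximum number of its tasks allowed to run at the same time.
    """
    root = ET.Element("app_config")
    for app_name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Hypothetical limits, for illustration only
print(build_app_config({"Theory": 8, "ATLAS": 2}))
```

The resulting text would be saved as app_config.xml in the project's folder under the BOINC data directory.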
ID: 38961
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,905,247
RAC: 60,096
Message 38963 - Posted: 24 May 2019, 18:25:14 UTC
Last modified: 24 May 2019, 18:25:29 UTC

I just checked all my computers: those with CPUs of 28 or fewer threads fill up with LHC WUs. Those with 32, 36, 40 & 44 threads get fewer WUs than a 28-thread CPU.

I just allowed another CPU project, TN-Grid, on all the big CPUs. I'll suspend it and try Max # CPUs set to No limit.

Using an app_config to limit the number of WUs for an application throws the queue out of balance. The server downloads too many of the restricted WUs and maybe not enough for the unrestricted applications. This approach requires babysitting and aborting excess restricted WUs. Not the way I like to go.
ID: 38963
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,905,247
RAC: 60,096
Message 38964 - Posted: 24 May 2019, 18:36:50 UTC

Just suspended TN-Grid on those 13 computers and switched preferences to Max # CPUs: No limit. Got one or two SixTrack WUs but mostly 2C Theory_native and 12C Atlas_native.
ID: 38964
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,905,247
RAC: 60,096
Message 38965 - Posted: 24 May 2019, 19:48:54 UTC

For CPUs with 32 or more threads, if LHC plus another CPU project are allowed, the total number of running WUs is limited. If the LHC project is suspended, the non-LHC project starts running all the WUs it rightly should.

So until LHC fixes this problem I'll direct those 13 computers to TN-Grid and turn off LHC.
ID: 38965
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 53,905,247
RAC: 60,096
Message 39006 - Posted: 30 May 2019, 20:46:31 UTC

Sure would be nice if LHC staff would look at this bug and comment. TIA
ID: 39006
