Message boards : Sixtrack Application : Wrong Factor sent by Project Server
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,904,036
RAC: 137,940
Message 40762 - Posted: 3 Dec 2019, 8:09:49 UTC

As mentioned by a couple of other users in different threads, SixTrack is currently behaving oddly.

The project server sends a number of <app_version> settings to the client in the scheduler reply.
Among them is <avg_ncpus>, which overwrites the client's corresponding setting in client_state.xml.

Factors of 0.15, 0.4 and 0.6 can be found on all of my clients, for the sse2 as well as the avx apps.
The correct factor for SixTrack, as a single-core app, would be 1.0.

As long as a user doesn't have an app_config.xml, the wrong setting sent by the server is used.


@ the project team
Be so kind as to check the server templates and ensure the correct factor is sent.
ID: 40762
Profile Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 242
Credit: 5,800,306
RAC: 0
Message 40766 - Posted: 3 Dec 2019, 15:59:27 UTC - in response to Message 40762.  

Thanks for reporting, but this is strange. On the server side, we have made no related changes to CPU factors in plan classes as far as I can see, and the SixTrack templates have not been changed since 2018. Our server code was last upgraded in September, and later tuning has been mainly for Theory.

Since when have you seen this behaviour?

Perhaps our SixTrack colleagues have got further insights.
ID: 40766
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,904,036
RAC: 137,940
Message 40767 - Posted: 3 Dec 2019, 16:23:22 UTC - in response to Message 40766.  

Other users have been reporting this for a couple of days.
Hence I checked my clients that run (only) SixTrack alongside tasks from other projects.

I first noticed it last weekend, but not on all clients.
This is strange, as some clients run on the same host with exactly the same setup, just for (disk) load-balancing reasons.

Meanwhile all SixTrack clients/tasks are affected with the following exception:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=253719553

Tasks from other projects are not affected.
ID: 40767
Aurum
Joined: 12 Jun 18
Posts: 126
Credit: 52,457,949
RAC: 23,953
Message 40768 - Posted: 3 Dec 2019, 16:53:19 UTC

For a home-made fix, can we add this to our app_config.xml files?
<app_config>
<app>
    <name>ATLAS</name>
    <!-- Xeon E5-2699 v4  22c44t  L3 Cache = 55 MB  -->
    <max_concurrent>6</max_concurrent>
</app>
<app_version>
    <app_name>ATLAS</app_name>
    <plan_class>native_mt</plan_class>
    <avg_ncpus>6</avg_ncpus>
    <cmdline>--nthreads 6</cmdline>
</app_version>
<app>
    <name>sixtrack</name>
    <max_concurrent>38</max_concurrent>
</app>
<app_version>
    <app_name>sixtrack</app_name>
    <plan_class>avx</plan_class>
    <avg_ncpus>1</avg_ncpus>
</app_version>
<app_version>
    <app_name>sixtrack</app_name>
    <plan_class>sse2</plan_class>
    <avg_ncpus>1</avg_ncpus>
</app_version>
</app_config>
And how do we handle the multiple plan classes?
ID: 40768
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,904,036
RAC: 137,940
Message 40769 - Posted: 3 Dec 2019, 17:07:06 UTC - in response to Message 40768.  

"For a home-made fix, can we add this to our app_config.xml files?
[...]
And how do we handle the multiple plan classes?"

<app_version>
    <app_name>sixtrack</app_name>
    <plan_class>avx</plan_class>
    <avg_ncpus>1</avg_ncpus>
</app_version>
<app_version>
    <app_name>sixtrack</app_name>
    <plan_class>sse2</plan_class>
    <avg_ncpus>1</avg_ncpus>
</app_version>

Yes.
This is the correct section in app_config.xml to set the CPU factor to "1" for avx as well as for sse2.
ID: 40769
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,086,590
RAC: 104,237
Message 40864 - Posted: 9 Dec 2019, 9:32:44 UTC

CentOS7 VM.
One sixtracktest task with 0.3333 CPU and one with 0.6666 CPU, for 1 CPU in use.
This is OK for me, so I don't need an app_config.xml.
ID: 40864
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,659,975
RAC: 0
Message 40911 - Posted: 12 Dec 2019, 8:53:29 UTC - in response to Message 40864.  

Hi,
sorry for the late reply. Recently we made some changes to the SixTrack app, but everything happened before uploading work to the BOINC server.
On the server side, the only change was an update to the host speed estimation, which is now done for all hosts (no longer only 1) that have returned valid results in which all particles survive. I do not think that this update can have an effect on the avg_ncpus parameter.

Both my Ubuntu hosts (running only sixtrack and sixtracktest) do not have an app_config.xml file, and everywhere avg_ncpus is set, it is set to 1.
I can also check a Windows machine of mine, but I don't have access right now...
ID: 40911
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,086,590
RAC: 104,237
Message 40912 - Posted: 12 Dec 2019, 9:07:35 UTC

Hi Alessio,
it is only shown on this AMD A10 (CentOS7 VM with 1 CPU). I see 0.333 and 0.666 CPU for two tasks running at the same time.
All tasks finished successfully.
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10591881
ID: 40912
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,904,036
RAC: 137,940
Message 40913 - Posted: 12 Dec 2019, 9:45:57 UTC
Last modified: 12 Dec 2019, 10:13:59 UTC

I'm running SixTrack on 7 BOINC instances.
A few days ago I did a project reset on all clients to get rid of the wrong avg_ncpus settings.
I don't use an app_config.xml.

After the project reset all clients started with avg_ncpus=1.0.

Meanwhile the picture looks like this:
client 1: <avg_ncpus>0.100000</avg_ncpus>
client 2: <avg_ncpus>0.400000</avg_ncpus>
client 3: <avg_ncpus>1.000000</avg_ncpus>
client 4: <avg_ncpus>1.000000</avg_ncpus>
client 5: <avg_ncpus>1.000000</avg_ncpus>
client 6: <avg_ncpus>0.150000</avg_ncpus>
client 7: <avg_ncpus>1.000000</avg_ncpus>


The snippets are taken from client_state.xml.
Example:
<app_version>
    <app_name>sixtrack</app_name>
    <version_num>50205</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>0.100000</avg_ncpus>
    <flops>8664408581.685040</flops>
    <plan_class>avx</plan_class>
    <api_version>7.14.2</api_version>
    <file_ref>
        <file_name>sixtrack_lin64_50205_avx.linux</file_name>
        <main_program/>
    </file_ref>
</app_version>




avg_ncpus could be forced to 1.0 using an app_config.xml, but then I wouldn't see the wrong setting coming from the project server.
All clients are running a couple of non-LHC projects, and none of the other projects shows similar behavior.
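
For anyone who wants to compare several clients the same way, here is a minimal Python sketch that lists the SixTrack app versions whose avg_ncpus differs from 1.0. It assumes client_state.xml parses as regular XML with the <app_version> blocks somewhere below the root, as in the example above. The function name and the sample data are made up for illustration; they are not part of BOINC.

```python
import xml.etree.ElementTree as ET


def find_fractional_ncpus(client_state_xml, app_name="sixtrack"):
    """Return (plan_class, avg_ncpus) pairs for app versions not set to 1.0."""
    root = ET.fromstring(client_state_xml)
    hits = []
    for av in root.iter("app_version"):
        if av.findtext("app_name") != app_name:
            continue
        # Treat a missing <avg_ncpus> as 1.0 (an assumption of this sketch).
        ncpus = float(av.findtext("avg_ncpus", default="1.0"))
        if abs(ncpus - 1.0) > 1e-9:
            hits.append((av.findtext("plan_class", default=""), ncpus))
    return hits


# Hypothetical, shortened stand-in for a real client_state.xml.
SAMPLE = """<client_state>
<app_version>
    <app_name>sixtrack</app_name>
    <version_num>50205</version_num>
    <avg_ncpus>0.100000</avg_ncpus>
    <plan_class>avx</plan_class>
</app_version>
<app_version>
    <app_name>sixtrack</app_name>
    <version_num>50205</version_num>
    <avg_ncpus>1.000000</avg_ncpus>
    <plan_class>sse2</plan_class>
</app_version>
</client_state>"""

for plan_class, ncpus in find_fractional_ncpus(SAMPLE):
    print(f"plan_class={plan_class} avg_ncpus={ncpus}")
```

To run it against a real client, read the file from the BOINC data directory instead of using SAMPLE; the location of that directory depends on the installation.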

<edit>
A result of the wrong setting is that the local BOINC client prefers other projects over SixTrack.
Hence client 2 downloaded 2 tasks last Monday but hasn't started them yet.
</edit>
ID: 40913
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 40914 - Posted: 12 Dec 2019, 10:48:53 UTC - in response to Message 40913.  

"A result of the wrong setting is that the local BOINC client prefers other projects over SixTrack.
Hence client 2 downloaded 2 tasks last Monday but didn't start them yet."
That's because LHC has a higher recent estimated credit (<REC>) than the other projects on that instance.
Search client_state.xml for </master_url> and </rec>.
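
The same lookup can be scripted. The following Python sketch assumes each <project> block in client_state.xml carries <master_url> and <rec> children, which is what the search terms above suggest. The second project URL and both REC values are invented sample data.

```python
import xml.etree.ElementTree as ET


def project_rec(client_state_xml):
    """Map each project's master_url to its <rec> value."""
    root = ET.fromstring(client_state_xml)
    result = {}
    for proj in root.iter("project"):
        url = proj.findtext("master_url")
        rec = proj.findtext("rec")
        # Skip project blocks that lack either field.
        if url is not None and rec is not None:
            result[url.strip()] = float(rec)
    return result


# Hypothetical sample; real files contain many more fields per project.
SAMPLE = """<client_state>
<project>
    <master_url>https://lhcathome.cern.ch/lhcathome/</master_url>
    <rec>25000.5</rec>
</project>
<project>
    <master_url>https://example.org/other_project/</master_url>
    <rec>1200.0</rec>
</project>
</client_state>"""

# Print projects from highest to lowest REC.
for url, rec in sorted(project_rec(SAMPLE).items(), key=lambda kv: -kv[1]):
    print(f"{rec:12.1f}  {url}")
```

Comparing the numbers per instance should show whether LHC's REC is the highest on the client that leaves its SixTrack tasks waiting.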
ID: 40914
Werkstatt

Joined: 5 Oct 08
Posts: 12
Credit: 1,108,455
RAC: 0
Message 42038 - Posted: 4 Apr 2020, 11:36:45 UTC

Can someone please explain why a WU is using only 0.309 CPUs? This is usually the case with GPU WUs, but nothing hints at that.
I have no VM installed.

It's this PC: https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10511049
or in short: AMD Ryzen, Win10, AMD and Nvidia graphics cards.
ID: 42038
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 42039 - Posted: 4 Apr 2020, 14:14:43 UTC - in response to Message 42038.  

"Can someone please explain why a WU is using only 0.309 CPUs?"
It must be an error in a server template. No fix yet. See discussion in earlier posts in this thread.
ID: 42039
ProDigit

Joined: 8 Nov 19
Posts: 9
Credit: 2,236,919
RAC: 0
Message 42847 - Posted: 12 Jun 2020, 14:53:03 UTC

Hi!
I've noticed that on my Ryzen 9 3950X (16 cores, 32 threads, running at ~3.8 GHz) I got a batch of CPU jobs that show (0.137 CPU) in BOINC Manager.
I stopped all other projects and noticed this value is off; it should be closer to 0.5 to 0.85 CPU.
What are the values I need to modify in app_config.xml?
Data for one of the WUs:
Application: SixTrack 502.05 (avx)
Name: workspace2_hl14_collision_noBB_DA50cm_B4__21__s__62.31_60.32__2_4__6__3_1_sixvf_boinc16023
State: Running
Received: Fri 12 Jun 2020 08:47:42 AM EDT
Report deadline: Fri 19 Jun 2020 12:19:55 AM EDT
Resources: 0.137 CPUs
Estimated computation size: 180,000 GFLOPs
CPU time: 01:59:52
CPU time since checkpoint: 00:00:47
Elapsed time: 02:01:14
Estimated time remaining: 02:17:47
Fraction done: 48.304%
Virtual memory size: 83.86 MB
Working set size: 48.35 MB
Directory: slots/37
Process ID: 8637
Progress rate: 24.480% per hour
Executable: sixtrack_lin64_50205_avx.linux


I did:
<app_config>
<app>
<name>sixtrack</name>
<gpu_versions>
<cpu_usage>0.3</cpu_usage>
</gpu_versions>
</app>
</app_config>


But it didn't work.
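
A possible reason, offered as an assumption and not a confirmed diagnosis: <gpu_versions> is meant for GPU applications, while SixTrack (avx) is a CPU app version, so that block may simply be ignored. Earlier in this thread, the section confirmed as correct was <app_version> with <avg_ncpus> per plan class. The Python sketch below writes out such a file; build_app_config is a hypothetical helper, and the plan class names are the ones mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_classes, avg_ncpus=1):
    """Build app_config.xml text with one <app_version> block per plan class."""
    lines = ["<app_config>"]
    for plan_class in plan_classes:
        lines += [
            "<app_version>",
            f"    <app_name>{app_name}</app_name>",
            f"    <plan_class>{plan_class}</plan_class>",
            f"    <avg_ncpus>{avg_ncpus}</avg_ncpus>",
            "</app_version>",
        ]
    lines.append("</app_config>")
    return "\n".join(lines) + "\n"


xml_text = build_app_config("sixtrack", ["avx", "sse2"])

# Sanity check: the generated text must be well-formed XML.
ET.fromstring(xml_text)

print(xml_text)
```

The printed text would go into app_config.xml in the LHC@home project directory, and the client has to re-read its config files before the change shows up. Note that this only overrides the value locally; it does not fix what the server sends.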
ID: 42847
ProDigit

Joined: 8 Nov 19
Posts: 9
Credit: 2,236,919
RAC: 0
Message 42850 - Posted: 12 Jun 2020, 17:22:00 UTC

So this issue has been ongoing for 6 months, with no concrete working answer or solution?
ID: 42850
