Message boards : Theory Application : Theory Simulation v263.90 (vbox64_mt_mcore): Big differences in scoring

Robert Klein

Joined: 16 Dec 16
Posts: 3
Credit: 24,041,649
RAC: 15,491
Message 39236 - Posted: 1 Jul 2019, 9:11:09 UTC

Hi,
there are big differences in the credits for Theory Simulation v263.90 (vbox64_mt_mcore). My AMD 2700X is getting about 0.015 credits per second (CPU time), while, for example, a 1700 reaches 0.40 credits per second (CPU time). The operating systems are different, but a difference of factor 26?!
What is the secret? :D

2700X: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10572024&offset=0&show_names=0&state=4&appid=13
1700: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10389189&offset=0&show_names=0&state=0&appid=13

Thanks and happy crunching!
ID: 39236
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39237 - Posted: 1 Jul 2019, 10:47:56 UTC - in response to Message 39236.  

What is the secret? :D
I have brought up this topic in the past, since I am also experiencing huge differences that are completely illogical.

But so far, no one has been able to provide a comprehensible explanation.
ID: 39237
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39401 - Posted: 20 Jul 2019, 18:24:17 UTC

the big differences in the scoring also exist with the latest version 263.98 - here is an example of two tasks that finished on 7/19 and 7/20 on the same host, with no changes whatsoever made by me:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=237289911
runtime: 45,643.06 - CPU time: 44,278.19 - credit points: 1,227.11

https://lhcathome.cern.ch/lhcathome/result.php?resultid=237326272
runtime: 47,549.99 - CPU time: 46,239.37 - credit points: 639.19

can someone enlighten me as to why the second task delivered only half the points?
It would be very interesting to know, just out of curiosity.
ID: 39401
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 979
Credit: 6,382,221
RAC: 437
Message 39402 - Posted: 20 Jul 2019, 19:05:54 UTC - in response to Message 39401.  

Your first-mentioned task reports: Device peak FLOPS 11.61 GFLOPS
Your second-mentioned task, with the lower credit, reports: Device peak FLOPS 5.81 GFLOPS (half)

Where does that difference come from?

The first task: Setting Memory Size for VM. (1030MB), which indicates the task assumes 4 cores are used.
The 2nd task: Setting Memory Size for VM. (830MB), which looks like 'only' 2 cores are needed.
ID: 39402
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39403 - Posted: 20 Jul 2019, 20:35:44 UTC - in response to Message 39402.  

Crystal Pellet, thanks for the hint regarding memory size and the number of cores - I hadn't noticed this (I only saw the difference in the peak FLOPS value after I had written my posting - and I was wondering about it).

The question now is: why is this so? As said above, I didn't make any changes to any settings at all.
ID: 39403
Harri Liljeroos

Joined: 28 Sep 04
Posts: 456
Credit: 24,722,877
RAC: 13,332
Message 39405 - Posted: 20 Jul 2019, 23:25:10 UTC - in response to Message 39403.  

I think that the Max # CPUs setting in the web preferences is used as a multiplier when the server computes GFLOPS for the host. It does not use the number of CPUs actually in use (visible from app_config => std_err). The GFLOPS value is used in the calculation of the granted credit, the required memory, and the time limit for error 197 EXIT_TIME_LIMIT_EXCEEDED. If you set a smaller number of CPUs in your app_config.xml than in the web preferences, you will get higher credits, but you are also more likely to get the 197 error if the task is a long one.

At least this is what I think is happening; please correct me if I am wrong.
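Harri's multiplier theory actually fits the numbers earlier in this thread. A minimal sketch, assuming BOINC's classic runtime-based credit rule (200 cobblestones per day of sustained 1 GFLOPS - an assumption about the server's formula, not something stated in this thread), reproduces both grants from message 39401 to within a credit:

```python
# Assumed formula: credit = peak_flops * run_time * 200 / (86400 * 1e9),
# i.e. 200 credits per day of sustained 1 GFLOPS (the BOINC "cobblestone").
def estimated_credit(peak_gflops: float, run_time_s: float) -> float:
    """Credit estimate from device peak FLOPS and wall-clock run time."""
    return peak_gflops * 1e9 * run_time_s * 200 / (86400 * 1e9)

# The two tasks from messages 39401/39402:
task1 = estimated_credit(11.61, 45643.06)  # granted: 1227.11
task2 = estimated_credit(5.81, 47549.99)   # granted: 639.19

print(round(task1, 2))  # ~1226.66
print(round(task2, 2))  # ~639.5
```

Since the run times were nearly equal, the halved FLOPS estimate alone explains the halved credit.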
ID: 39405
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39406 - Posted: 21 Jul 2019, 5:24:06 UTC - in response to Message 39405.  

At least this is what I think is happening; please correct me if I am wrong.
Well, the thing is (as mentioned before): I didn't change anything. So why, all of a sudden, is there a change in behaviour?
ID: 39406
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39407 - Posted: 21 Jul 2019, 5:41:23 UTC - in response to Message 39405.  
Last modified: 21 Jul 2019, 6:08:30 UTC

... but you are also more likely to get the 197 error if the task is a long one.
this problem should no longer occur, since Laurence wrote:
I have pushed out a new version (263.98) which doubles the lifetime of the VM. This should allow more time for the last job to run.

You extended the lifetime (job_duration) to 129600 seconds = 36 hours. That's not the problem!
The problem is <rsc_fpops_bound>2000000000000000.000000</rsc_fpops_bound>.
Could you tenfold that value?

I have just done this.
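For context, a minimal sketch of how rsc_fpops_bound turns into the error-197 wall, assuming the standard BOINC client rule that a task is aborted once its elapsed time exceeds rsc_fpops_bound divided by the device's estimated peak FLOPS (an assumption; the exact server behaviour is not shown in this thread):

```python
# Assumed abort rule: elapsed_time_limit = rsc_fpops_bound / device_peak_flops.
def time_limit_hours(rsc_fpops_bound: float, peak_gflops: float) -> float:
    """Approximate elapsed-time limit before error 197, in hours."""
    return rsc_fpops_bound / (peak_gflops * 1e9) / 3600

# Original bound of 2e15 FLOPs with the 11.61 GFLOPS estimate seen above:
print(round(time_limit_hours(2e15, 11.61), 1))  # ~47.9
# Tenfolding the bound, as done here, also tenfolds the limit:
print(round(time_limit_hours(2e16, 11.61), 1))  # ~478.5
```

Note that a larger FLOPS estimate shortens the limit while the real core count sets the actual speed - hence Harri's point that running fewer cores than the web-preference estimate risks the 197 error.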
ID: 39407
Harri Liljeroos

Joined: 28 Sep 04
Posts: 456
Credit: 24,722,877
RAC: 13,332
Message 39409 - Posted: 21 Jul 2019, 8:48:58 UTC - in response to Message 39407.  

... but you are also more likely to get the 197 error if the task is a long one.
this problem should no longer occur, since Laurence wrote:
I have pushed out a new version (263.98) which doubles the lifetime of the VM. This should allow more time for the last job to run.

You extended the lifetime (job_duration) to 129600 seconds = 36 hours. That's not the problem!
The problem is <rsc_fpops_bound>2000000000000000.000000</rsc_fpops_bound>.
Could you tenfold that value?

I have just done this.

Yes, let's hope that this solves that problem. But the mechanism is still there.
ID: 39409
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39848 - Posted: 7 Sep 2019, 12:42:54 UTC
Last modified: 7 Sep 2019, 12:43:23 UTC

I have made the following interesting observation on a notebook with an Intel i5 M 480 CPU @ 2.67GHz:

Task - Total runtime - CPU time - credits
Theory task 1-core: 129,937.35 - 129,148.50 - 2,623.52
Theory task 2-core: 129,949.30 - 92,688.76 - 2,623.76

Hence, the following questions:
- the CPU time for a 2-core task should be around double that of a 1-core task, not LESS!
- the credit for a 2-core task should be around double that of a 1-core task, right?

What is going wrong here?
ID: 39848
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 979
Credit: 6,382,221
RAC: 437
Message 39851 - Posted: 7 Sep 2019, 14:55:55 UTC - in response to Message 39848.  

Hence, the following questions:
- the CPU time for a 2-core task should be around double that of a 1-core task, not LESS!
- the credit for a 2-core task should be around double that of a 1-core task, right?

What is going wrong here?

It's just calculating the credits as already mostly explained in another Theory thread: New version 263.90

Example post there: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4890&postid=39793

The extra information to add is that it is not the CPU time that is used for the calculation, but the run time.

It seems you had set Max # CPUs to 3 in your preferences for that machine, but reduced the number of usable CPUs to 1 for one task and to 2 for the other by means of app_config.xml.
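If run time (not CPU time) feeds the formula and the FLOPS estimate is pinned by the Max # CPUs = 3 preference for both tasks, the near-identical credits follow directly. A small check using the figures from message 39848 (the formula's details are assumed; only the ratios matter here):

```python
# Figures reported in message 39848:
run_time_1core = 129_937.35   # CPU time 129_148.50, credit 2_623.52
run_time_2core = 129_949.30   # CPU time  92_688.76, credit 2_623.76

# If credit scales with run time at a fixed FLOPS estimate,
# the credit ratio should equal the run-time ratio:
print(run_time_2core / run_time_1core)  # ~1.00009
print(2_623.76 / 2_623.52)              # ~1.00009
```

The CPU times differ by almost 30%, yet the two ratios agree to within 0.001% - consistent with run time, not CPU time, driving the score.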
ID: 39851
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39852 - Posted: 7 Sep 2019, 18:17:39 UTC - in response to Message 39851.  

It seems you had set Max # CPUs to 3 in your preferences for that machine, but reduced the number of usable CPUs to 1 for one task and to 2 for the other by means of app_config.xml.
Yes, this is exactly what I did. And the results make me wonder.
So why should one ever run a 2-core (or even higher-core) task if the result, in terms of CPU time and credits, is the same or even worse? Am I missing something?
ID: 39852
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 979
Credit: 6,382,221
RAC: 437
Message 39854 - Posted: 7 Sep 2019, 20:06:26 UTC - in response to Message 39852.  

So why should one ever run a 2-core (or even higher-core) task if the result, in terms of CPU time and credits, is the same or even worse? Am I missing something?
The best practice for Theory is to set up single-core VMs, to process as many jobs as possible for MC Production inside the VM(s).
On my 8-threaded machine I run 8 single-core VMs (execution cap set to 90% to avoid sluggishness), and I'm using the snapshot mechanism for safety.
In fact, LHC@home could skip the Theory mt-application imo, except for the very few users with very low RAM. Only they could run more jobs inside a multi-core VM, because their RAM is too low to set up 2 VMs with 730MB RAM each. (The current false server setting of 750MB + (750MB * cores) doesn't help here.)
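A side note on that memory setting: the VM sizes reported earlier in this thread (830MB at 2 cores, 1030MB at 4 cores) do not match the quoted 750MB + (750MB * cores) rule; they fit a 630MB base plus 100MB per core - a two-point inference, not a documented setting - which would also give the 730MB single-core figure mentioned above:

```python
def server_formula_mb(cores: int) -> int:
    # The server-side setting quoted (and called false) above
    return 750 + 750 * cores

def fitted_mb(cores: int) -> int:
    # Hypothetical fit to the sizes reported in this thread:
    # 830MB @ 2 cores, 1030MB @ 4 cores; gives 730MB @ 1 core
    return 630 + 100 * cores

for cores in (1, 2, 4):
    print(cores, server_formula_mb(cores), fitted_mb(cores))
```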
ID: 39854
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1520
Credit: 85,958,362
RAC: 71,532
Message 39857 - Posted: 8 Sep 2019, 7:38:35 UTC - in response to Message 39854.  

In fact LHC@home could skip the Theory mt-application ...

Right.
Theory vbox should be made a single-core app.
At least until BOINC has much better multicore support.
This would avoid typical misconfigurations as well as lots of discussions.
ID: 39857
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39862 - Posted: 8 Sep 2019, 11:26:45 UTC - in response to Message 39857.  

Theory vbox should be made a single-core app.
I fully agree.
My experience with Theory multicore processing in the recent past (on more than one machine) has shown that it is definitely not working as intended or expected.
ID: 39862
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 598
Credit: 378,146,604
RAC: 31,967
Message 39933 - Posted: 15 Sep 2019, 18:39:00 UTC

I have always done this; the only reason I can see not to run single-core tasks is to save on RAM usage.

After the changes to the Working Set in the latest version, I tried running some 8-core tasks to see if I could push the CPU usage up, as each task was assigned 22 GB of RAM. Looking inside the VM, it would not even seem to use 8 cores; maybe one time I think it did. Running 4 cores at the moment and it seems good, so the most I could recommend is a 4-core WU.
ID: 39933



©2020 CERN