Message boards : Theory Application : Theory Simulation v263.90 (vbox64_mt_mcore): Big differences in scoring

Robert Klein

Joined: 16 Dec 16
Posts: 3
Credit: 24,041,649
RAC: 15,491
Message 39236 - Posted: 1 Jul 2019, 9:11:09 UTC

Hi,
there are big differences in the credits for Theory Simulation v263.90 (vbox64_mt_mcore). My AMD 2700X is getting about 0.015 credits per second (CPU time), while, for example, a 1700 reaches 0.40 credits per second (CPU time). The operating systems are different, but a difference of factor 26?!
What is the secret? :D

2700X: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10572024&offset=0&show_names=0&state=4&appid=13
1700: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10389189&offset=0&show_names=0&state=0&appid=13

Thanks and happy crunching!
ID: 39236
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39237 - Posted: 1 Jul 2019, 10:47:56 UTC - in response to Message 39236.  

What is the secret? :D
I have brought up this topic in the past, since I am also experiencing huge differences that are completely illogical.

But so far, no one has been able to provide a comprehensible explanation.
ID: 39237
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39401 - Posted: 20 Jul 2019, 18:24:17 UTC

the big differences in the scoring also exist with the latest version 263.98 - here is an example of two tasks that finished on 7/19 and 7/20 on the same host, with no changes whatsoever made by me:

https://lhcathome.cern.ch/lhcathome/result.php?resultid=237289911
runtime: 45,643.06 - CPU time: 44,278.19 - credit points: 1,227.11

https://lhcathome.cern.ch/lhcathome/result.php?resultid=237326272
runtime: 47,549.99 - CPU time: 46,239.37 - credit points: 639.19

can someone enlighten me as to why the second task delivered only half the points?
It would be very interesting to know, just out of curiosity.
ID: 39401
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 979
Credit: 6,382,221
RAC: 437
Message 39402 - Posted: 20 Jul 2019, 19:05:54 UTC - in response to Message 39401.  

Your first-mentioned task reports: Device peak FLOPS 11.61 GFLOPS
Your second-mentioned task, with the lower credit, reports: Device peak FLOPS 5.81 GFLOPS (half)

Where does that difference come from?

The first task: Setting Memory Size for VM. (1030MB), which indicates the task assumes 4 cores are used.
The 2nd task: Setting Memory Size for VM. (830MB), which looks like 'only' 2 cores are needed.
ID: 39402
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39403 - Posted: 20 Jul 2019, 20:35:44 UTC - in response to Message 39402.  

Crystal Pellet, thanks for the hint regarding memory size and the number of cores - I hadn't noticed this (I only saw the difference in the peak FLOPS value after I had written my posting - and I was wondering about it).

The question now is: why is this so? As said above, I didn't make any changes to any settings at all.
ID: 39403
Harri Liljeroos

Joined: 28 Sep 04
Posts: 456
Credit: 24,722,877
RAC: 13,332
Message 39405 - Posted: 20 Jul 2019, 23:25:10 UTC - in response to Message 39403.  

I think that the Max # CPUs setting in the web preferences is used as a multiplier when the server computes GFLOPS for the host. It does not use the number of CPUs actually in use (visible from app_config => std_err). The GFLOPS value is used in the calculation of the granted credit, the required memory, and the time limit for error 197 EXIT_TIME_LIMIT_EXCEEDED. If you set a smaller number of CPUs in your app_config.xml than in the web preferences, you will get higher credits, but you are also more likely to get the 197 error if the task is a long one.

At least this is what I think is happening; please correct me if I am wrong.
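Harri's multiplier theory actually fits the numbers earlier in this thread. A minimal sketch, assuming BOINC's classic runtime-based credit rule (200 cobblestones per day of sustained 1 GFLOPS - an assumption about the server's formula, not something stated in this thread), reproduces both grants from message 39401 to within a credit:

```python
# Assumed formula: credit = peak_flops * run_time * 200 / (86400 * 1e9),
# i.e. 200 credits per day of sustained 1 GFLOPS (the BOINC "cobblestone").
def estimated_credit(peak_gflops: float, run_time_s: float) -> float:
    """Credit estimate from device peak FLOPS and wall-clock run time."""
    return peak_gflops * 1e9 * run_time_s * 200 / (86400 * 1e9)

# The two tasks from messages 39401/39402:
task1 = estimated_credit(11.61, 45643.06)  # granted: 1227.11
task2 = estimated_credit(5.81, 47549.99)   # granted: 639.19

print(round(task1, 2))  # ~1226.66
print(round(task2, 2))  # ~639.5
```

Since the run times were nearly equal, the halved FLOPS estimate alone explains the halved credit.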
ID: 39405
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39406 - Posted: 21 Jul 2019, 5:24:06 UTC - in response to Message 39405.  

At least this is what I think is happening; please correct me if I am wrong.
Well, the thing is (as mentioned before): I didn't change anything. So why, all of a sudden, is there a change in behaviour?
ID: 39406
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39407 - Posted: 21 Jul 2019, 5:41:23 UTC - in response to Message 39405.  
Last modified: 21 Jul 2019, 6:08:30 UTC

... but you are also more likely to get the 197 error if the task is a long one.
this problem should no longer occur, since Laurence wrote:
I have pushed out a new version (263.98) which doubles the lifetime of the VM. This should allow more time for the last job to run.

You extended the lifetime (job_duration) to 129600 seconds = 36 hours. That's not the problem!
The problem is <rsc_fpops_bound>2000000000000000.000000</rsc_fpops_bound>.
Could you tenfold that value?

I have just done this.
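For context, a minimal sketch of how rsc_fpops_bound turns into the error-197 wall, assuming the standard BOINC client rule that a task is aborted once its elapsed time exceeds rsc_fpops_bound divided by the device's estimated peak FLOPS (an assumption; the exact server behaviour is not shown in this thread):

```python
# Assumed abort rule: elapsed_time_limit = rsc_fpops_bound / device_peak_flops.
def time_limit_hours(rsc_fpops_bound: float, peak_gflops: float) -> float:
    """Approximate elapsed-time limit before error 197, in hours."""
    return rsc_fpops_bound / (peak_gflops * 1e9) / 3600

# Original bound of 2e15 FLOPs with the 11.61 GFLOPS estimate seen above:
print(round(time_limit_hours(2e15, 11.61), 1))  # ~47.9
# Tenfolding the bound, as done here, also tenfolds the limit:
print(round(time_limit_hours(2e16, 11.61), 1))  # ~478.5
```

Note that a larger FLOPS estimate shortens the limit while the real core count sets the actual speed - hence Harri's point that running fewer cores than the web-preference estimate risks the 197 error.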
ID: 39407
Harri Liljeroos

Joined: 28 Sep 04
Posts: 456
Credit: 24,722,877
RAC: 13,332
Message 39409 - Posted: 21 Jul 2019, 8:48:58 UTC - in response to Message 39407.  

... but you are also more likely to get the 197 error if the task is a long one.
this problem should no longer occur, since Laurence wrote:
I have pushed out a new version (263.98) which doubles the lifetime of the VM. This should allow more time for the last job to run.

You extended the lifetime (job_duration) to 129600 seconds = 36 hours. That's not the problem!
The problem is <rsc_fpops_bound>2000000000000000.000000</rsc_fpops_bound>.
Could you tenfold that value?

I have just done this.

Yes, let's hope that this solves that problem. But the mechanism is still there.
ID: 39409
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39848 - Posted: 7 Sep 2019, 12:42:54 UTC
Last modified: 7 Sep 2019, 12:43:23 UTC

I have made the following interesting observation on a notebook with an Intel i5 M 480 CPU @ 2.67GHz:

Task - Total runtime - CPU time - credits
Theory task 1-core: 129,937.35 - 129,148.50 - 2,623.52
Theory task 2-core: 129,949.30 - 92,688.76 - 2,623.76

Hence, the following questions:
- the CPU time for a 2-core task should be around double that of a 1-core task, not LESS!
- the credit for a 2-core task should be around double that of a 1-core task, right?

What is going wrong here?
ID: 39848
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 979
Credit: 6,382,221
RAC: 437
Message 39851 - Posted: 7 Sep 2019, 14:55:55 UTC - in response to Message 39848.  

Hence, the following questions:
- the CPU time for a 2-core task should be around double that of a 1-core task, not LESS!
- the credit for a 2-core task should be around double that of a 1-core task, right?

What is going wrong here?

It's just calculating the credits as already mostly explained in another Theory thread: New version 263.90

Example post there: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4890&postid=39793

The extra information to add is that it is not the CPU time that is used for the calculation, but the run time.

It seems you had set Max # CPUs to 3 in your preferences for that machine, but reduced the number of usable CPUs to 1 for one task and to 2 for the other by means of app_config.xml.
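If run time (not CPU time) feeds the formula and the FLOPS estimate is pinned by the Max # CPUs = 3 preference for both tasks, the near-identical credits follow directly. A small check using the figures from message 39848 (the formula's details are assumed; only the ratios matter here):

```python
# Figures reported in message 39848:
run_time_1core = 129_937.35   # CPU time 129_148.50, credit 2_623.52
run_time_2core = 129_949.30   # CPU time  92_688.76, credit 2_623.76

# If credit scales with run time at a fixed FLOPS estimate,
# the credit ratio should equal the run-time ratio:
print(run_time_2core / run_time_1core)  # ~1.00009
print(2_623.76 / 2_623.52)              # ~1.00009
```

The CPU times differ by almost 30%, yet the two ratios agree to within 0.001% - consistent with run time, not CPU time, driving the score.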
ID: 39851
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39852 - Posted: 7 Sep 2019, 18:17:39 UTC - in response to Message 39851.  

It seems you had set Max # CPUs to 3 in your preferences for that machine, but reduced the number of usable CPUs to 1 for one task and to 2 for the other by means of app_config.xml.
Yes, this is exactly what I did. And the results make me wonder.
So why should one ever run a 2-core (or even higher-core) task if the result, in terms of CPU time and credits, is the same or even worse? Am I missing something?
ID: 39852
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 979
Credit: 6,382,221
RAC: 437
Message 39854 - Posted: 7 Sep 2019, 20:06:26 UTC - in response to Message 39852.  

So why should one ever run a 2-core (or even higher-core) task if the result, in terms of CPU time and credits, is the same or even worse? Am I missing something?
The best practice for Theory is to set up single-core VMs, to process as many jobs as possible for MC Production inside the VM(s).
On my 8-threaded machine I run 8 single-core VMs (execution cap set to 90% to avoid sluggishness), and I'm using the snapshot mechanism for safety.
In fact, LHC@home could skip the Theory mt-application imo, except for the very few users with very low RAM. Only they could run more jobs inside a multi-core VM, because their RAM is too low to set up 2 VMs with 730MB RAM each. (The current false server setting of 750MB + (750MB * cores) doesn't help here.)
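A side note on that memory setting: the VM sizes reported earlier in this thread (830MB at 2 cores, 1030MB at 4 cores) do not match the quoted 750MB + (750MB * cores) rule; they fit a 630MB base plus 100MB per core - a two-point inference, not a documented setting - which would also give the 730MB single-core figure mentioned above:

```python
def server_formula_mb(cores: int) -> int:
    # The server-side setting quoted (and called false) above
    return 750 + 750 * cores

def fitted_mb(cores: int) -> int:
    # Hypothetical fit to the sizes reported in this thread:
    # 830MB @ 2 cores, 1030MB @ 4 cores; gives 730MB @ 1 core
    return 630 + 100 * cores

for cores in (1, 2, 4):
    print(cores, server_formula_mb(cores), fitted_mb(cores))
```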
ID: 39854
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1520
Credit: 85,958,362
RAC: 71,532
Message 39857 - Posted: 8 Sep 2019, 7:38:35 UTC - in response to Message 39854.  

In fact LHC@home could skip the Theory mt-application ...

Right.
Theory vbox should be made a single-core app.
At least until BOINC has much better multicore support.
This would avoid typical misconfigurations as well as lots of discussions.
ID: 39857
Erich56

Joined: 18 Dec 15
Posts: 1307
Credit: 23,610,386
RAC: 8,734
Message 39862 - Posted: 8 Sep 2019, 11:26:45 UTC - in response to Message 39857.  

Theory vbox should be made a single-core app.
I fully agree.
My experience with Theory multicore processing in the recent past (on more than one machine) has shown that it is definitely not working as intended or expected.
ID: 39862
Toby Broom
Volunteer moderator

Joined: 27 Sep 08
Posts: 598
Credit: 378,146,604
RAC: 31,967
Message 39933 - Posted: 15 Sep 2019, 18:39:00 UTC

I have always done this; the only reason I can see not to run single-core tasks is to save on RAM usage.

After the changes to the Working Set in the latest version, I tried running some 8-core tasks to see if I could push the CPU usage up, as each task was assigned 22 GB of RAM. Looking inside the VM, it would not even seem to use 8 cores; maybe one time I think it did. Running 4 cores at the moment and it seems good, so the most I could recommend is a 4-core WU.
ID: 39933



©2020 CERN