Message boards : ATLAS application : only half of credit for 4-core task - compared to 2-core task
Erich56

Joined: 18 Dec 15
Posts: 1681
Credit: 99,381,133
RAC: 111,026
Message 35301 - Posted: 18 May 2018, 18:31:28 UTC

Today I noticed something very strange:

2-core task 191404732 of May 16:
runtime: 74,159 secs; CPU time: 146,688 secs; credit points: 1,226

4-core task 191429102 of May 17:
runtime: 37,896 secs; CPU time: 144,711 secs; credit points: 605

With almost the same CPU time, the 4-core task yields only half the points of the 2-core task.
Any logical explanation?
ID: 35301
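To make the comparison in the post above concrete, credit can be expressed per CPU-hour for both tasks. This is only a sketch: it assumes the figures in the post use a dot as thousands separator (so "146.688 secs" means 146,688 s and "1.226" means 1,226 credits), and the helper name `credit_per_cpu_hour` is made up for this example.

```python
def credit_per_cpu_hour(credit: float, cpu_seconds: float) -> float:
    """Credit granted per hour of CPU time (illustrative helper, not a BOINC API)."""
    return credit / (cpu_seconds / 3600.0)


# Figures quoted in the post above (assumed to be credits and seconds).
two_core = credit_per_cpu_hour(1226, 146688)
four_core = credit_per_cpu_hour(605, 144711)
ratio = four_core / two_core

print(f"2-core task: {two_core:.1f} credits per CPU-hour")
print(f"4-core task: {four_core:.1f} credits per CPU-hour")
print(f"4-core / 2-core: {ratio:.2f}")
```

Under these assumptions the 4-core task earns roughly half as much credit per CPU-hour as the 2-core task, which matches the observation in the post.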
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,688,719
RAC: 142,895
Message 35302 - Posted: 18 May 2018, 18:59:10 UTC - in response to Message 35301.  

BOINC can't reliably detect what causes large runtime differences (they could also be caused by an error).
So it needs a number of valid WUs (sometimes quite a few) before it accepts the new runtime range as input for the credit calculation.
ID: 35302
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 35305 - Posted: 19 May 2018, 13:59:36 UTC - in response to Message 35302.  

To compare the CPU efficiency of Erich's host across the various kinds of work units (1, 2 and 4 cores), I collected its results and noticed that the CPU efficiency doesn't vary a lot.
Here are the results, sorted by ascending CPU efficiency (efficiency = CPU time / (run time × number of cores)):
cores	run time (s)	CPU time (s)	efficiency
1	47007,97	44610,45	0,949
4	31546,23	119886,4	0,950
4	37895,95	144711,5	0,955
4	44632,33	172579,6	0,967
1	41378,81	40631,08	0,982
2	74844,05	147679,5	0,987
2	72754,72	143628,0	0,987
2	79614,12	157255,7	0,988
2	78678,92	155415,1	0,988
2	64573,41	127711,8	0,989
2	74159,96	146688,2	0,989
2	61593,39	121835,8	0,989
2	73680,94	145783,3	0,989
2	64629,23	127894,8	0,989
2	85418,61	169112,1	0,990
2	67725,97	134096,9	0,990
1	41229,42	40820,41	0,990
1	50321,02	49826,94	0,990
2	87529,05	173571,6	0,992
2	77997,69	154714,0	0,992
2	71624,45	142076,7	0,992
1	39428,77	39120,66	0,992
2	79781,58	158398,6	0,993
1	43358,44	43116,78	0,994
1	86106,36	85862,92	0,997
1	45956,41	45835,34	0,997
1	85784,28	85584,03	0,998
1	49587,36	49474,27	0,998
1	42208,95	42114,27	0,998
1	85663,14	85541,05	0,999
1	97108,11	96985,05	0,999
1	91747,84	91634,41	0,999
1	43893,34	43850,02	0,999
1	46324,81	46289,48	0,999
1	102843,97	102784,2	0,999
1	87262,61	87229,83	1,000
1	42315,73	42325,23	1,000
1	43034,16	43047,48	1,000
1	43995,33	44016,30	1,000
1	50125,30	50162,84	1,001
1	52159,14	52225,02	1,001
1	90593,39	90741,92	1,002
2	63862,58	128394,1	1,005
1	40409,09	41362,59	1,024
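The efficiency column in the table above can be reproduced as CPU time divided by (run time × number of cores). The sketch below recomputes a few rows under that assumption; the function name `cpu_efficiency` is chosen for this example only, and the decimal commas of the table are written as decimal points.

```python
def cpu_efficiency(cores: int, run_time_s: float, cpu_time_s: float) -> float:
    """CPU time actually used, as a fraction of the CPU time available."""
    return cpu_time_s / (run_time_s * cores)


# A few rows copied from the table: (cores, run time, CPU time).
rows = [
    (1, 47007.97, 44610.45),   # first row
    (4, 31546.23, 119886.4),   # a 4-core task
    (2, 63862.58, 128394.1),   # 2-core task with efficiency above 1
    (1, 40409.09, 41362.59),   # last row
]

for cores, run_time, cpu_time in rows:
    print(cores, f"{cpu_efficiency(cores, run_time, cpu_time):.3f}")
```

A value above 1 means the reported CPU time is larger than the reported run time multiplied by the core count, which is why those rows stand out.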

At the bottom of the results there are CPU efficiency values greater than 1, which is not realistic at all.
So I think the way the run time and CPU time values are calculated is not very accurate.
In my view, if the work unit has not been paused or restarted, the run time should be computed as the time of the last line in the log minus the time of the first line, multiplied by the CPU usage percentage (throttle). If the work unit was paused or restarted, the pause times, multiplied by the CPU usage percentage set in the preferences, should be subtracted.
This is not very important in itself, but it may have a big influence if CERN makes choices based on CPU efficiency values.
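The run-time calculation proposed above could be sketched as follows. This is only one reading of the proposal, not how BOINC or vboxwrapper actually compute the value; the function name `estimated_run_time` and its parameters are hypothetical. The two timestamps are the first and last dated lines of the vboxwrapper log quoted later in this thread, where the throttle is reported as 100%.

```python
from datetime import datetime


def estimated_run_time(first: datetime, last: datetime,
                       throttle: float = 1.0,
                       paused_seconds: float = 0.0) -> float:
    """Wall-clock span of the log, minus pauses, scaled by the CPU throttle."""
    wall_seconds = (last - first).total_seconds()
    return (wall_seconds - paused_seconds) * throttle


# First and last dated log lines from the log quoted later in this thread.
first_line = datetime(2018, 5, 18, 14, 5, 59)
last_line = datetime(2018, 5, 19, 9, 59, 47)

print(estimated_run_time(first_line, last_line))  # no pauses, 100% throttle
```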

Questions:
* Which component evaluates these values (BOINC client, BOINC server, VirtualBox, or some other hidden component)?
* How are these values calculated?
* Has anyone else noticed the same behavior on their host? That would tell us whether only Erich's host has this kind of trouble, whether it affects only Windows with VirtualBox, or whether Linux and Darwin hosts are affected too.
ID: 35305
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,688,719
RAC: 142,895
Message 35306 - Posted: 19 May 2018, 14:53:51 UTC - in response to Message 35305.  

You're right, if you compare the runtime vs. CPU time.
The efficiency is rarely lower than 95% per core, or 380% in total for a 4-core WU (if you multiply it by the number of cores).
This looks very good and is close to the efficiency of hosts that are directly connected to the CERN LAN running ATLAS (native), e.g. those of Agile Boincers.

Now let's have a look at the idle times.
Erich's 1-core setup shows average idle times of roughly 4 minutes.
His 2-core setup 29 minutes (=> core0 idle for 4 minutes, core1 idle for 25 minutes)
His 4-core setup 109 minutes (=> core0 idle for 4 minutes, cores1-3 idle for 35 minutes each)

The scientific output is exactly the same for all 3 setup variants.
The different idle times are caused "by design", not by an error on Erich's side.


I believe there are 2 situations where a setup with more cores per task makes sense compared with running several tasks with fewer cores concurrently:
1. If the host has not enough RAM to run additional WUs
2. If the internet connection is saturated
ID: 35306
Erich56

Joined: 18 Dec 15
Posts: 1681
Credit: 99,381,133
RAC: 111,026
Message 35308 - Posted: 19 May 2018, 16:53:26 UTC - in response to Message 35305.  

"In order to compare the cpu efficiency for Erich's host trying various kinds of work units (1,2,4), i noticed that the cpu efficiency doesn't vary a lot.
Here are the results sorted by ascending cpu efficiency"
Thanks, Philippe, for the work and time you put into this comparison. Interesting results.
And it confirms once more: 1-core is most efficient.
ID: 35308
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 35310 - Posted: 19 May 2018, 17:31:32 UTC - in response to Message 35306.  

Thanks for answering me, but to be honest, I have some difficulty understanding where you place the beginning and the end of the elapsed time, and how you determine how long a core is idle inside the VM.
It is best to take an example to show clearly how you do it.
Here is one of Erich's logs; I added a rank number at the start of each line to make it easier to discuss.
1	<core_client_version>7.6.22</core_client_version>
2	<![CDATA[
3	<stderr_txt>
4	2018-05-18 14:05:59 (4636): vboxwrapper (7.7.26196): starting
5	2018-05-18 14:05:59 (4636): Feature: Checkpoint interval offset (235 seconds)
6	2018-05-18 14:05:59 (4636): Detected: VirtualBox COM Interface (Version: 5.1.38)
7	2018-05-18 14:05:59 (4636): Detected: Minimum checkpoint interval (900.000000 seconds)
8	2018-05-18 14:05:59 (4636): Successfully copied 'init_data.xml' to the shared directory.
9	2018-05-18 14:05:59 (4636): Create VM. (boinc_cbf3fadcc7133493, slot#5)
10	2018-05-18 14:05:59 (4636): Setting Memory Size for VM. (4800MB)
11	2018-05-18 14:05:59 (4636): Setting CPU Count for VM. (2)
12	2018-05-18 14:05:59 (4636): Setting Chipset Options for VM.
13	2018-05-18 14:05:59 (4636): Setting Boot Options for VM.
14	2018-05-18 14:05:59 (4636): Enabling VM Network Access.
15	2018-05-18 14:05:59 (4636): Setting Network Configuration for NAT.
16	2018-05-18 14:05:59 (4636): Disabling USB Support for VM.
17	2018-05-18 14:05:59 (4636): Disabling COM Port Support for VM.
18	2018-05-18 14:05:59 (4636): Disabling LPT Port Support for VM.
19	2018-05-18 14:05:59 (4636): Disabling Audio Support for VM.
20	2018-05-18 14:05:59 (4636): Disabling Clipboard Support for VM.
21	2018-05-18 14:05:59 (4636): Disabling Drag and Drop Support for VM.
22	2018-05-18 14:05:59 (4636): Adding storage controller(s) to VM.
23	2018-05-18 14:05:59 (4636): Adding virtual disk drive to VM. (vm_image.vdi)
24	2018-05-18 14:06:02 (4636): Adding VirtualBox Guest Additions to VM.
25	2018-05-18 14:06:02 (4636): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
26	2018-05-18 14:06:02 (4636): forwarding host port 2053 to guest port 80
27	2018-05-18 14:06:02 (4636): Enabling remote desktop for VM.
28	2018-05-18 14:06:02 (4636): Enabling shared directory for VM.
29	2018-05-18 14:06:02 (4636): Starting VM. (boinc_cbf3fadcc7133493, slot#5)
30	2018-05-18 14:06:27 (4636): Guest Log: BIOS: VirtualBox 5.1.38
31	2018-05-18 14:06:27 (4636): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
32	2018-05-18 14:06:27 (4636): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
33	2018-05-18 14:06:27 (4636): Guest Log: BIOS: Booting from Hard Disk...
34	2018-05-18 14:06:27 (4636): Guest Log: BIOS: KBD: unsupported int 16h function 03
35	2018-05-18 14:06:27 (4636): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 
36	2018-05-18 14:06:27 (4636): Successfully started VM. (PID = '4112')
37	2018-05-18 14:06:27 (4636): Reporting VM Process ID to BOINC.
38	2018-05-18 14:06:37 (4636): VM state change detected. (old = 'poweroff', new = 'running')
39	2018-05-18 14:06:47 (4636): Detected: Web Application Enabled (http://localhost:2053)
40	2018-05-18 14:06:47 (4636): Detected: Remote Desktop Enabled (localhost:2054)
41	2018-05-18 14:06:57 (4636): Preference change detected
42	2018-05-18 14:06:57 (4636): Setting CPU throttle for VM. (100%)
43	2018-05-18 14:06:57 (4636): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 900 seconds))
44	2018-05-18 14:07:57 (4636): Guest Log: vboxguest: major 0, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
45	2018-05-18 14:08:07 (4636): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff88013dff3c10), OR(0x0), NOT(0xffffffff), flags(0x0)
46	2018-05-18 14:08:07 (4636): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff8801402bf210), OR(0x0), NOT(0xffffffff), flags(0x0)
47	2018-05-18 14:08:07 (4636): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff88013dff2e10), OR(0x0), NOT(0xffffffff), flags(0x0)
48	2018-05-18 14:08:07 (4636): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff88013dff2010), OR(0x0), NOT(0xffffffff), flags(0x0)
49	2018-05-18 14:08:47 (4636): Guest Log: Copying input files into RunAtlas.
50	2018-05-18 14:08:57 (4636): Guest Log: Copied input files into RunAtlas.
51	2018-05-18 14:09:07 (4636): Guest Log: copied the webapp to /var/www
52	2018-05-18 14:09:07 (4636): Guest Log: This vm does not need to setup http proxy
53	2018-05-18 14:09:07 (4636): Guest Log: ATHENA_PROC_NUMBER=2
54	2018-05-18 14:09:07 (4636): Guest Log: Starting ATLAS job. (PandaID=3931045702 taskID=14073742)
55	2018-05-18 15:47:06 (4636): Status Report: Elapsed Time: '6009.312498'
56	2018-05-18 15:47:06 (4636): Status Report: CPU Time: '11181.234375'
57	2018-05-18 17:27:06 (4636): Status Report: Elapsed Time: '12009.640625'
58	2018-05-18 17:27:06 (4636): Status Report: CPU Time: '23194.265625'
59	2018-05-18 19:07:07 (4636): Status Report: Elapsed Time: '18010.000000'
60	2018-05-18 19:07:07 (4636): Status Report: CPU Time: '35216.156250'
61	2018-05-18 20:30:27 (4636): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 
62	2018-05-18 20:47:07 (4636): Status Report: Elapsed Time: '24010.609373'
63	2018-05-18 20:47:07 (4636): Status Report: CPU Time: '47249.968750'
64	2018-05-18 22:27:07 (4636): Status Report: Elapsed Time: '30010.921875'
65	2018-05-18 22:27:07 (4636): Status Report: CPU Time: '59291.468750'
66	2018-05-19 00:07:08 (4636): Status Report: Elapsed Time: '36011.187498'
67	2018-05-19 00:07:08 (4636): Status Report: CPU Time: '71325.062500'
68	2018-05-19 01:47:08 (4636): Status Report: Elapsed Time: '42011.671875'
69	2018-05-19 01:47:08 (4636): Status Report: CPU Time: '83361.421875'
70	2018-05-19 03:27:08 (4636): Status Report: Elapsed Time: '48011.937498'
71	2018-05-19 03:27:08 (4636): Status Report: CPU Time: '95396.765625'
72	2018-05-19 05:07:09 (4636): Status Report: Elapsed Time: '54012.218750'
73	2018-05-19 05:07:09 (4636): Status Report: CPU Time: '107428.953125'
74	2018-05-19 06:47:09 (4636): Status Report: Elapsed Time: '60012.781250'
75	2018-05-19 06:47:09 (4636): Status Report: CPU Time: '119463.421875'
76	2018-05-19 08:27:10 (4636): Status Report: Elapsed Time: '66013.062498'
77	2018-05-19 08:27:10 (4636): Status Report: CPU Time: '131505.375000'
78	2018-05-19 09:59:40 (4636): VM Completion File Detected.
79	2018-05-19 09:59:40 (4636): Powering off VM.
80	2018-05-19 09:59:42 (4636): Successfully stopped VM.
81	2018-05-19 09:59:47 (4636): Deregistering VM. (boinc_cbf3fadcc7133493, slot#5)
82	2018-05-19 09:59:47 (4636): Removing virtual disk drive(s) from VM.
83	2018-05-19 09:59:47 (4636): Removing network bandwidth throttle group from VM.
84	2018-05-19 09:59:47 (4636): Removing storage controller(s) from VM.
85	2018-05-19 09:59:47 (4636): Removing VM from VirtualBox.
86	09:59:52 (4636): called boinc_finish(0)
87	</stderr_txt>
88	]]> 

Which part of the log shows the period of the 'elapsed time' (start and end)? (Please give the range of rank numbers.)
Where can you determine the idle period of the other cores?
Does 'by design' mean that the download time for initialising the work unit and the upload time for the result are not included in the elapsed run time, to allow a fair comparison with other hosts inside CERN?
ID: 35310
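The "Status Report" lines in the log above already carry paired elapsed and CPU times, so a per-core efficiency can be read off at each report. The sketch below parses a short excerpt of that log (rank numbers stripped) with regular expressions. Whether the run time shown on the task page is derived from these same values is not confirmed anywhere in this thread, and the function name `status_reports` is made up for this example.

```python
import re

# Excerpt of the vboxwrapper log quoted above, without the rank numbers.
LOG = """\
2018-05-18 14:05:59 (4636): Setting CPU Count for VM. (2)
2018-05-18 15:47:06 (4636): Status Report: Elapsed Time: '6009.312498'
2018-05-18 15:47:06 (4636): Status Report: CPU Time: '11181.234375'
2018-05-19 08:27:10 (4636): Status Report: Elapsed Time: '66013.062498'
2018-05-19 08:27:10 (4636): Status Report: CPU Time: '131505.375000'
"""


def status_reports(log: str):
    """Return (elapsed_s, cpu_s, per_core_efficiency) for each status report."""
    cores = int(re.search(r"Setting CPU Count for VM\. \((\d+)\)", log).group(1))
    elapsed = [float(v) for v in
               re.findall(r"Status Report: Elapsed Time: '([\d.]+)'", log)]
    cpu = [float(v) for v in
           re.findall(r"Status Report: CPU Time: '([\d.]+)'", log)]
    return [(e, c, c / (e * cores)) for e, c in zip(elapsed, cpu)]


reports = status_reports(LOG)
for elapsed_s, cpu_s, efficiency in reports:
    print(f"elapsed {elapsed_s:.0f} s, CPU {cpu_s:.0f} s, "
          f"efficiency per core {efficiency:.3f}")
```

In this excerpt the per-core efficiency is lower at the first report than at the last one, which would be consistent with a startup phase in which not all cores are busy.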
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,688,719
RAC: 142,895
Message 35311 - Posted: 19 May 2018, 19:26:22 UTC - in response to Message 35310.  

The times are taken from the task pages.
Example:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10388905&offset=0&show_names=0&state=4&appid=

The #cores are taken from the logs linked at the task pages.


Task example1:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191133132
runtime: 41,229.42
CPU time: 40,820.41
#cores: 1

Efficiency: 99 %
idle time: 409.01 s (-> 6.8 min)



Task example2:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=191456208
runtime: 71,624.45
CPU time: 142,076.70
#cores: 2

Efficiency: 198.4 % (99.2 % per core)
idle time: 1172.2 s (-> 19.5 min)
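The idle times in the two examples above follow from cores × runtime minus CPU time. A minimal sketch of that arithmetic, with a function name (`idle_time_s`) chosen only for this illustration:

```python
def idle_time_s(cores: int, run_time_s: float, cpu_time_s: float) -> float:
    """Core-seconds that were available but not used by the task."""
    return cores * run_time_s - cpu_time_s


example1 = idle_time_s(1, 41229.42, 40820.41)
example2 = idle_time_s(2, 71624.45, 142076.70)

print(f"example1: {example1:.2f} s ({example1 / 60:.1f} min)")
print(f"example2: {example2:.2f} s ({example2 / 60:.1f} min)")
```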


As a single task may be affected by various influences from outside, an average over a couple of tasks should be taken.
I calculated separate averages for 1-core, 2-core and 4-core tasks.


Thus I noticed that 1-core tasks have an average idle time of roughly 4 minutes (a bit less than in the example above).
This reflects the "basic idle time" that all tasks will show.
What can be observed while a task is running is that it has a startup and a shutdown phase where only 1 core is active.
My guess is that the "basic idle time" occurs on core0 in every setup. The rest of the idle time is spread equally among the remaining cores.

From example2 above:
basic idle time (average): 4 min (core0)
idle time left for core1: 15.5 min
ID: 35311
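The split described in the post above (a basic idle time of about 4 minutes on core0, the remainder shared equally by the other cores) can be sketched as below. It encodes the poster's guess rather than anything measured per core, and the function name `split_idle_minutes` and the `base_min` parameter are hypothetical.

```python
def split_idle_minutes(total_idle_min: float, cores: int,
                       base_min: float = 4.0) -> list[float]:
    """Idle minutes per core: base_min on core0, the rest shared equally."""
    if cores == 1:
        return [total_idle_min]
    per_other_core = (total_idle_min - base_min) / (cores - 1)
    return [base_min] + [per_other_core] * (cores - 1)


# Average idle times quoted earlier in the thread for the 2-core and 4-core setups.
print(split_idle_minutes(29, 2))    # core0, core1
print(split_idle_minutes(109, 4))   # core0, core1..core3
# Example 2 from the post above (19.5 min of idle time on 2 cores).
print(split_idle_minutes(19.5, 2))
```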
PHILIPPE

Joined: 24 Jul 16
Posts: 88
Credit: 239,917
RAC: 0
Message 35313 - Posted: 19 May 2018, 20:33:07 UTC - in response to Message 35311.  

Thanks a lot, computezrmle, for spending some of your time to answer me.
I believe I have understood the method you use to determine the idle period of each core (take the average and share it equally among the cores, except core0).
Every day we learn from each other.
It's not always easy to interpret the results obtained. But I was a bit surprised by the high CPU efficiency values recorded here, whereas some other hosts are lower.
ID: 35313
MBark

Joined: 22 Dec 05
Posts: 1
Credit: 707,119
RAC: 5
Message 35348 - Posted: 22 May 2018, 22:54:29 UTC

Why aren't you giving credit for "Error while computing"? I don't like wasting my computer time on tasks that have errors in them. You should at least give credit for the computer time used. Max
ID: 35348
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,688,719
RAC: 142,895
Message 35352 - Posted: 23 May 2018, 6:06:13 UTC - in response to Message 35348.  

Hi Mark,

as far as I can see you ran only SixTrack tasks.
This thread covers a problem that is very special to ATLAS, so the experts that may be able to answer your question may not notice it.

Be so kind as to report your problem in the SixTrack section of this MB.

In general:
There are as many reasons for errors as stars in the sky.
Some of them are rewarded, others not.
ID: 35352
maeax

Joined: 2 May 07
Posts: 2066
Credit: 155,470,867
RAC: 166,054
Message 36195 - Posted: 2 Aug 2018, 4:18:05 UTC
Last modified: 2 Aug 2018, 4:37:51 UTC

At the moment this computer gets 30 Cobblestones instead of the 500 it always got before for the same work!
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10548292
2018-08-02 05:58:47 (5100): Guest Log: HITS file was successfully produced
2018-08-02 05:58:47 (5100): Guest Log: -rw------- 1 atlas01 atlas01 162648366 Aug 2 05:55 /home/atlas01/RunAtlas/HITS.14661314._005788.pool.root.1
VirtualBox 5.2.14 with BOINC 7.10.2.
ID: 36195
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2375
Credit: 221,688,719
RAC: 142,895
Message 36196 - Posted: 2 Aug 2018, 4:39:06 UTC - in response to Message 36195.  

The credits aren't lost.
They are used to make other users feel happy:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=203429685

;-)
ID: 36196
