only half of credit for 4-core task - compared to 2-core task
Author | Message |
---|---|
Joined: 18 Dec 15 Posts: 1810 Credit: 118,264,470 RAC: 27,088 |
Today I noticed something very strange:
2-core task 191404732 of May 16: runtime: 74159 secs; CPU time: 146688 secs; credit points: 1226
4-core task 191429102 of May 17: runtime: 37896 secs; CPU time: 144711 secs; credit points: 605
With almost the same CPU time, the 4-core task yields only half as many points as the 2-core task. Any logical explanation? |
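To put the two tasks on a common footing, the figures can be converted to credit per CPU hour. This is a minimal sketch, assuming the CPU times are to be read as roughly 146688 s and 144711 s and the credits as 1226 and 605; the function name is made up for illustration.

```python
def credit_per_cpu_hour(credit: float, cpu_seconds: float) -> float:
    """Return the credit granted per hour of CPU time."""
    return credit / (cpu_seconds / 3600.0)


# Values as read from the two tasks quoted in the post (assumed units: seconds).
two_core = credit_per_cpu_hour(1226, 146688)
four_core = credit_per_cpu_hour(605, 144711)

print(f"2-core task: {two_core:.2f} credits per CPU hour")
print(f"4-core task: {four_core:.2f} credits per CPU hour")
print(f"4-core / 2-core ratio: {four_core / two_core:.2f}")
```

The ratio comes out at about one half, which is the discrepancy the post describes.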
Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 56,522 |
BOINC can't reliably detect what causes large runtime differences (they could also be caused by an error). Thus it needs a number of valid WUs (sometimes quite a few) before it accepts the new runtime range as input for the credit calculation. |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
Comparing the CPU efficiency of Erich's host across work units with 1, 2 and 4 cores, I noticed that the CPU efficiency doesn't vary a lot. Here are the results, sorted by ascending CPU efficiency:

cores | run time (s) | CPU time (s) | efficiency |
---|---|---|---|
1 | 47007,97 | 44610,45 | 0,949 |
4 | 31546,23 | 119886,4 | 0,950 |
4 | 37895,95 | 144711,5 | 0,955 |
4 | 44632,33 | 172579,6 | 0,967 |
1 | 41378,81 | 40631,08 | 0,982 |
2 | 74844,05 | 147679,5 | 0,987 |
2 | 72754,72 | 143628,0 | 0,987 |
2 | 79614,12 | 157255,7 | 0,988 |
2 | 78678,92 | 155415,1 | 0,988 |
2 | 64573,41 | 127711,8 | 0,989 |
2 | 74159,96 | 146688,2 | 0,989 |
2 | 61593,39 | 121835,8 | 0,989 |
2 | 73680,94 | 145783,3 | 0,989 |
2 | 64629,23 | 127894,8 | 0,989 |
2 | 85418,61 | 169112,1 | 0,990 |
2 | 67725,97 | 134096,9 | 0,990 |
1 | 41229,42 | 40820,41 | 0,990 |
1 | 50321,02 | 49826,94 | 0,990 |
2 | 87529,05 | 173571,6 | 0,992 |
2 | 77997,69 | 154714,0 | 0,992 |
2 | 71624,45 | 142076,7 | 0,992 |
1 | 39428,77 | 39120,66 | 0,992 |
2 | 79781,58 | 158398,6 | 0,993 |
1 | 43358,44 | 43116,78 | 0,994 |
1 | 86106,36 | 85862,92 | 0,997 |
1 | 45956,41 | 45835,34 | 0,997 |
1 | 85784,28 | 85584,03 | 0,998 |
1 | 49587,36 | 49474,27 | 0,998 |
1 | 42208,95 | 42114,27 | 0,998 |
1 | 85663,14 | 85541,05 | 0,999 |
1 | 97108,11 | 96985,05 | 0,999 |
1 | 91747,84 | 91634,41 | 0,999 |
1 | 43893,34 | 43850,02 | 0,999 |
1 | 46324,81 | 46289,48 | 0,999 |
1 | 102843,97 | 102784,2 | 0,999 |
1 | 87262,61 | 87229,83 | 1,000 |
1 | 42315,73 | 42325,23 | 1,000 |
1 | 43034,16 | 43047,48 | 1,000 |
1 | 43995,33 | 44016,30 | 1,000 |
1 | 50125,30 | 50162,84 | 1,001 |
1 | 52159,14 | 52225,02 | 1,001 |
1 | 90593,39 | 90741,92 | 1,002 |
2 | 63862,58 | 128394,1 | 1,005 |
1 | 40409,09 | 41362,59 | 1,024 |

If you look at the bottom of the results, there are CPU efficiency values greater than 1 (not realistic at all). So I think the way the run time and CPU time values are calculated is not very accurate.
For me, the run time should be determined by subtracting the time of the first line in the log from the time of the last line, multiplied by the CPU usage percentage (throttle), provided the work unit has not been paused or restarted (for work units that were paused or restarted, the pause times, multiplied by the CPU usage percentage set in the preferences, would be subtracted). It's not so important, but it may have a big influence if CERN makes choices based on CPU efficiency values.
Questions:
* Which component evaluates these values (BOINC client, BOINC server, VirtualBox, some other hidden component)?
* How are these values calculated?
* Has anyone else noticed the same behavior on their host, so that we know whether only Erich's host has this kind of trouble, or only Windows with VirtualBox, or Linux and Darwin hosts too? |
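The efficiency column in the post appears to be CPU time divided by (run time × number of cores). That formula is inferred from the numbers, not stated in the post. A small sketch that recomputes a few rows, with decimal commas converted to points and a made-up function name:

```python
def cpu_efficiency(cores: int, run_time_s: float, cpu_time_s: float) -> float:
    """CPU time as a fraction of the wall-clock time available on all cores."""
    return cpu_time_s / (run_time_s * cores)


# (cores, run time in s, CPU time in s), copied from the results in the post.
rows = [
    (1, 47007.97, 44610.45),
    (4, 31546.23, 119886.4),
    (2, 74159.96, 146688.2),
    (1, 40409.09, 41362.59),
]

efficiencies = [cpu_efficiency(c, run, cpu) for c, run, cpu in rows]
for (cores, run, cpu), eff in zip(rows, efficiencies):
    print(f"{cores} core(s): run {run:.2f} s, CPU {cpu:.2f} s, efficiency {eff:.3f}")
```

The last row reproduces the value above 1 that the post calls unrealistic: the reported CPU time exceeds the reported run time on a single core.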
Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 56,522 |
You're right, if you compare the runtime vs. CPU time. The efficiency is rarely lower than 95% per core, i.e. 380% for a 4-core WU (if you multiply it by the number of cores). This looks very good and is close to the efficiency of hosts that are directly connected to the CERN LAN running ATLAS (native), e.g. that of Agile Boincers.
Now let's have a look at the idle times.
Erich's 1-core setup shows average idle times of roughly 4 minutes.
His 2-core setup: 29 minutes (=> core0 idle for 4 minutes, core1 idle for 25 minutes)
His 4-core setup: 109 minutes (=> core0 idle for 4 minutes, cores 1-3 idle for 35 minutes each)
The scientific output is exactly the same for all 3 setup variants. The different idle times are caused "by design", not by an error on Erich's side.
I believe there are 2 situations where a "more cores" setup makes sense vs. a setup with concurrently running "fewer cores" tasks:
1. If the host does not have enough RAM to run additional WUs
2. If the internet connection is saturated |
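The per-core figures in this post follow from charging about 4 minutes to core0 and sharing the remaining idle time equally among the other cores. A sketch of that arithmetic, assuming the 4-minute base is the 1-core average quoted in the post; the function name and default are illustrative only:

```python
from typing import List


def split_idle_minutes(total_idle_min: float, cores: int,
                       base_idle_min: float = 4.0) -> List[float]:
    """Split a task's total idle time into per-core idle times.

    core0 is charged the base idle time; the remainder is shared
    equally among the other cores.
    """
    if cores == 1:
        return [total_idle_min]
    rest = (total_idle_min - base_idle_min) / (cores - 1)
    return [base_idle_min] + [rest] * (cores - 1)


two_core_split = split_idle_minutes(29, 2)
four_core_split = split_idle_minutes(109, 4)
print("2-core:", two_core_split)
print("4-core:", four_core_split)
```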
Joined: 18 Dec 15 Posts: 1810 Credit: 118,264,470 RAC: 27,088 |
"In order to compare the cpu efficiency for Erich's host trying various kinds of work units (1,2,4), i noticed that the cpu efficiency doesn't vary a lot."
Thanks, Philippe, for the work and time you put into this comparison. Interesting results. And it confirms once more: 1-core is most efficient. |
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
Thanks for answering me, but to be honest, I have some difficulty understanding where you place the beginning and the end of the elapsed time, and how you determine the period during which a core is idle inside the VM. It is better to take an example to show clearly how you do it. Here is one of Erich's logs; I added a line number at the start of each line to make it easier to discuss.

1 <core_client_version>7.6.22</core_client_version>
2 <![CDATA[
3 <stderr_txt>
4 2018-05-18 14:05:59 (4636): vboxwrapper (7.7.26196): starting
5 2018-05-18 14:05:59 (4636): Feature: Checkpoint interval offset (235 seconds)
6 2018-05-18 14:05:59 (4636): Detected: VirtualBox COM Interface (Version: 5.1.38)
7 2018-05-18 14:05:59 (4636): Detected: Minimum checkpoint interval (900.000000 seconds)
8 2018-05-18 14:05:59 (4636): Successfully copied 'init_data.xml' to the shared directory.
9 2018-05-18 14:05:59 (4636): Create VM. (boinc_cbf3fadcc7133493, slot#5)
10 2018-05-18 14:05:59 (4636): Setting Memory Size for VM. (4800MB)
11 2018-05-18 14:05:59 (4636): Setting CPU Count for VM. (2)
12 2018-05-18 14:05:59 (4636): Setting Chipset Options for VM.
13 2018-05-18 14:05:59 (4636): Setting Boot Options for VM.
14 2018-05-18 14:05:59 (4636): Enabling VM Network Access.
15 2018-05-18 14:05:59 (4636): Setting Network Configuration for NAT.
16 2018-05-18 14:05:59 (4636): Disabling USB Support for VM.
17 2018-05-18 14:05:59 (4636): Disabling COM Port Support for VM.
18 2018-05-18 14:05:59 (4636): Disabling LPT Port Support for VM.
19 2018-05-18 14:05:59 (4636): Disabling Audio Support for VM.
20 2018-05-18 14:05:59 (4636): Disabling Clipboard Support for VM.
21 2018-05-18 14:05:59 (4636): Disabling Drag and Drop Support for VM.
22 2018-05-18 14:05:59 (4636): Adding storage controller(s) to VM.
23 2018-05-18 14:05:59 (4636): Adding virtual disk drive to VM. (vm_image.vdi)
24 2018-05-18 14:06:02 (4636): Adding VirtualBox Guest Additions to VM.
25 2018-05-18 14:06:02 (4636): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
26 2018-05-18 14:06:02 (4636): forwarding host port 2053 to guest port 80
27 2018-05-18 14:06:02 (4636): Enabling remote desktop for VM.
28 2018-05-18 14:06:02 (4636): Enabling shared directory for VM.
29 2018-05-18 14:06:02 (4636): Starting VM. (boinc_cbf3fadcc7133493, slot#5)
30 2018-05-18 14:06:27 (4636): Guest Log: BIOS: VirtualBox 5.1.38
31 2018-05-18 14:06:27 (4636): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
32 2018-05-18 14:06:27 (4636): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
33 2018-05-18 14:06:27 (4636): Guest Log: BIOS: Booting from Hard Disk...
34 2018-05-18 14:06:27 (4636): Guest Log: BIOS: KBD: unsupported int 16h function 03
35 2018-05-18 14:06:27 (4636): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
36 2018-05-18 14:06:27 (4636): Successfully started VM. (PID = '4112')
37 2018-05-18 14:06:27 (4636): Reporting VM Process ID to BOINC.
38 2018-05-18 14:06:37 (4636): VM state change detected. (old = 'poweroff', new = 'running')
39 2018-05-18 14:06:47 (4636): Detected: Web Application Enabled (http://localhost:2053)
40 2018-05-18 14:06:47 (4636): Detected: Remote Desktop Enabled (localhost:2054)
41 2018-05-18 14:06:57 (4636): Preference change detected
42 2018-05-18 14:06:57 (4636): Setting CPU throttle for VM. (100%)
43 2018-05-18 14:06:57 (4636): Setting checkpoint interval to 900 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 900 seconds))
44 2018-05-18 14:07:57 (4636): Guest Log: vboxguest: major 0, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
45 2018-05-18 14:08:07 (4636): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff88013dff3c10), OR(0x0), NOT(0xffffffff), flags(0x0)
46 2018-05-18 14:08:07 (4636): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff8801402bf210), OR(0x0), NOT(0xffffffff), flags(0x0)
47 2018-05-18 14:08:07 (4636): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff88013dff2e10), OR(0x0), NOT(0xffffffff), flags(0x0)
48 2018-05-18 14:08:07 (4636): Guest Log: VBoxGuest: VBoxGuestCommonGuestCapsAcquire: pSession(0xffff88013dff2010), OR(0x0), NOT(0xffffffff), flags(0x0)
49 2018-05-18 14:08:47 (4636): Guest Log: Copying input files into RunAtlas.
50 2018-05-18 14:08:57 (4636): Guest Log: Copied input files into RunAtlas.
51 2018-05-18 14:09:07 (4636): Guest Log: copied the webapp to /var/www
52 2018-05-18 14:09:07 (4636): Guest Log: This vm does not need to setup http proxy
53 2018-05-18 14:09:07 (4636): Guest Log: ATHENA_PROC_NUMBER=2
54 2018-05-18 14:09:07 (4636): Guest Log: Starting ATLAS job. (PandaID=3931045702 taskID=14073742)
55 2018-05-18 15:47:06 (4636): Status Report: Elapsed Time: '6009.312498'
56 2018-05-18 15:47:06 (4636): Status Report: CPU Time: '11181.234375'
57 2018-05-18 17:27:06 (4636): Status Report: Elapsed Time: '12009.640625'
58 2018-05-18 17:27:06 (4636): Status Report: CPU Time: '23194.265625'
59 2018-05-18 19:07:07 (4636): Status Report: Elapsed Time: '18010.000000'
60 2018-05-18 19:07:07 (4636): Status Report: CPU Time: '35216.156250'
61 2018-05-18 20:30:27 (4636): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000
62 2018-05-18 20:47:07 (4636): Status Report: Elapsed Time: '24010.609373'
63 2018-05-18 20:47:07 (4636): Status Report: CPU Time: '47249.968750'
64 2018-05-18 22:27:07 (4636): Status Report: Elapsed Time: '30010.921875'
65 2018-05-18 22:27:07 (4636): Status Report: CPU Time: '59291.468750'
66 2018-05-19 00:07:08 (4636): Status Report: Elapsed Time: '36011.187498'
67 2018-05-19 00:07:08 (4636): Status Report: CPU Time: '71325.062500'
68 2018-05-19 01:47:08 (4636): Status Report: Elapsed Time: '42011.671875'
69 2018-05-19 01:47:08 (4636): Status Report: CPU Time: '83361.421875'
70 2018-05-19 03:27:08 (4636): Status Report: Elapsed Time: '48011.937498'
71 2018-05-19 03:27:08 (4636): Status Report: CPU Time: '95396.765625'
72 2018-05-19 05:07:09 (4636): Status Report: Elapsed Time: '54012.218750'
73 2018-05-19 05:07:09 (4636): Status Report: CPU Time: '107428.953125'
74 2018-05-19 06:47:09 (4636): Status Report: Elapsed Time: '60012.781250'
75 2018-05-19 06:47:09 (4636): Status Report: CPU Time: '119463.421875'
76 2018-05-19 08:27:10 (4636): Status Report: Elapsed Time: '66013.062498'
77 2018-05-19 08:27:10 (4636): Status Report: CPU Time: '131505.375000'
78 2018-05-19 09:59:40 (4636): VM Completion File Detected.
79 2018-05-19 09:59:40 (4636): Powering off VM.
80 2018-05-19 09:59:42 (4636): Successfully stopped VM.
81 2018-05-19 09:59:47 (4636): Deregistering VM. (boinc_cbf3fadcc7133493, slot#5)
82 2018-05-19 09:59:47 (4636): Removing virtual disk drive(s) from VM.
83 2018-05-19 09:59:47 (4636): Removing network bandwidth throttle group from VM.
84 2018-05-19 09:59:47 (4636): Removing storage controller(s) from VM.
85 2018-05-19 09:59:47 (4636): Removing VM from VirtualBox.
86 09:59:52 (4636): called boinc_finish(0)
87 </stderr_txt>
88 ]]>

Which part of the log shows the period of the 'elapsed time' (beginning and end)? (Please give the range of line numbers.) Where can you determine the idle period of the other cores? Does 'by design' mean that the download time for initialising the work unit and the upload time for the result are not included in the elapsed run time, to enable a fair comparison with other hosts inside CERN? |
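The "last log line minus first log line" idea proposed earlier in the thread can be tried directly on a log like this one. The sketch below is only an illustration of that proposal: whether BOINC or vboxwrapper derives the reported run time this way is not established in the thread, and the helper names and regular expressions are assumptions based on the line layout shown in this log.

```python
import re
from datetime import datetime

# Timestamp followed by the "(pid):" marker, as seen in the vboxwrapper lines.
STAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \(\d+\):")
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([\d.]+)'")


def wall_clock_seconds(log_lines):
    """Seconds between the first and the last fully timestamped log line."""
    stamps = []
    for line in log_lines:
        match = STAMP_RE.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return (stamps[-1] - stamps[0]).total_seconds()


def last_status_report(log_lines):
    """Latest 'Elapsed Time' and 'CPU Time' values reported in the log."""
    report = {}
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            report[match.group(1)] = float(match.group(2))
    return report


# A few lines copied from the log in the post (added line numbers omitted).
sample = [
    "2018-05-18 14:05:59 (4636): vboxwrapper (7.7.26196): starting",
    "2018-05-19 08:27:10 (4636): Status Report: Elapsed Time: '66013.062498'",
    "2018-05-19 08:27:10 (4636): Status Report: CPU Time: '131505.375000'",
    "2018-05-19 09:59:47 (4636): Removing VM from VirtualBox.",
    "09:59:52 (4636): called boinc_finish(0)",  # no date, so it is skipped
]

wall = wall_clock_seconds(sample)
report = last_status_report(sample)
print(f"wall clock between first and last dated line: {wall:.0f} s")
print(f"last status report: {report}")
```

Because `search` is used rather than `match`, the same helpers also accept lines that still carry the added line numbers at the front.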
Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 56,522 |
The times are taken from the task pages. Example: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10388905&offset=0&show_names=0&state=4&appid=
The #cores are taken from the logs linked at the task pages.

Task example 1: https://lhcathome.cern.ch/lhcathome/result.php?resultid=191133132
runtime: 41,229.42
CPU time: 40,820.41
#cores: 1
Efficiency: 99 %
idle time: 409.01 s (-> 6.8 min)

Task example 2: https://lhcathome.cern.ch/lhcathome/result.php?resultid=191456208
runtime: 71,624.45
CPU time: 142,076.70
#cores: 2
Efficiency: 198.4 % (99.2 % per core)
idle time: 1172.2 s (-> 19.5 min)

As a single task may be affected by various influences from outside, an average over a couple of tasks should be taken. I calculated separate averages for 1-core, 2-core and 4-core tasks. Thus I noticed that 1-core tasks have an average idle time of roughly 4 minutes (a bit less than in the example above). This reflects the "basic idle time" that all tasks will show.
What can be observed while a task is running is that it has a startup and a shutdown phase where only 1 core is active. My guess is that the "basic idle time" does occur at core0 with every setup. The rest of the idle time is equally spread among the rest of the cores.

From example 2 above:
basic idle time (average): 4 min (core0)
idle time left for core1: 15.5 min |
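The idle times in the two examples are consistent with idle time = runtime × cores − CPU time. A short sketch of that calculation using the numbers from the post; the function names are made up for illustration, and the 4-minute base for core0 is the average quoted in the post:

```python
def idle_seconds(run_time_s: float, cpu_time_s: float, cores: int) -> float:
    """Core-seconds that were available during the run but not used."""
    return run_time_s * cores - cpu_time_s


def efficiency_percent(run_time_s: float, cpu_time_s: float) -> float:
    """CPU time relative to run time, summed over all cores, in percent."""
    return 100.0 * cpu_time_s / run_time_s


# Task example 1 (1 core) and task example 2 (2 cores) from the post.
idle_1 = idle_seconds(41229.42, 40820.41, cores=1)
idle_2 = idle_seconds(71624.45, 142076.70, cores=2)
eff_2 = efficiency_percent(71624.45, 142076.70)

# Idle time left for core1 after charging the assumed 4-minute base to core0.
core1_idle_min = idle_2 / 60.0 - 4.0

print(f"example 1 idle: {idle_1:.2f} s ({idle_1 / 60:.1f} min)")
print(f"example 2 idle: {idle_2:.2f} s ({idle_2 / 60:.1f} min)")
print(f"example 2 efficiency: {eff_2:.1f} % ({eff_2 / 2:.1f} % per core)")
print(f"example 2 idle left for core1: {core1_idle_min:.1f} min")
```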
Joined: 24 Jul 16 Posts: 88 Credit: 239,917 RAC: 0 |
Thanks a lot, computezrmle, for taking some of your time to answer me. I believe I have understood the method you use to determine the idle period of each core (the average, shared equally among the cores except core0). Every day we learn from each other. It's not always easy to interpret the results obtained. But I was a bit surprised by the high CPU efficiency values recorded here, whereas some other hosts show lower ones. |
Joined: 22 Dec 05 Posts: 1 Credit: 707,119 RAC: 0 |
Why aren't you giving credit for "Error while computing"? I don't like wasting my computer time on tasks that have errors in them. You should at least give credit for the computer time used. Max |
Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 56,522 |
Hi Max, as far as I can see you ran only SixTrack tasks. This thread covers a problem that is very specific to ATLAS, so the experts who may be able to answer your question may not notice it. Be so kind as to report your problem in the SixTrack section of this MB. In general: There are as many reasons for errors as stars in the sky. Some of them are rewarded, others not. |
Joined: 2 May 07 Posts: 2240 Credit: 173,894,884 RAC: 3,092 |
At the moment this computer gets 30 instead of 500 Cobblestones for the same work as it always got before!
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10548292
2018-08-02 05:58:47 (5100): Guest Log: HITS file was successfully produced
2018-08-02 05:58:47 (5100): Guest Log: -rw------- 1 atlas01 atlas01 162648366 Aug 2 05:55 /home/atlas01/RunAtlas/HITS.14661314._005788.pool.root.1
VirtualBox 5.2.14 with BOINC 7.10.2. |
Send message Joined: 15 Jun 08 Posts: 2528 Credit: 253,722,201 RAC: 56,522 |
The credits aren't lost. They are used to make other users feel happy: https://lhcathome.cern.ch/lhcathome/result.php?resultid=203429685 ;-) |
©2024 CERN