Message boards : News : Status, 24th January, 2014

Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 26183 - Posted: 24 Jan 2014, 12:34:09 UTC

Hope this will answer some of your messages.

We still have some 34,000 WUs NOT being taken. We have apparently
almost 6000 in progress.

We introduced SixTrack Version 4.5.03 on Wednesday 22nd
January after extensive testing on boinctest and at CERN.
Unluckily, Yuri flooded us with work at the same time
and AFS blew up, leading to a huge backlog of over 16,000
results to be downloaded.

1. Results validation: seems to be OK. To summarise,
counting from 0-59 we do NOT CHECK words 51, 59? and 60
in fort.10.

The validator log shows a great many "cannot open" errors for
supposedly existing results needed for comparison. They were
probably lost somehow.

2. Assimilation; the log shows
"Herror too many total results" !!!
There are about 2000 (1979) unique messages and cases/WUs.
I suspect we may need to clean the database and remove results
(with clients losing credit I am afraid, but they will probably never
get credit for these anyway).
I could delete them from upload but that would probably be worse.

3. Scheduler log: there are about 2.4 million messages of which
there are 1.64M unrecognised messages, multiple messages per WU.
This is perhaps significant!
Previously these messages existed only for Macs as far as I can see.
Here is one case:
2014-01-22 17:24:41.1073 [PID=51877] HOST::parse(): unrecognized: opencl_cpu_prop
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: platform_vendor
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: Advanced Micro Devices, Inc.
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: /platform_vendor
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: opencl_cpu_info
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: name
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: /name
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: vendor
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: GenuineIntel
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: /vendor
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: vendor_id
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: 4098
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: /vendor_id
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: available
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: 1
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: /available
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: half_fp_config
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: 0
2014-01-22 17:24:41.1076 [PID=51877] HOST::parse(): unrecognized: /half_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: single_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: 191
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: /single_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: double_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: 63
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: /double_fp_config
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: endian_little
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: 1
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: /endian_little
2014-01-22 17:24:41.1077 [PID=51877] HOST::parse(): unrecognized: execution_capabilities
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: 3
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: /execution_capabilities
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: extensions
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_kh
2014-01-22 17:24:41.1078 [PID=51877] HOST::parse(): unrecognized: /extensions
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: global_mem_size
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: 17029206016
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: /global_mem_size
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: local_mem_size
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: 32768
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: /local_mem_size
2014-01-22 17:24:41.1153 [PID=51877] HOST::parse(): unrecognized: max_clock_frequency
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: 3500
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: /max_clock_frequency
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: max_compute_units
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: 8
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: /max_compute_units
2014-01-22 17:24:41.1154 [PID=51877] HOST::parse(): unrecognized: opencl_platform_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: OpenCL 1.2 AMD-APP (1348.5)
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: /opencl_platform_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: opencl_device_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: OpenCL 1.2 AMD-APP (1348.5)
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: /opencl_device_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: opencl_driver_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: 1348.5 (sse2,avx)
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: /opencl_driver_version
2014-01-22 17:24:41.1155 [PID=51877] HOST::parse(): unrecognized: /opencl_cpu_info
2014-01-22 17:24:41.1156 [PID=51877] HOST::parse(): unrecognized: /opencl_cpu_prop
2014-01-22 17:24:41.3583 [PID=51877] Request: [USER#221474] [HOST#10137513] [IP 69.35.195.242] client 7.2.33
2014-01-22 17:24:41.3880 [PID=51877] Sending reply to [HOST#10137513]: 0 results, delay req 6.00
2014-01-22 17:24:41.3880 [PID=51877] Scheduler ran 0.035 seconds

I am not an expert but it seems to me it might explain work not being taken.......
(but never saw this with boinctest!).
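
For anyone who wants to repeat the count on their own copy of a scheduler log, here is a minimal Python sketch. It is not the project's tooling: the function name `count_unrecognized` and the regular expression are assumptions based only on the log lines quoted above, and a real log may contain line formats this pattern does not cover.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
# "<date> <time> [PID=<n>] HOST::parse(): unrecognized: <text>"
LINE_RE = re.compile(
    r"^\S+ \S+ \[PID=(?P<pid>\d+)\] HOST::parse\(\): unrecognized: (?P<text>.*)$"
)


def count_unrecognized(lines):
    """Return (total, per_pid) counts of 'unrecognized' scheduler messages."""
    per_pid = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            per_pid[match.group("pid")] += 1
    return sum(per_pid.values()), per_pid


# Three lines copied from the excerpt; only the first two should match.
sample = [
    "2014-01-22 17:24:41.1073 [PID=51877] HOST::parse(): unrecognized: opencl_cpu_prop",
    "2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: platform_vendor",
    "2014-01-22 17:24:41.3880 [PID=51877] Scheduler ran 0.035 seconds",
]
total, per_pid = count_unrecognized(sample)
print(total, dict(per_pid))
```

Grouping by PID is only one choice; each scheduler request in the excerpt runs under one PID, so it gives a rough "messages per request" figure.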

Other issue: one client reports "Cannot Create Process" on Windows 7.
May or may not be significant.

Are the executables 'signed' OK?

So all a bit complicated but hope to sort it (very) soon.
Eric.
ID: 26183
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 26184 - Posted: 24 Jan 2014, 12:35:59 UTC

P.S. The "unrecognized" messages were only for Mac systems in the past.
ID: 26184
alvin

Joined: 12 Mar 12
Posts: 128
Credit: 20,013,377
RAC: 0
Message 26186 - Posted: 24 Jan 2014, 13:23:08 UTC
Last modified: 24 Jan 2014, 13:24:40 UTC

Take note of the ratio valid:inconclusive:invalid:error;
it has shifted significantly recently from
Validation inconclusive (614) · Invalid (63) · Error (197)
to
Validation inconclusive (1036) · Invalid (233) · Error (211)

and now this

In progress (42) · Validation pending (389) · Validation inconclusive (6842) · Valid (3265) · Invalid (1887) · Error (218)
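
To put numbers on that shift, here is a small Python sketch. It uses only the three counts that appear in all three snapshots above (inconclusive, invalid, error); the function name `shares` is made up for illustration, and the percentages are shares within those three states only, not of all results.

```python
def shares(counts):
    """Return each state's share of the given counts, as a percentage."""
    total = sum(counts.values())
    return {state: 100.0 * n / total for state, n in counts.items()}


# The three snapshots quoted above, oldest first.
snapshots = [
    {"inconclusive": 614, "invalid": 63, "error": 197},
    {"inconclusive": 1036, "invalid": 233, "error": 211},
    {"inconclusive": 6842, "invalid": 1887, "error": 218},
]

for snap in snapshots:
    pct = shares(snap)
    print(", ".join(f"{state} {value:.1f}%" for state, value in pct.items()))
```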
ID: 26186
Kathryn Tombaugh Weber

Joined: 12 Sep 11
Posts: 38
Credit: 218,154
RAC: 0
Message 26187 - Posted: 24 Jan 2014, 14:01:28 UTC

Have fun. I am standing by. :)
ID: 26187
Richard Haselgrove

Joined: 27 Oct 07
Posts: 186
Credit: 3,297,640
RAC: 0
Message 26189 - Posted: 24 Jan 2014, 14:59:16 UTC - in response to Message 26183.  

3. Scheduler log: there are about 2.4 million messages of which
there are 1.64M unrecognised messages, multiple messages per WU.
This is perhaps significant!
previously these messages existed only for Macs as far as I can see.
here is one case:
2014-01-22 17:24:41.1073 [PID=51877] HOST::parse(): unrecognized: opencl_cpu_prop
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: platform_vendor
2014-01-22 17:24:41.1075 [PID=51877] HOST::parse(): unrecognized: Advanced Micro Devices, Inc.
...

That looks like the server is trying, but failing, to parse an <opencl_cpu_prop> block in the user's sched_request file.

OpenCL on CPUs is a very recent addition to BOINC (previously OpenCL was only supported on GPUs), and the user is requesting using BOINC v7.2.33, which is indeed the newest 'recommended' client version - deployed 26 Nov 2013.

Here's what a CPU OpenCL description looks like in context:

<host_info>
<timezone>0</timezone>
<domain_name>BOINC-test</domain_name>
<ip_addr>192.168.173.23</ip_addr>
<host_cpid>91e1b14702c62eccb1697f5b9bbe0ed1</host_cpid>
<p_ncpus>4</p_ncpus>
<p_vendor>GenuineIntel</p_vendor>
<p_model>Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz [Family 6 Model 60 Stepping 3]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes syscall nx lm vmx smx tm2 pbe</p_features>
<p_fpops>3520599691.555607</p_fpops>
<p_iops>14311207944.519573</p_iops>
<p_membw>250000000.000000</p_membw>
<p_calculated>1389960945.229666</p_calculated>
<p_vm_extensions_disabled>0</p_vm_extensions_disabled>
<m_nbytes>4173619200.000000</m_nbytes>
<m_cache>262144.000000</m_cache>
<m_swap>8345329664.000000</m_swap>
<d_total>483242541056.000000</d_total>
<d_free>436829421568.000000</d_free>
<os_name>Microsoft Windows 7</os_name>
<os_version>Professional x64 Edition, Service Pack 1, (06.01.7601.00)</os_version>
<opencl_cpu_prop>
<platform_vendor>Intel(R) Corporation</platform_vendor>
<opencl_cpu_info>
<name>Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz</name>
<vendor>Intel(R) Corporation</vendor>
<vendor_id>32902</vendor_id>
<available>1</available>
<half_fp_config>0</half_fp_config>
<single_fp_config>7</single_fp_config>
<double_fp_config>63</double_fp_config>
<endian_little>1</endian_little>
<execution_capabilities>3</execution_capabilities>
<extensions>cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread cl_khr_gl_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing</extensions>
<global_mem_size>4173619200</global_mem_size>
<local_mem_size>32768</local_mem_size>
<max_clock_frequency>3200</max_clock_frequency>
<max_compute_units>4</max_compute_units>
<opencl_platform_version>OpenCL 1.2</opencl_platform_version>
<opencl_device_version>OpenCL 1.2 (Build 63463)</opencl_device_version>
<opencl_driver_version>1.2</opencl_driver_version>
</opencl_cpu_info>
</opencl_cpu_prop>
</host_info>

If you haven't updated the server code since November, it might be wise to do so (though I haven't heard of the new section causing any problems at other projects). Else, you might need to consult David Anderson.
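
To make the structure concrete, here is a minimal Python sketch that pulls the CPU OpenCL fields out of a trimmed copy of the block above. It uses Python's standard `xml.etree.ElementTree`, not the BOINC server's own parser, so it says nothing about how `HOST::parse()` behaves; the function name `opencl_cpu_summary` and the choice of fields are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <host_info> block quoted above (most fields omitted).
HOST_INFO = """
<host_info>
<p_ncpus>4</p_ncpus>
<opencl_cpu_prop>
<platform_vendor>Intel(R) Corporation</platform_vendor>
<opencl_cpu_info>
<name>Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz</name>
<max_compute_units>4</max_compute_units>
<opencl_device_version>OpenCL 1.2 (Build 63463)</opencl_device_version>
</opencl_cpu_info>
</opencl_cpu_prop>
</host_info>
"""


def opencl_cpu_summary(xml_text):
    """Return a dict describing the CPU OpenCL device, or None if absent."""
    root = ET.fromstring(xml_text)
    prop = root.find("opencl_cpu_prop")
    if prop is None:
        return None  # no <opencl_cpu_prop> block in this request
    info = prop.find("opencl_cpu_info")
    if info is None:
        return {"platform_vendor": prop.findtext("platform_vendor")}
    units = info.findtext("max_compute_units")
    return {
        "platform_vendor": prop.findtext("platform_vendor"),
        "name": info.findtext("name"),
        "max_compute_units": int(units) if units is not None else None,
        "device_version": info.findtext("opencl_device_version"),
    }


summary = opencl_cpu_summary(HOST_INFO)
print(summary)
```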
ID: 26189
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 26191 - Posted: 24 Jan 2014, 16:57:08 UTC - in response to Message 26189.  

Thanks a lot, Richard. This almost certainly explains
the issue of work not being taken. I suppose my colleagues
built the Windows execs using a very recent BOINC version,
and our server BOINC is older. Eric
ID: 26191
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 26197 - Posted: 25 Jan 2014, 0:10:50 UTC

We have restored the old executables and seem to be back in business.
We need to clean up and clear up; the post-mortem news will follow.
Eric.
ID: 26197
FruehwF

Joined: 21 Aug 12
Posts: 9
Credit: 2,941,516
RAC: 0
Message 26198 - Posted: 26 Jan 2014, 12:39:08 UTC

Eric, please also note this thread: http://lhcathomeclassic.cern.ch/sixtrack/forum_thread.php?id=3799
There are problems with Windows XP and the new version.
A solution is described in the thread.

Regards, Franz
ID: 26198
Eric Mcintosh
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 12 Jul 11
Posts: 857
Credit: 1,619,050
RAC: 0
Message 26199 - Posted: 27 Jan 2014, 1:06:35 UTC - in response to Message 26198.  

Thanks for that, many thanks. Don't know how I/we missed it.
The inquest continues. Eric.


ID: 26199
