SixTrack on anonymous platform: getting error 500 from scheduler
| Author | Message |
|---|---|
|
Joined: 21 Mar 15 Posts: 2 Credit: 19,867,392 RAC: 0 |
Hi all! I have a specific hardware platform with a VLIW Elbrus CPU on board, and I managed to build SixTrack for it in both SSE2 and AVX modes (yes, Elbrus supports both of these SIMD extensions). Neither of the built apps can be run, though. Here is what I did:

- Cloned the SixTrack git repo and switched to the version 5.4.3 commit.
- In CMakeLists.txt, added a "${CMAKE_SYSTEM_PROCESSOR} MATCHES e2k" check where needed.
- Initialized and updated the BOINC subproject, and built boinc_api_fortran.o and the libraries inside it.
- Built the SixTrack applications using "./cmake_six CR BOINC NATIVE" and "./cmake_six CR BOINC NATIVE AVX".
- Installed BOINC, connected to my BAM! account, attached this host to LHC@home, and got the message "[LHC@home] This project doesn't support computers of type e2k-mcst-linux-gnu", which was expected at this point.
- Shut down BOINC and put one of the applications, together with a corresponding app_info.xml, into the "projects/lhcathome.cern.ch_lhcathome" directory in the BOINC data directory.
- Ran BOINC again and tried to update the project.
- Got "[LHC@home] Scheduler request failed: HTTP internal server error", which I did not expect.

Resetting the project does not help. Neither does detaching, re-creating the project directory (with app_info.xml and binaries), and then attaching to the project again. I tested this on two hosts, one behind a proxy and one connected directly to the internet, but to no avail. Milkyway@Home and Einstein@Home work perfectly on these hosts when set up the same way, though.

My app_info.xml (I tried many variants; not a single one worked):
<app_info>
<app>
<name>sixtrack</name>
<user_friendly_name>SixTrack</user_friendly_name>
<non_cpu_intensive>0</non_cpu_intensive>
</app>
<file>
<name>sixtrack_avx</name>
<status>1</status>
<executable/>
</file>
<app_version>
<app_name>sixtrack</app_name>
<version_num>50205</version_num>
<platform>anonymous</platform>
<avg_ncpus>1.000000</avg_ncpus>
<flops>100000000.000000</flops>
<plan_class>avx</plan_class>
<api_version>7.14.2</api_version>
<file_ref>
<file_name>sixtrack_avx</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>

Most of the contents were guessed from client_state.xml on a host where SixTrack works fine. I tried both the avx and sse2 binaries and plan classes, together or separately; I tried setting api_version to the actual BOINC version, or removing it; I tried removing and/or altering other fields, such as avg_ncpus, flops, platform (for it I tried anonymous, e2k-mcst-linux-gnu, e2k-linux-gnu, or omitting the field entirely), version_num, status, non_cpu_intensive, etc.

My sched_request_lhcathome.cern.ch_lhcathome.xml (authenticator removed for security):
<scheduler_request>
<authenticator>....</authenticator>
<hostid>10674216</hostid>
<rpc_seqno>6</rpc_seqno>
<core_client_major_version>7</core_client_major_version>
<core_client_minor_version>16</core_client_minor_version>
<core_client_release>14</core_client_release>
<resource_share_fraction>1.000000</resource_share_fraction>
<rrs_fraction>1.000000</rrs_fraction>
<prrs_fraction>1.000000</prrs_fraction>
<duration_correction_factor>1.000000</duration_correction_factor>
<allow_multiple_clients>0</allow_multiple_clients>
<sandbox>0</sandbox>
<dont_send_work>0</dont_send_work>
<work_req_seconds>1658880.000000</work_req_seconds>
<cpu_req_secs>1658880.000000</cpu_req_secs>
<cpu_req_instances>32.000000</cpu_req_instances>
<estimated_delay>0.000000</estimated_delay>
<client_cap_plan_class>1</client_cap_plan_class>
<platform_name>anonymous</platform_name>
<working_global_preferences>
<global_preferences>
<source_project></source_project>
<mod_time>0.000000</mod_time>
<battery_charge_min_pct>90.000000</battery_charge_min_pct>
<battery_max_temperature>40.000000</battery_max_temperature>
<run_on_batteries>0</run_on_batteries>
<run_if_user_active>1</run_if_user_active>
<run_gpu_if_user_active>0</run_gpu_if_user_active>
<suspend_if_no_recent_input>0.000000</suspend_if_no_recent_input>
<suspend_cpu_usage>25.000000</suspend_cpu_usage>
<start_hour>0.000000</start_hour>
<end_hour>0.000000</end_hour>
<net_start_hour>0.000000</net_start_hour>
<net_end_hour>0.000000</net_end_hour>
<leave_apps_in_memory>0</leave_apps_in_memory>
<confirm_before_connecting>1</confirm_before_connecting>
<hangup_if_dialed>0</hangup_if_dialed>
<dont_verify_images>0</dont_verify_images>
<work_buf_min_days>0.100000</work_buf_min_days>
<work_buf_additional_days>0.500000</work_buf_additional_days>
<max_ncpus_pct>0.000000</max_ncpus_pct>
<cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
<disk_interval>60.000000</disk_interval>
<disk_max_used_gb>0.000000</disk_max_used_gb>
<disk_max_used_pct>90.000000</disk_max_used_pct>
<disk_min_free_gb>0.100000</disk_min_free_gb>
<vm_max_used_pct>75.000000</vm_max_used_pct>
<ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
<ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct>
<idle_time_to_run>3.000000</idle_time_to_run>
<max_bytes_sec_up>0.000000</max_bytes_sec_up>
<max_bytes_sec_down>0.000000</max_bytes_sec_down>
<cpu_usage_limit>100.000000</cpu_usage_limit>
<daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
<daily_xfer_period_days>0</daily_xfer_period_days>
<override_file_present>0</override_file_present>
<network_wifi_only>0</network_wifi_only>
</global_preferences>
</working_global_preferences>
<cross_project_id>a8f1beff410131bb325a4041577b4c90</cross_project_id>
<time_stats>
<on_frac>0.999549</on_frac>
<connected_frac>-1.000000</connected_frac>
<cpu_and_network_available_frac>1.000000</cpu_and_network_available_frac>
<active_frac>1.000000</active_frac>
<gpu_active_frac>1.000000</gpu_active_frac>
<client_start_time>1607201147.328425</client_start_time>
<total_start_time>1607137250.782878</total_start_time>
<total_duration>859.837951</total_duration>
<total_active_duration>859.837951</total_active_duration>
<total_gpu_active_duration>859.837951</total_gpu_active_duration>
<now>1607201150.834698</now>
<previous_uptime>175.987093</previous_uptime>
<session_active_duration>0.000000</session_active_duration>
<session_gpu_active_duration>0.000000</session_gpu_active_duration>
</time_stats>
<net_stats>
<bwup>18398.141947</bwup>
<avg_up>470059541.941634</avg_up>
<avg_time_up>1607138339.565820</avg_time_up>
<bwdown>4639723.433021</bwdown>
<avg_down>520727181988.380493</avg_down>
<avg_time_down>1607138339.478808</avg_time_down>
</net_stats>
<host_info>
<timezone>10800</timezone>
<domain_name>mamizou</domain_name>
<ip_addr>192.168.0.153</ip_addr>
<host_cpid>dee8b86df3d72f6b3c1875514a723de7</host_cpid>
<p_ncpus>32</p_ncpus>
<p_vendor>E8C</p_vendor>
<p_model>E8C [Family 4 Model 7 ]</p_model>
<p_features></p_features>
<p_fpops>1000000000.000000</p_fpops>
<p_iops>1000000000.000000</p_iops>
<p_membw>1000000000.000000</p_membw>
<p_calculated>1607137089.123090</p_calculated>
<p_vm_extensions_disabled>0</p_vm_extensions_disabled>
<m_nbytes>236458024960.000000</m_nbytes>
<m_cache>-1.000000</m_cache>
<m_swap>0.000000</m_swap>
<d_total>983312404480.000000</d_total>
<d_free>796927619072.000000</d_free>
<os_name>Linux Debian</os_name>
<os_version>Debian GNU/Linux [5.4.0-1.9-e8c|libc 2.29 (GNU libc)]</os_version>
<n_usable_coprocs>0</n_usable_coprocs>
<wsl_available>0</wsl_available>
</host_info>
<disk_usage>
<d_boinc_used_total>45367296.000000</d_boinc_used_total>
<d_boinc_used_project>45162496.000000</d_boinc_used_project>
<d_project_share>876552173404.160034</d_project_share>
</disk_usage>
<app_versions>
<app_version>
<app_name>sixtrack</app_name>
<version_num>50205</version_num>
<platform>anonymous</platform>
<avg_ncpus>1.000000</avg_ncpus>
<flops>100000000.000000</flops>
<plan_class>avx</plan_class>
<api_version>7.14.2</api_version>
</app_version>
</app_versions>
<other_results>
</other_results>
<in_progress_results>
</in_progress_results>
</scheduler_request>

My sched_reply_lhcathome.cern.ch_lhcathome.xml:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p>
<p>Please contact the server administrator at. boinc-server-admin@cern.ch to inform them of the time this error occurred, and the actions you performed just before this error.</p>
<p>More information about this error may be available in the server error log.</p>
</body></html>

Could anyone please help me get SixTrack working on the anonymous platform? |
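The plan_class in app_info.xml (avx or sse2) is meant to describe the binary it points to, so one consistency check is to confirm which SIMD level a given build actually targets. This will not by itself explain the HTTP 500; it is only a sketch. It assumes the compiler defines the GCC-style predefined macros `__AVX__` and `__SSE2__` when those extensions are enabled; whether the Elbrus toolchain does so for the NATIVE and AVX cmake_six variants is an assumption to verify. `build_simd_mode` is a hypothetical name, not part of SixTrack or BOINC.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: report which x86 SIMD level this translation unit
// was compiled for, based on GCC-style predefined macros.
// Assumption: the compiler used for the build defines __AVX__ / __SSE2__
// when the corresponding extension is enabled; check your toolchain.
std::string build_simd_mode() {
#if defined(__AVX__)
    return "avx";   // AVX enabled: would correspond to plan_class "avx"
#elif defined(__SSE2__)
    return "sse2";  // SSE2 but not AVX: would correspond to plan_class "sse2"
#else
    return "none";  // neither macro is defined
#endif
}
```

Compiling this with the same flags as each SixTrack variant and printing `build_simd_mode()` would show whether the avx and sse2 binaries really differ as intended.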
|
Joined: 15 Jun 08 Posts: 2724 Credit: 299,002,782 RAC: 71,016 |
A description of the anonymous platform can be found here: https://boinc.berkeley.edu/wiki/Anonymous_platform

This page states: "... the <platform> element should be removed." The tag names are slightly different to your app_info.xml.

I would suggest to rename the executable to "sixtrack_lin64_50205_avx.linux" and use this app_info.xml:
[pre]<app_info>
<app>
<name>sixtrack</name>
</app>
<file_info>
<name>sixtrack_lin64_50205_avx.linux</name>
<executable/>
</file_info>
<app_version>
<app_name>sixtrack</app_name>
<version_num>50205</version_num>
<api_version>7.14.2</api_version>
<plan_class>avx</plan_class>
<flops>100000000.000000</flops>
<avg_ncpus>1.0</avg_ncpus>
<file_ref>
<file_name>sixtrack_lin64_50205_avx.linux</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>[/pre]

The BOINC client must be restarted to activate the changes.

In addition your scheduler request has empty processor features:
[pre]<p_features></p_features>[/pre]
I don't know whether at least "avx" must be sent to the server.
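Whether the LHC@home scheduler really requires "avx" in <p_features> is not known here. As a sketch of how a server-side check for a CPU feature might look, assuming p_features is treated as a whitespace-separated list (as in the requests in this thread), a whole-token match could be written as below. `has_cpu_feature` is a hypothetical helper, not code taken from the BOINC scheduler.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not BOINC's scheduler code): true if `feature`
// appears as a whole whitespace-separated token in a p_features string.
// Whole-token matching avoids false hits that a substring search would give,
// e.g. finding "avx" inside a longer name such as "avx2".
bool has_cpu_feature(const std::string& p_features, const std::string& feature) {
    std::istringstream tokens(p_features);
    std::string token;
    while (tokens >> token) {         // split on any whitespace
        if (token == feature) return true;
    }
    return false;                     // also the result for an empty string
}
```

Under this assumption, an empty <p_features></p_features> can never satisfy an "avx" requirement, which is why filling it in is worth trying.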
|
Joined: 21 Mar 15 Posts: 2 Credit: 19,867,392 RAC: 0 |
Thanks for your suggestions! Unfortunately, none of them worked.

"... the <platform> element should be removed."
Yes, I tried removing platform too, as well as setting it to anonymous, e2k-mcst-linux-gnu, e2k-linux-gnu and, for now, even x86_64-pc-linux-gnu, to get exactly the same app_version block as on an x86_64 machine (I also played with the number of zeros in avg_ncpus, which always ends up as six zeros in the request, and set flops to 1634109463.151072):
<app_version>
<app_name>sixtrack</app_name>
<version_num>50205</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<avg_ncpus>1.000000</avg_ncpus>
<flops>1634109463.151072</flops>
<plan_class>sse2</plan_class>
<api_version>7.14.2</api_version>
</app_version>

But on x86_64 everything works fine with such an app_version block, and here it does not.

"The tag names are slightly different to your app_info.xml."
Well, you're right, that was probably a mistake, but nothing changed when I replaced <file>...</file> with <file_info>...</file_info>.

"I would suggest to rename the executable to "sixtrack_lin64_50205_avx.linux" and use this app_info.xml"
I tried, and got exactly the same result. I tried the same with the sse2 build too, still without success.

"In addition your scheduler request has empty processor features"
Good idea, but it still didn't help. I tried patching BOINC in client/hostinfo_unix.cpp:709 like this:
#if defined(__e2k__)
safe_strcpy(features, "sse sse2 ssse3 sse4_1 sse4_2 sse4a avx avx2");
#endif
and got the following p_features in host_info:
<p_features>sse sse2 ssse3 sse4_1 sse4_2 sse4a avx avx2</p_features>
but it still didn't solve the problem. I even copied the entire p_features string from an x86_64 box ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid eagerfpu pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_save"), but guess what: it didn't change anything either. I've really run out of ideas for now.

By the way, does anyone know how to send a request XML (like a manually constructed sched_request) to the scheduler using curl? I'd like to do that and analyze the return codes. I tried "curl -d sched_request_lhcathome.cern.ch_lhcathome.xml https://lhcathome.cern.ch/lhcathome_cgi/cgi", but instead of error 500 I got HTTP 200 and this:
<scheduler_reply>
<scheduler_version>715</scheduler_version>
<master_url>https://lhcathome.cern.ch/lhcathome/</master_url>
<request_delay>6.000000</request_delay>
<message priority="low">Error in request message: xp.get_tag() failed </message>
<project_name>LHC@home</project_name>
<send_full_workload/>
</scheduler_reply> |
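A likely reason for the 200 reply with "xp.get_tag() failed": curl's `-d` option posts its argument literally, so the command above sent the string "sched_request_lhcathome.cern.ch_lhcathome.xml" rather than the file's contents. To post a file, curl expects an `@` before the name, e.g. `curl --data-binary @sched_request_lhcathome.cern.ch_lhcathome.xml https://lhcathome.cern.ch/lhcathome_cgi/cgi` (`--data-binary` sends the file byte for byte, whereas `-d @file` strips newlines). The sketch below only illustrates the failure mode: a bare file name contains no opening tag for a parser to find. `first_tag` is a hypothetical helper and is far simpler than BOINC's real XML parser.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the name of the first XML opening tag in
// `body`, or an empty string if there is none. Illustration only; this is
// not how the BOINC scheduler actually parses requests.
std::string first_tag(const std::string& body) {
    std::string::size_type open = body.find('<');
    if (open == std::string::npos) return "";        // no tag at all
    std::string::size_type close = body.find('>', open + 1);
    if (close == std::string::npos) return "";       // unterminated tag
    return body.substr(open + 1, close - open - 1);  // text between '<' and '>'
}
```

For example, `first_tag("sched_request_lhcathome.cern.ch_lhcathome.xml")` returns an empty string, while a body starting with `<scheduler_request>` yields `"scheduler_request"`.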
©2026 CERN