Message boards : Sixtrack Application : SixTrack on anonymous platform: getting error 500 from scheduler

muzzdiez

Joined: 21 Mar 15
Posts: 2
Credit: 9,392,193
RAC: 21,507
Message 43796 - Posted: 5 Dec 2020, 21:05:43 UTC

Hi all!
I have a specific hardware platform with a VLIW Elbrus CPU on board, and I managed to build SixTrack for it in both SSE2 and AVX modes (yes, Elbrus supports both of these SIMD extensions). Neither of the built apps is runnable, though.

I did it the following way:
- Cloned the SixTrack git repo and switched to the version 5.4.3 commit
- In CMakeLists.txt, added a "${CMAKE_SYSTEM_PROCESSOR} MATCHES e2k" check in the places where it was needed
- Initialized and updated the BOINC subproject, then built boinc_api_fortran.o and the libraries inside this subproject
- Built the SixTrack applications using "./cmake_six CR BOINC NATIVE" and "./cmake_six CR BOINC NATIVE AVX"
- Installed BOINC, connected to my BAM! account, attached this host to LHC@home, and got the message (expected at this point) "[LHC@home] This project doesn't support computers of type e2k-mcst-linux-gnu"
- Shut down BOINC and put one of the applications and the corresponding app_info.xml inside the "projects/lhcathome.cern.ch_lhcathome" directory in the BOINC data directory
- Started BOINC again and tried to update the project
- Got "[LHC@home] Scheduler request failed: HTTP internal server error", which I did not expect.

Resetting the project does not help. Neither does detaching, re-creating the project directory (with app_info.xml and the binaries), and then attaching to the project again.

I tested it on two hosts, one behind a proxy and one directly connected to the internet, but to no avail. Milkyway@Home and Einstein@home work perfectly well on these hosts using a similar approach, though.

My app_info.xml (tried many of them, not a single one was ok):
<app_info>
    <app>
        <name>sixtrack</name>
        <user_friendly_name>SixTrack</user_friendly_name>
        <non_cpu_intensive>0</non_cpu_intensive>
    </app>
    <file>
        <name>sixtrack_avx</name>
        <status>1</status>
        <executable/>
    </file>
    <app_version>
        <app_name>sixtrack</app_name>
        <version_num>50205</version_num>
        <platform>anonymous</platform>
        <avg_ncpus>1.000000</avg_ncpus>
        <flops>100000000.000000</flops>
        <plan_class>avx</plan_class>
        <api_version>7.14.2</api_version>
        <file_ref>
            <file_name>sixtrack_avx</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

Most of the contents are guessed from client_state.xml on a host where SixTrack works fine.
I tried both the avx and sse2 binaries and plan classes, together or one at a time; I tried setting api_version to the actual version of BOINC, or removing it; I tried removing and/or altering other fields, such as avg_ncpus, flops, platform (for which I tried anonymous, e2k-mcst-linux-gnu, e2k-linux-gnu, or no such field at all), version_num, status, non_cpu_intensive, etc.

My sched_request_lhcathome.cern.ch_lhcathome.xml (authenticator is removed for security):
<scheduler_request>
    <authenticator>....</authenticator>
    <hostid>10674216</hostid>
    <rpc_seqno>6</rpc_seqno>
    <core_client_major_version>7</core_client_major_version>
    <core_client_minor_version>16</core_client_minor_version>
    <core_client_release>14</core_client_release>
    <resource_share_fraction>1.000000</resource_share_fraction>
    <rrs_fraction>1.000000</rrs_fraction>
    <prrs_fraction>1.000000</prrs_fraction>
    <duration_correction_factor>1.000000</duration_correction_factor>
    <allow_multiple_clients>0</allow_multiple_clients>
    <sandbox>0</sandbox>
    <dont_send_work>0</dont_send_work>
    <work_req_seconds>1658880.000000</work_req_seconds>
    <cpu_req_secs>1658880.000000</cpu_req_secs>
    <cpu_req_instances>32.000000</cpu_req_instances>
    <estimated_delay>0.000000</estimated_delay>
    <client_cap_plan_class>1</client_cap_plan_class>
    <platform_name>anonymous</platform_name>
<working_global_preferences>
<global_preferences>
   <source_project></source_project>
   <mod_time>0.000000</mod_time>
   <battery_charge_min_pct>90.000000</battery_charge_min_pct>
   <battery_max_temperature>40.000000</battery_max_temperature>
   <run_on_batteries>0</run_on_batteries>
   <run_if_user_active>1</run_if_user_active>
   <run_gpu_if_user_active>0</run_gpu_if_user_active>
   <suspend_if_no_recent_input>0.000000</suspend_if_no_recent_input>
   <suspend_cpu_usage>25.000000</suspend_cpu_usage>
   <start_hour>0.000000</start_hour>
   <end_hour>0.000000</end_hour>
   <net_start_hour>0.000000</net_start_hour>
   <net_end_hour>0.000000</net_end_hour>
   <leave_apps_in_memory>0</leave_apps_in_memory>
   <confirm_before_connecting>1</confirm_before_connecting>
   <hangup_if_dialed>0</hangup_if_dialed>
   <dont_verify_images>0</dont_verify_images>
   <work_buf_min_days>0.100000</work_buf_min_days>
   <work_buf_additional_days>0.500000</work_buf_additional_days>
   <max_ncpus_pct>0.000000</max_ncpus_pct>
   <cpu_scheduling_period_minutes>60.000000</cpu_scheduling_period_minutes>
   <disk_interval>60.000000</disk_interval>
   <disk_max_used_gb>0.000000</disk_max_used_gb>
   <disk_max_used_pct>90.000000</disk_max_used_pct>
   <disk_min_free_gb>0.100000</disk_min_free_gb>
   <vm_max_used_pct>75.000000</vm_max_used_pct>
   <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
   <ram_max_used_idle_pct>90.000000</ram_max_used_idle_pct>
   <idle_time_to_run>3.000000</idle_time_to_run>
   <max_bytes_sec_up>0.000000</max_bytes_sec_up>
   <max_bytes_sec_down>0.000000</max_bytes_sec_down>
   <cpu_usage_limit>100.000000</cpu_usage_limit>
   <daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
   <daily_xfer_period_days>0</daily_xfer_period_days>
   <override_file_present>0</override_file_present>
   <network_wifi_only>0</network_wifi_only>
</global_preferences>
</working_global_preferences>
<cross_project_id>a8f1beff410131bb325a4041577b4c90</cross_project_id>
<time_stats>
    <on_frac>0.999549</on_frac>
    <connected_frac>-1.000000</connected_frac>
    <cpu_and_network_available_frac>1.000000</cpu_and_network_available_frac>
    <active_frac>1.000000</active_frac>
    <gpu_active_frac>1.000000</gpu_active_frac>
    <client_start_time>1607201147.328425</client_start_time>
    <total_start_time>1607137250.782878</total_start_time>
    <total_duration>859.837951</total_duration>
    <total_active_duration>859.837951</total_active_duration>
    <total_gpu_active_duration>859.837951</total_gpu_active_duration>
    <now>1607201150.834698</now>
    <previous_uptime>175.987093</previous_uptime>
    <session_active_duration>0.000000</session_active_duration>
    <session_gpu_active_duration>0.000000</session_gpu_active_duration>
</time_stats>
<net_stats>
    <bwup>18398.141947</bwup>
    <avg_up>470059541.941634</avg_up>
    <avg_time_up>1607138339.565820</avg_time_up>
    <bwdown>4639723.433021</bwdown>
    <avg_down>520727181988.380493</avg_down>
    <avg_time_down>1607138339.478808</avg_time_down>
</net_stats>
<host_info>
    <timezone>10800</timezone>
    <domain_name>mamizou</domain_name>
    <ip_addr>192.168.0.153</ip_addr>
    <host_cpid>dee8b86df3d72f6b3c1875514a723de7</host_cpid>
    <p_ncpus>32</p_ncpus>
    <p_vendor>E8C</p_vendor>
    <p_model>E8C [Family 4 Model 7 ]</p_model>
    <p_features></p_features>
    <p_fpops>1000000000.000000</p_fpops>
    <p_iops>1000000000.000000</p_iops>
    <p_membw>1000000000.000000</p_membw>
    <p_calculated>1607137089.123090</p_calculated>
    <p_vm_extensions_disabled>0</p_vm_extensions_disabled>
    <m_nbytes>236458024960.000000</m_nbytes>
    <m_cache>-1.000000</m_cache>
    <m_swap>0.000000</m_swap>
    <d_total>983312404480.000000</d_total>
    <d_free>796927619072.000000</d_free>
    <os_name>Linux Debian</os_name>
    <os_version>Debian GNU/Linux [5.4.0-1.9-e8c|libc 2.29 (GNU libc)]</os_version>
    <n_usable_coprocs>0</n_usable_coprocs>
    <wsl_available>0</wsl_available>
</host_info>
    <disk_usage>
        <d_boinc_used_total>45367296.000000</d_boinc_used_total>
        <d_boinc_used_project>45162496.000000</d_boinc_used_project>
        <d_project_share>876552173404.160034</d_project_share>
    </disk_usage>
<app_versions>
<app_version>
    <app_name>sixtrack</app_name>
    <version_num>50205</version_num>
    <platform>anonymous</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <flops>100000000.000000</flops>
    <plan_class>avx</plan_class>
    <api_version>7.14.2</api_version>
</app_version>
</app_versions>
<other_results>
</other_results>
<in_progress_results>
</in_progress_results>
</scheduler_request>

My sched_reply_lhcathome.cern.ch_lhcathome.xml:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at.
 boinc-server-admin@cern.ch to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>

Could anyone please help me get SixTrack to work on the anonymous platform?
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,969,878
RAC: 136,681
Message 43797 - Posted: 5 Dec 2020, 22:18:27 UTC - in response to Message 43796.  

A basic description of the anonymous platform can be found here:
https://boinc.berkeley.edu/wiki/Anonymous_platform
This page states:
"... the <platform> element should be removed."
The tag names are slightly different to your app_info.xml.

I would suggest renaming the executable to "sixtrack_lin64_50205_avx.linux" and using this app_info.xml:
<app_info>
    <app>
        <name>sixtrack</name>
    </app>
    <file_info>
        <name>sixtrack_lin64_50205_avx.linux</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>sixtrack</app_name>
        <version_num>50205</version_num>
        <api_version>7.14.2</api_version>
        <plan_class>avx</plan_class>
        <flops>100000000.000000</flops>
        <avg_ncpus>1.0</avg_ncpus>
        <file_ref>
            <file_name>sixtrack_lin64_50205_avx.linux</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

The BOINC client must be restarted to activate the changes.
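
The differences discussed above (`<file>` versus `<file_info>`, a leftover `<platform>` element, file references that do not match a declared file) can be checked mechanically before restarting the client. The following is only a minimal sketch: the function name `check_app_info` and the set of checks are illustrative assumptions based on this thread and the wiki quote, not an official BOINC validator.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of human-readable problems found in an app_info.xml string.

    Hypothetical helper: it only covers the points raised in this thread.
    """
    problems = []
    root = ET.fromstring(xml_text)

    if root.tag != "app_info":
        problems.append(f"root element is <{root.tag}>, expected <app_info>")

    # Files are declared with <file_info>; a bare <file> is the mistake noted above.
    if root.findall("file"):
        problems.append("found <file>; use <file_info> instead")

    declared_files = {fi.findtext("name", "").strip() for fi in root.findall("file_info")}
    declared_apps = {app.findtext("name", "").strip() for app in root.findall("app")}

    for av in root.findall("app_version"):
        # The wiki page quoted above says the <platform> element should be removed.
        if av.find("platform") is not None:
            problems.append("<app_version> contains <platform>; remove it")

        app_name = av.findtext("app_name", "").strip()
        if app_name not in declared_apps:
            problems.append(f"app_name '{app_name}' has no matching <app>")

        for ref in av.findall("file_ref"):
            file_name = ref.findtext("file_name", "").strip()
            if file_name not in declared_files:
                problems.append(f"file_ref '{file_name}' has no matching <file_info>")

    return problems
```

Calling `check_app_info(open("app_info.xml").read())` returns an empty list when none of these issues are present. A clean result does not guarantee that the scheduler will accept the request.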



In addition your scheduler request has empty processor features:
<p_features></p_features>

I don't know whether at least "avx" must be sent to the server.
muzzdiez

Joined: 21 Mar 15
Posts: 2
Credit: 9,392,193
RAC: 21,507
Message 43798 - Posted: 6 Dec 2020, 0:54:10 UTC - in response to Message 43797.  
Last modified: 6 Dec 2020, 1:04:30 UTC

Thanks for your suggestions! Unfortunately, neither of them worked.
"... the <platform> element should be removed."
Yes, I tried removing platform too, as well as setting it to anonymous, e2k-mcst-linux-gnu, e2k-linux-gnu, and now even x86_64-pc-linux-gnu, to get exactly the same app_version block as on an x86_64 machine (I also played with the number of zeros in avg_ncpus, which always ends up with six decimal places in the request, and set flops to 1634109463.151072):
<app_version>
    <app_name>sixtrack</app_name>
    <version_num>50205</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <flops>1634109463.151072</flops>
    <plan_class>sse2</plan_class>
    <api_version>7.14.2</api_version>
</app_version>
But on x86_64 everything works fine with such an app_version block, and here it does not.
"The tag names are slightly different to your app_info.xml."
Well, you're right, that was probably a mistake, but nothing changed when I fixed <file>...</file> to <file_info>...</file_info>.
"I would suggest to rename the executable to "sixtrack_lin64_50205_avx.linux" and use this app_info.xml"
I tried, and got the exact same result. I tried the same with the sse2 build too, but still had no success.
"In addition your scheduler request has empty processor features"
Good idea, but it still didn't help. I tried patching BOINC in client/hostinfo_unix.cpp:709 like this:
#if defined(__e2k__)
    safe_strcpy(features, "sse sse2 ssse3 sse4_1 sse4_2 sse4a avx avx2");
#endif
and got the following in p_features in host_info:
    <p_features>sse sse2 ssse3 sse4_1 sse4_2 sse4a avx avx2</p_features>
but that still did not fix the problem.
I even copied the entire p_features string from an x86_64 box ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid eagerfpu pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_save"), but guess what: it did not change anything.
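
Since the approach here is to make the request from the e2k host look like the one from a working x86_64 host, a small script can list which fields still differ between the two sched_request files. This is a rough sketch: the names `FIELDS`, `summarize` and `diff_requests` are made up for illustration, and the choice of fields is a guess at what might matter, not a statement about what the scheduler actually inspects.

```python
import xml.etree.ElementTree as ET

# Assumed-relevant fields, taken from the sched_request shown earlier in the thread.
FIELDS = [
    "platform_name",
    "host_info/p_vendor",
    "host_info/p_model",
    "host_info/p_features",
    "host_info/os_name",
    "host_info/os_version",
]


def summarize(xml_text):
    """Extract the selected fields and all app_version blocks from a request."""
    root = ET.fromstring(xml_text)
    summary = {path: (root.findtext(path) or "").strip() for path in FIELDS}
    summary["app_versions"] = [
        {child.tag: (child.text or "").strip() for child in av}
        for av in root.findall("app_versions/app_version")
    ]
    return summary


def diff_requests(good_xml, bad_xml):
    """Return {field: (value_on_working_host, value_on_failing_host)} for differences."""
    good, bad = summarize(good_xml), summarize(bad_xml)
    return {key: (good[key], bad[key]) for key in good if good[key] != bad[key]}
```

Fields that still differ after all the experiments above (for example p_vendor, p_model or os_version) are candidates for further testing, though nothing in this thread confirms that any of them triggers the error 500.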

I have really run out of ideas for now.

By the way, does anyone have an idea how to send request XML (such as a manually constructed sched_request) to the scheduler using curl? I would like to do so and analyze the return codes.
I tried "curl -d sched_request_lhcathome.cern.ch_lhcathome.xml https://lhcathome.cern.ch/lhcathome_cgi/cgi", but instead of error 500 I got a 200 response with this:
<scheduler_reply>
<scheduler_version>715</scheduler_version>
<master_url>https://lhcathome.cern.ch/lhcathome/</master_url>
<request_delay>6.000000</request_delay>
<message priority="low">Error in request message: xp.get_tag() failed </message>
<project_name>LHC@home</project_name>
<send_full_workload/>
</scheduler_reply>
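
A note on the curl attempt above: with `-d`, curl sends its argument as the request body literally unless it starts with `@`, so the command above posts the file name as text and not the file contents. That would be consistent with the "xp.get_tag() failed" parse error. Using `--data-binary @sched_request_lhcathome.cern.ch_lhcathome.xml` makes curl read the file and send it unmodified. The same thing can be sketched in Python with the standard library. The helper names `build_request` and `post_request` are hypothetical, and the `Content-Type` value is an assumption, since the thread does not show which headers the BOINC client sends.

```python
import urllib.error
import urllib.request

# Scheduler URL as used in the curl command above.
SCHEDULER_URL = "https://lhcathome.cern.ch/lhcathome_cgi/cgi"


def build_request(xml_bytes, url=SCHEDULER_URL):
    """Build a POST request whose body is the raw scheduler request XML."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml"},  # assumption, not confirmed by the thread
        method="POST",
    )


def post_request(path, url=SCHEDULER_URL, timeout=60):
    """Send the file at `path` to the scheduler; return (HTTP status, reply text)."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = build_request(body, url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A 4xx/5xx reply (such as the error 500 page) is raised as HTTPError;
        # its status code and body are still available for inspection.
        return err.code, err.read().decode("utf-8", errors="replace")
```

For example, `status, reply = post_request("sched_request_lhcathome.cern.ch_lhcathome.xml")` would return the status code together with the reply body. The request file must contain a valid authenticator for the scheduler to treat it like a real client request.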