Message boards :
Number crunching :
Anonymous Platform - Scheduler request failed: HTTP internal server error
Joined: 26 Jul 13 Posts: 13 Credit: 2,043,458 RAC: 0

Hey LHC Devs, I was doing some testing today to see how compatible the new 26202 vboxwrapper is with your apps, but I hit a problem during work fetch. Could you please take a look and let us know what you find? Thanks!

Work fetch log:

```
5/18/2019 9:26:10 AM | LHC@home | Requesting new tasks for CPU
5/18/2019 9:26:13 AM | LHC@home | Scheduler request failed: HTTP internal server error
5/18/2019 9:26:13 AM |  | [work_fetch] Request work fetch: RPC complete
```

sched_reply_lhcathome.cern.ch_lhcathome.xml:

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p>
<p>Please contact the server administrator at boinc-server-admin@cern.ch to inform them of the time this error occurred, and the actions you performed just before this error.</p>
<p>More information about this error may be available in the server error log.</p>
</body></html>
```

app_info.xml (with all related files in the project folder):

```xml
<app_info>
  <app>
    <name>CMS</name>
    <user_friendly_name>CMS Simulation</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
  </app>
  <app>
    <name>Theory</name>
    <user_friendly_name>Theory Simulation</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
  </app>
  <app>
    <name>ATLAS</name>
    <user_friendly_name>ATLAS Simulation</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
  </app>
  <file>
    <name>vboxwrapper_26202_windows_x86_64.exe</name>
    <nbytes>1639936.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
    <executable/>
  </file>
  <file>
    <name>CMS_2016_03_22.xml</name>
    <nbytes>577.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
  </file>
  <file>
    <name>CMS_2019_03_25.vdi</name>
    <nbytes>3008364544.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
  </file>
  <file>
    <name>Theory_2017_05_29.xml</name>
    <nbytes>615.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
  </file>
  <file>
    <name>Theory_2018_11_23.vdi</name>
    <nbytes>504365056.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
  </file>
  <file>
    <name>ATLAS_2017_01_09.xml</name>
    <nbytes>451.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
  </file>
  <file>
    <name>ATLASM_2017_03_01.vdi</name>
    <nbytes>1745879040.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
    <executable/>
  </file>
  <file>
    <name>vboxwrapper_26202_windows_x86_64.pdb</name>
    <nbytes>7392256.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
    <executable/>
  </file>
  <app_version>
    <app_name>CMS</app_name>
    <version_num>4900</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <flops>30512450318.842464</flops>
    <plan_class>vbox64</plan_class>
    <api_version>7.7.0</api_version>
    <file_ref>
      <file_name>vboxwrapper_26202_windows_x86_64.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>CMS_2016_03_22.xml</file_name>
      <open_name>vbox_job.xml</open_name>
    </file_ref>
    <file_ref>
      <file_name>CMS_2019_03_25.vdi</file_name>
      <open_name>vm_image.vdi</open_name>
      <copy_file/>
    </file_ref>
    <dont_throttle/>
    <needs_network/>
  </app_version>
  <app_version>
    <app_name>Theory</app_name>
    <version_num>26390</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>2.000000</avg_ncpus>
    <flops>9003518068.638815</flops>
    <plan_class>vbox64_mt_mcore</plan_class>
    <api_version>7.7.0</api_version>
    <cmdline>--memory_size_mb 830</cmdline>
    <file_ref>
      <file_name>vboxwrapper_26202_windows_x86_64.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>Theory_2017_05_29.xml</file_name>
      <open_name>vbox_job.xml</open_name>
    </file_ref>
    <file_ref>
      <file_name>Theory_2018_11_23.vdi</file_name>
      <open_name>vm_image.vdi</open_name>
      <copy_file/>
    </file_ref>
    <dont_throttle/>
    <needs_network/>
  </app_version>
  <app_version>
    <app_name>ATLAS</app_name>
    <version_num>101</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>4.000000</avg_ncpus>
    <flops>3017627035.548801</flops>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <api_version>7.7.0</api_version>
    <cmdline>--memory_size_mb 6600</cmdline>
    <file_ref>
      <file_name>vboxwrapper_26202_windows_x86_64.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>ATLAS_2017_01_09.xml</file_name>
      <open_name>vbox_job.xml</open_name>
    </file_ref>
    <file_ref>
      <file_name>ATLASM_2017_03_01.vdi</file_name>
      <open_name>vm_image.vdi</open_name>
      <copy_file/>
    </file_ref>
    <file_ref>
      <file_name>vboxwrapper_26202_windows_x86_64.pdb</file_name>
      <open_name>vboxwrapper_26202_windows_x86_64.pdb</open_name>
      <copy_file/>
    </file_ref>
    <dont_throttle/>
    <is_wrapper/>
    <needs_network/>
  </app_version>
</app_info>
```
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363

Hi Jacob, long time no see ;) We are at the end of the Pentathlon Challenge; for LHC it's a 3-day sprint ending tonight at 02:00 UTC. If you wish, I could test the 26202 wrapper (Win64) tomorrow.

One small point I noticed: your BOINC data directory is "E:\BOINC Data\", with a space in it. Not sure whether that could cause a problem?
Joined: 26 Jul 13 Posts: 13 Credit: 2,043,458 RAC: 0

Hi Crystal, I don't think the space matters. This issue looks like the server code failing while processing my sched_request, I believe; dev support will be needed to fix it.

Edit: How did you see my data folder? Were you already inspecting web logs? :) Something is broken.

Also, yes, please test vboxwrapper 26202. I'm noticing some big problems with it, and I'm logging them here: https://github.com/BOINC/boinc/issues/2915
Joined: 24 Jun 10 Posts: 40 Credit: 5,307,216 RAC: 17,597

Moved
Joined: 26 Jul 13 Posts: 13 Credit: 2,043,458 RAC: 0

If your post is unrelated to "Scheduler request failed: HTTP internal server error", then please move it to a new thread, per etiquette.
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363

> Also, yes please test vboxwrapper 26202. I'm noticing some big problems with it . . .

I did not discover big problems (VBox 6.0.8 / BOINC 7.14.2 / Win7). I tested startup, pausing, suspend, resume, and restoring from snapshots.

Tested applications:

Theory - result https://lhcathome.cern.ch/lhcathome/result.php?resultid=229970452

ATLAS - result https://lhcathome.cern.ch/lhcathome/result.php?resultid=229968442 (creating a snapshot takes long because of the 3 GB size, and is therefore tricky on a very busy system; BOINC wants it done within 60 s)

CMS - result https://lhcathome.cern.ch/lhcathome/result.php?resultid=229934445

(results not finished yet) and 2 ready tasks from Cosmology: camb_boinc2docker:

http://www.cosmologyathome.org/result.php?resultid=11068291
http://www.cosmologyathome.org/result.php?resultid=11067028

I only see a cosmetic issue: some lines from the Guest Log are sometimes repeated in stderr.txt after a VM restore.
Joined: 26 Jul 13 Posts: 13 Credit: 2,043,458 RAC: 0

Any chance you could also test on Windows 10? I will also redo my testing, using a release OS version instead of an Insider version, and a release version of VirtualBox instead of a test build. Perhaps I'm hitting a beta bug; that would explain why 26202 seemed to work fine for me earlier and now doesn't.
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363

> Any chance you could also test on Windows 10?

An interesting combination could be my small Windows 10 tablet: 32-bit OS and VBox 5.2.30. Running a Theory task with the VM's RAM reduced to 256 MB looks OK: https://lhcathome.cern.ch/lhcathome/result.php?resultid=230156336

Btw: meanwhile the ATLAS task has completed.
Joined: 26 Jul 13 Posts: 13 Credit: 2,043,458 RAC: 0

Thanks. I think I'll have to wait a week to test it again; I'm going out of town until next Monday. Regarding the original post, were you able to repro the "Scheduler request failed: HTTP internal server error" when trying to get tasks with a 26202 app_info.xml in place?
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363

Maybe the HTTP error was related to the Pentathlon 'DDoS' ahead of the Sprint discipline: clients filling their buffers from the massive supply of SixTrack tasks currently available.
Joined: 26 Jul 13 Posts: 13 Credit: 2,043,458 RAC: 0

I doubt it. Scheduler replies were working just fine, before and after, when not using the app_info.xml. Can you try?
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363

> I doubt it. Scheduler replies were working just fine, before and after, when not using the app_info.xml.

Your app_info.xml has the wrong tags: you used <file> </file> where it should be <file_info> </file_info> in app_info.xml. Maybe you could try it first yourself. Also, a lot of your app_info.xml is redundant. A simplified app_info.xml for Theory only could look like:

```xml
<app_info>
  <app>
    <name>Theory</name>
  </app>
  <file_info>
    <name>vboxwrapper_26202_windows_x86_64.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>Theory_2017_05_29.xml</name>
  </file_info>
  <file_info>
    <name>Theory_2018_11_23.vdi</name>
  </file_info>
  <app_version>
    <app_name>Theory</app_name>
    <version_num>26390</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <plan_class>vbox64_mt_mcore</plan_class>
    <api_version>7.7.0</api_version>
    <cmdline>--memory_size_mb 730</cmdline>
    <file_ref>
      <file_name>vboxwrapper_26202_windows_x86_64.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>Theory_2017_05_29.xml</file_name>
      <open_name>vbox_job.xml</open_name>
    </file_ref>
    <file_ref>
      <file_name>Theory_2018_11_23.vdi</file_name>
      <open_name>vm_image.vdi</open_name>
      <copy_file/>
    </file_ref>
    <dont_throttle/>
    <needs_network/>
  </app_version>
</app_info>
```
Joined: 26 Jul 13 Posts: 13 Credit: 2,043,458 RAC: 0

I did try it, hence the report. No need to be like that. It has admittedly been a long time since I've done app_info testing. If I mixed up file_info and file, then that could be the culprit. Thanks for giving me ideas to try again. But I could have sworn this worked fine before using file tags; maybe I'm not paying enough attention. Could you try with file tags and see what you get?

PS: It's not redundant. My goal is to test all 3 applications against the vboxwrapper, so I set up a single app_info.xml file to do that.
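[Editor's note: for readers skimming the thread, the tag fix Crystal describes amounts to renaming the wrapping element. A minimal before/after sketch, with the file name taken from the app_info.xml above and the size/status children trimmed for brevity:]

```xml
<!-- As posted (per this thread's diagnosis, not accepted in app_info.xml): -->
<file>
  <name>vboxwrapper_26202_windows_x86_64.exe</name>
  <executable/>
</file>

<!-- Corrected form for app_info.xml: -->
<file_info>
  <name>vboxwrapper_26202_windows_x86_64.exe</name>
  <executable/>
</file_info>
```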
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363

> PS: It's not redundant. My goal is to test all 3 applications against the vboxwrapper, so I set up a single app_info.xml file to do that.

"Redundant" was not about the CMS and ATLAS parts, but about the file sizes, friendly names, cpu-intensive flags, etc. I'll test today with my app_info.xml and only Theory, because the HTTP error should have nothing to do with the kind of application.
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363

> I'll test today with my app_info.xml and only Theory, because the HTTP error should have nothing to do with the kind of application.

Tested, and a bad surprise:

```
LHC@home 20 May 08:45:57 work fetch resumed by user
LHC@home 20 May 08:45:59 Master file download succeeded
LHC@home 20 May 08:46:04 Sending scheduler request: To fetch work.
LHC@home 20 May 08:46:04 Requesting new tasks for CPU
LHC@home 20 May 08:46:06 Scheduler request failed: HTTP internal server error
LHC@home 20 May 08:46:16 work fetch suspended by user
```

Same sched_reply_lhcathome.cern.ch_lhcathome.xml as you had:

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p>
<p>Please contact the server administrator at boinc-server-admin@cern.ch to inform them of the time this error occurred, and the actions you performed just before this error.</p>
<p>More information about this error may be available in the server error log.</p>
</body></html>
```
Joined: 2 May 07 Posts: 2101 Credit: 159,817,517 RAC: 132,770

David wrote this info about proxy use in the ATLAS forum: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5013&postid=38751#38751

Could this be the reason here?
Joined: 15 Jul 05 Posts: 242 Credit: 5,800,306 RAC: 0

In principle no application should specify platform anonymous. If you would like to debug this further, please try on our LHCathome-dev project, where we have more flexibility with the server code. Laurence and other BOINC developers could try to fix the bug for upcoming releases, so that the scheduler does not end up with an internal server error. But this is anyway what you get if you give garbled input to a CGI script.
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363

> In principle no application should specify platform anonymous. If you would like to debug this further, please try on our LHCathome-dev project, where we have more flexibility with the server code. Laurence and other BOINC developers could try to fix the bug for upcoming releases, so that the scheduler does not end up with an internal server error.

The term 'platform anonymous' is confusing. It's not a strange or unknown OS; rather, an app_info.xml can be used to run non-standard applications. In our case it's BOINC's newest vboxwrapper under test, but on other projects an app_info.xml is quite often used for hardware-optimized applications. Maybe it would be good to know whether this is a BOINC server-code problem. I'll test again on the dev server and report over there.

> But this is anyway what you get if you give garbled input to a CGI script.

The input from the app_info.xml is not garbled, but proper BOINC XML code and tags.
Joined: 26 Jul 13 Posts: 13 Credit: 2,043,458 RAC: 0

Thanks for confirming the problem. Please be sure to post a link here to the dev-server thread when you create one. Thanks again.
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,391 RAC: 2,363

Reported the same failing request at the dev server: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=473

This thread here, at the production server, is closed.
©2024 CERN