Message boards : Number crunching : Anonymous Platform - Scheduler request failed: HTTP internal server error

Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 2,043,458
RAC: 0
Message 38888 - Posted: 18 May 2019, 13:29:21 UTC
Last modified: 18 May 2019, 13:35:30 UTC

Hey LHC Devs,

I was doing some testing today to see how compatible the new 26202 vboxwrapper is with your apps.
But I hit a problem during work fetch.
Could you please take a look, and let us know what you find?
Thanks!

Work fetch log:
5/18/2019 9:26:10 AM | LHC@home | Requesting new tasks for CPU
5/18/2019 9:26:13 AM | LHC@home | Scheduler request failed: HTTP internal server error
5/18/2019 9:26:13 AM |  | [work_fetch] Request work fetch: RPC complete
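For anyone who wants to pull failures like this out of a longer event log, here is a minimal Python sketch. It assumes the `timestamp | project | message` layout and the US-style date format seen in the three lines above; the function names are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

FAILURE_MARKER = "Scheduler request failed"

def parse_event_line(line):
    """Split 'timestamp | project | message' into a tuple, or return None."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    try:
        # Assumed format, taken from the pasted log: 5/18/2019 9:26:10 AM
        stamp = datetime.strptime(parts[0], "%m/%d/%Y %I:%M:%S %p")
    except ValueError:
        return None
    return stamp, parts[1], parts[2]

def failed_requests(lines):
    """Return parsed entries whose message reports a failed scheduler request."""
    parsed = (parse_event_line(line) for line in lines)
    return [entry for entry in parsed if entry and FAILURE_MARKER in entry[2]]

log = [
    "5/18/2019 9:26:10 AM | LHC@home | Requesting new tasks for CPU",
    "5/18/2019 9:26:13 AM | LHC@home | Scheduler request failed: HTTP internal server error",
    "5/18/2019 9:26:13 AM |  | [work_fetch] Request work fetch: RPC complete",
]

for stamp, project, message in failed_requests(log):
    print(stamp.isoformat(), project, message)
```

Other locales write the timestamp differently, so the format string would need adjusting there.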


sched_reply_lhcathome.cern.ch_lhcathome.xml:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at 
 boinc-server-admin@cern.ch to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
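The reply file above holds the web server's HTML error page rather than a scheduler reply. A rough way to tell the two apart automatically is sketched below in Python. It assumes a normal reply contains a `<scheduler_reply>` element; the function names are hypothetical, not BOINC API.

```python
import re

def classify_sched_reply(text):
    """Return 'html_error', 'scheduler_reply', or 'unknown' for a reply body."""
    head = text.lstrip()[:512].lower()
    if head.startswith("<!doctype html") or "<html" in head:
        return "html_error"
    # Assumption: a genuine scheduler reply contains this element.
    if "<scheduler_reply>" in text:
        return "scheduler_reply"
    return "unknown"

def http_status_from_title(text):
    """Pull a leading status code out of <title>, e.g. '500 Internal Server Error'."""
    match = re.search(r"<title>\s*(\d{3})\b", text, re.IGNORECASE)
    return int(match.group(1)) if match else None

sample = (
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n'
    "<html><head>\n"
    "<title>500 Internal Server Error</title>\n"
    "</head><body></body></html>"
)
print(classify_sched_reply(sample), http_status_from_title(sample))
```

This only identifies what kind of body came back. The actual cause would have to be read from the server error log, which only the project admins can see.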


app_info.xml (with all related files in the project folder):
<app_info>

<app>
    <name>CMS</name>
    <user_friendly_name>CMS Simulation</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
</app>
<app>
    <name>Theory</name>
    <user_friendly_name>Theory Simulation</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
</app>
<app>
    <name>ATLAS</name>
    <user_friendly_name>ATLAS Simulation</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
</app>

<file>
    <name>vboxwrapper_26202_windows_x86_64.exe</name>
    <nbytes>1639936.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
    <executable/>
</file>
<file>
    <name>CMS_2016_03_22.xml</name>
    <nbytes>577.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
</file>
<file>
    <name>CMS_2019_03_25.vdi</name>
    <nbytes>3008364544.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
</file>
<file>
    <name>Theory_2017_05_29.xml</name>
    <nbytes>615.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
</file>
<file>
    <name>Theory_2018_11_23.vdi</name>
    <nbytes>504365056.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
</file>
<file>
    <name>ATLAS_2017_01_09.xml</name>
    <nbytes>451.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
</file>
<file>
    <name>ATLASM_2017_03_01.vdi</name>
    <nbytes>1745879040.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
    <executable/>
</file>
<file>
    <name>vboxwrapper_26202_windows_x86_64.pdb</name>
    <nbytes>7392256.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
    <executable/>
</file>

<app_version>
    <app_name>CMS</app_name>
    <version_num>4900</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <flops>30512450318.842464</flops>
    <plan_class>vbox64</plan_class>
    <api_version>7.7.0</api_version>
    <file_ref>
        <file_name>vboxwrapper_26202_windows_x86_64.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>CMS_2016_03_22.xml</file_name>
        <open_name>vbox_job.xml</open_name>
    </file_ref>
    <file_ref>
        <file_name>CMS_2019_03_25.vdi</file_name>
        <open_name>vm_image.vdi</open_name>
        <copy_file/>
    </file_ref>
    <dont_throttle/>
    <needs_network/>
</app_version>
<app_version>
    <app_name>Theory</app_name>
    <version_num>26390</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>2.000000</avg_ncpus>
    <flops>9003518068.638815</flops>
    <plan_class>vbox64_mt_mcore</plan_class>
    <api_version>7.7.0</api_version>
    <cmdline>--memory_size_mb 830</cmdline>
    <file_ref>
        <file_name>vboxwrapper_26202_windows_x86_64.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>Theory_2017_05_29.xml</file_name>
        <open_name>vbox_job.xml</open_name>
    </file_ref>
    <file_ref>
        <file_name>Theory_2018_11_23.vdi</file_name>
        <open_name>vm_image.vdi</open_name>
        <copy_file/>
    </file_ref>
    <dont_throttle/>
    <needs_network/>
</app_version>
<app_version>
    <app_name>ATLAS</app_name>
    <version_num>101</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>4.000000</avg_ncpus>
    <flops>3017627035.548801</flops>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <api_version>7.7.0</api_version>
    <cmdline>--memory_size_mb 6600</cmdline>
    <file_ref>
        <file_name>vboxwrapper_26202_windows_x86_64.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>ATLAS_2017_01_09.xml</file_name>
        <open_name>vbox_job.xml</open_name>
    </file_ref>
    <file_ref>
        <file_name>ATLASM_2017_03_01.vdi</file_name>
        <open_name>vm_image.vdi</open_name>
        <copy_file/>
    </file_ref>
    <file_ref>
        <file_name>vboxwrapper_26202_windows_x86_64.pdb</file_name>
        <open_name>vboxwrapper_26202_windows_x86_64.pdb</open_name>
        <copy_file/>
    </file_ref>
    <dont_throttle/>
    <is_wrapper/>
    <needs_network/>
</app_version>

</app_info>
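Before blaming the server, it can help to rule out simple inconsistencies inside an app_info.xml like the one above. The Python sketch below checks three things: that every app_version names a declared app, that every file_ref names a declared file, and that each app_version has exactly one main_program. These rules are my own sanity checks, not an official BOINC validator, and `check_app_info` is a hypothetical name. Passing them says nothing about why the scheduler returns HTTP 500.

```python
import xml.etree.ElementTree as ET

def check_app_info(xml_text):
    """Return a list of human-readable consistency problems (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    apps = {a.findtext("name") for a in root.findall("app")}
    # Accept either spelling of the file element for this check.
    files = {
        f.findtext("name")
        for tag in ("file", "file_info")
        for f in root.findall(tag)
    }
    for av in root.findall("app_version"):
        app_name = av.findtext("app_name")
        if app_name not in apps:
            problems.append(f"app_version refers to undeclared app {app_name!r}")
        refs = av.findall("file_ref")
        for ref in refs:
            fname = ref.findtext("file_name")
            if fname not in files:
                problems.append(f"{app_name}: file_ref to undeclared file {fname!r}")
        mains = sum(1 for ref in refs if ref.find("main_program") is not None)
        if mains != 1:
            problems.append(f"{app_name}: expected 1 main_program, found {mains}")
    return problems

good = (
    "<app_info><app><name>Theory</name></app>"
    "<file><name>w.exe</name></file>"
    "<app_version><app_name>Theory</app_name>"
    "<file_ref><file_name>w.exe</file_name><main_program/></file_ref>"
    "</app_version></app_info>"
)
print(check_app_info(good))
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text in.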
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38889 - Posted: 18 May 2019, 14:21:49 UTC

Hi Jacob, long time no see ;)

We are at the end of the Pentathlon Challenge. For LHC, that is a 3-day sprint ending tonight at 02:00 UTC.
If you wish, I could test the 26202 wrapper (Win64) tomorrow.

One small point I noticed: your BOINC data directory is "E:\BOINC Data\", with a space in it.
I'm not sure whether that could cause a problem.
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 2,043,458
RAC: 0
Message 38890 - Posted: 18 May 2019, 15:06:37 UTC - in response to Message 38889.  
Last modified: 18 May 2019, 15:22:16 UTC

Hi Crystal,

I don't think the space matters. This web issue deals with the server code processing my sched_request, I believe. Dev support will be needed to fix it. Edit: How did you see my data folder - were you already inspecting web logs? :) Something is broken.

Also, yes please test vboxwrapper 26202. I'm noticing some big problems with it, and I'm logging them here:
https://github.com/BOINC/boinc/issues/2915
tazzduke

Joined: 24 Jun 10
Posts: 39
Credit: 4,971,347
RAC: 4,992
Message 38891 - Posted: 18 May 2019, 16:38:01 UTC
Last modified: 18 May 2019, 16:47:31 UTC

Moved
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 2,043,458
RAC: 0
Message 38892 - Posted: 18 May 2019, 16:46:04 UTC - in response to Message 38891.  

If your post is unrelated to:
Scheduler request failed: HTTP internal server error
... then move it to a new thread please, per etiquette.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38894 - Posted: 19 May 2019, 11:17:48 UTC - in response to Message 38890.  

"Also, yes please test vboxwrapper 26202. I'm noticing some big problems with it ..."
I did not discover big problems. VBox 6.0.8 / BOINC 7.14.2 / Win7
I tested startup, pausing, suspend, resume and restoring from snapshots.

Tested applications:
Theory - result https://lhcathome.cern.ch/lhcathome/result.php?resultid=229970452
ATLAS - result https://lhcathome.cern.ch/lhcathome/result.php?resultid=229968442 Creating a snapshot takes a long time because of the 3 GB size, which makes it tricky on a very busy system: BOINC wants it done within 60 s.
CMS - result https://lhcathome.cern.ch/lhcathome/result.php?resultid=229934445
(results not finished yet)

and 2 finished tasks from Cosmology: camb_boinc2docker.
http://www.cosmologyathome.org/result.php?resultid=11068291
http://www.cosmologyathome.org/result.php?resultid=11067028

The only issue I see is cosmetic: some lines from the Guest Log are sometimes repeated in stderr.txt after a VM restore.
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 2,043,458
RAC: 0
Message 38895 - Posted: 19 May 2019, 13:15:47 UTC - in response to Message 38894.  
Last modified: 19 May 2019, 13:20:30 UTC

Any chance you could also test on Windows 10?

I will also redo my testing, using a release OS version instead of an Insider version, and a release version of VirtualBox instead of a test build. Perhaps I'm hitting a beta bug. That would explain why 26202 seemed to work fine for me earlier and now doesn't.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38896 - Posted: 19 May 2019, 14:00:35 UTC - in response to Message 38895.  
Last modified: 19 May 2019, 14:34:43 UTC

"Any chance you could also test on Windows 10?"
An interesting combination could be my small Windows 10 tablet: 32-bit OS and VBox 5.2.30.

Running a Theory task in a VM reduced to 256 MB of RAM looks OK: https://lhcathome.cern.ch/lhcathome/result.php?resultid=230156336

Btw: Meanwhile the ATLAS task has completed.
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 2,043,458
RAC: 0
Message 38897 - Posted: 19 May 2019, 15:32:04 UTC - in response to Message 38896.  

Thanks. I think I'll have to wait a week to test it again, going out of town until next Monday.

Regarding the original post, were you able to repro the "Scheduler request failed: HTTP internal server error" when trying to get tasks with a 26202 app_info.xml in place?
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38898 - Posted: 19 May 2019, 15:55:59 UTC - in response to Message 38897.  

Maybe the HTTP error was related to the Pentathlon 'DDoS' ahead of the Sprint discipline, caused by clients filling their buffers with the massive supply of SixTrack tasks.
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 2,043,458
RAC: 0
Message 38899 - Posted: 19 May 2019, 16:07:32 UTC - in response to Message 38898.  
Last modified: 19 May 2019, 16:07:47 UTC

I doubt it. Scheduler replies were working just fine, before and after, when not using the app_info.xml.
Can you try?
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38903 - Posted: 19 May 2019, 20:26:51 UTC - in response to Message 38899.  

"I doubt it. Scheduler replies were working just fine, before and after, when not using the app_info.xml.
Can you try?"

Your app_info.xml has the wrong tags: you used <file> </file> where app_info.xml should have <file_info> </file_info>.
Maybe you could try that yourself first. Also, a lot of your app_info.xml is redundant.
A simplified app_info.xml for Theory only could look like this:
<app_info>
 <app>
  <name>Theory</name>
 </app>
 <file_info>
  <name>vboxwrapper_26202_windows_x86_64.exe</name>
  <executable/>
 </file_info>
 <file_info>
  <name>Theory_2017_05_29.xml</name>
 </file_info>
 <file_info>
  <name>Theory_2018_11_23.vdi</name>
 </file_info>
 <app_version>
  <app_name>Theory</app_name>
  <version_num>26390</version_num>
  <platform>windows_x86_64</platform>
  <avg_ncpus>1.000000</avg_ncpus>
  <plan_class>vbox64_mt_mcore</plan_class>
  <api_version>7.7.0</api_version>
  <cmdline>--memory_size_mb 730</cmdline>
  <file_ref>
   <file_name>vboxwrapper_26202_windows_x86_64.exe</file_name>
   <main_program/>
  </file_ref>
  <file_ref>
   <file_name>Theory_2017_05_29.xml</file_name>
   <open_name>vbox_job.xml</open_name>
  </file_ref>
  <file_ref>
   <file_name>Theory_2018_11_23.vdi</file_name>
   <open_name>vm_image.vdi</open_name>
   <copy_file/>
  </file_ref>
  <dont_throttle/>
  <needs_network/>
 </app_version>
</app_info>
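If someone wants to try the tag change on an existing app_info.xml without editing it by hand, a small Python sketch could look like the following. It renames only the top-level `<file>` elements to `<file_info>` and leaves `<file_ref>` and `<file_name>` alone. The function name is hypothetical, and whether the tag name matters to the scheduler is the suggestion made in this post, not something the script verifies.

```python
import xml.etree.ElementTree as ET

def rename_file_tags(xml_text):
    """Rename direct <file> children of the root element to <file_info>."""
    root = ET.fromstring(xml_text)
    for child in root:
        # Only top-level elements are touched; nested file_ref/file_name stay as-is.
        if child.tag == "file":
            child.tag = "file_info"
    return ET.tostring(root, encoding="unicode")

src = (
    "<app_info><file><name>a.vdi</name></file>"
    "<app_version><file_ref><file_name>a.vdi</file_name></file_ref>"
    "</app_version></app_info>"
)
print(rename_file_tags(src))
```

Keep a backup of the original file, and restart the BOINC client after replacing app_info.xml so it is read again.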
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 2,043,458
RAC: 0
Message 38905 - Posted: 19 May 2019, 21:14:53 UTC
Last modified: 19 May 2019, 21:19:53 UTC

I did try it, hence the report. No need to be like that.

It has admittedly been a long time since I've done app_info testing. If I used file where file_info belongs, that could be the culprit. Thanks for giving me ideas to try again.

But I could have sworn this worked fine before with file tags. Maybe I'm not paying enough attention. Could you try with file tags and see what you get?

PS: It's not redundant. My goal is to test all 3 applications against the vboxwrapper, so I set up a single app_info.xml file to do that.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38908 - Posted: 20 May 2019, 5:26:35 UTC - in response to Message 38905.  

"PS: It's not redundant. My goal is to test all 3 applications against the vboxwrapper, so I set up a single app_info.xml file to do that."

By 'redundant' I did not mean the CMS and ATLAS parts, but the file sizes, friendly names, non_cpu_intensive flags, etc.

I'll test today with my app_info.xml and only Theory, because the HTTP error should have nothing to do with the kind of application.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38910 - Posted: 20 May 2019, 6:56:33 UTC - in response to Message 38908.  

"I'll test today with my app_info.xml and only Theory, because the HTTP error should have nothing to do with the kind of application."
Tested, with a bad surprise:

LHC@home 20 May 08:45:57 work fetch resumed by user
LHC@home 20 May 08:45:59 Master file download succeeded
LHC@home 20 May 08:46:04 Sending scheduler request: To fetch work.
LHC@home 20 May 08:46:04 Requesting new tasks for CPU
LHC@home 20 May 08:46:06 Scheduler request failed: HTTP internal server error
LHC@home 20 May 08:46:16 work fetch suspended by user


Same sched_reply_lhcathome.cern.ch_lhcathome.xml as you had:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at 
 boinc-server-admin@cern.ch to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,085,146
RAC: 104,467
Message 38911 - Posted: 20 May 2019, 7:24:40 UTC

David posted this information about proxy use in the ATLAS forum:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5013&postid=38751#38751
Could this be the reason?
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 242
Credit: 5,800,306
RAC: 0
Message 38913 - Posted: 20 May 2019, 10:15:06 UTC

In principle no application should specify platform anonymous. If you would like to debug this further, please try on our LHCathome-dev project, where we have more flexibility with the server code. Laurence and other BOINC developers could try to fix the bug for upcoming releases, so that the scheduler does not end up with an internal server error.

But this is anyway what you get if you give garbled input to a CGI script.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38915 - Posted: 20 May 2019, 11:05:17 UTC - in response to Message 38913.  

"In principle no application should specify platform anonymous. If you would like to debug this further, please try on our LHCathome-dev project, where we have more flexibility with the server code. Laurence and other BOINC developers could try to fix the bug for upcoming releases, so that the scheduler does not end up with an internal server error."
The term 'platform anonymous' is confusing. It does not mean a strange or unknown OS; app_info.xml can be used for non-standard applications.
In our case it is BOINC's newest vboxwrapper under test, but at other projects an app_info.xml is quite often used for hardware-optimized applications.
It would be good to know whether this is a BOINC server code problem. I'll test again with the dev server and report over there.

"But this is anyway what you get if you give garbled input to a CGI script."
The input from the app_info.xml is not garbled; it is proper BOINC XML with valid tags.
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 2,043,458
RAC: 0
Message 38916 - Posted: 20 May 2019, 11:51:58 UTC

Thanks for confirming the problem. Please be sure to post a link here to the dev server thread when you create one. Thanks again.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1268
Credit: 8,421,616
RAC: 2,139
Message 38917 - Posted: 20 May 2019, 12:15:16 UTC
Last modified: 20 May 2019, 12:17:33 UTC

Reported the same failing request on the dev server: https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=473

This thread, on the production server, is closed.
