Message boards : LHC@home Science : Sixtrack apps - Bad "api_version" - BOINC 7.8.2 display grids failing

Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 1,298,483
RAC: 1,571
Message 32358 - Posted: 8 Sep 2017, 19:59:54 UTC
Last modified: 8 Sep 2017, 20:06:16 UTC

I'm noticing a problem where the BOINC 7.8.2 Advanced View fails to load the Project/Task grids correctly: about every second they partially load, then reload, over and over.

I've tracked it to this project, specifically to apps having bad info within their "api_version" XML properties. This causes the 7.8.2 Manager to freak out while trying to parse that XML and read the strings!

Once I removed this project, all was well. But when I re-added it, it broke again!

Could you guys please investigate how your <api_version> XML properties are becoming messed up, and fix it?
Note: The first one has a binary character before the close tag, which is ultimately what hoses BOINC, I think. But I don't know how.
See below.


Thanks,
Jacob Klein

For sure caused a problem:
<app_version>
    <app_name>sixtracktest</app_name>
    <version_num>4630</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>4763423249.552478</flops>
    <plan_class>sse2</plan_class>
    <api_version>7.7.0
API_VERSI</api_version>
    <file_ref>
        <file_name>sixtracktest_win64_4630_sse2.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>


Older set, that maybe caused a problem:
<app_version>
    <app_name>sixtrack</app_name>
    <version_num>45107</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>6757081731.981332</flops>
    <plan_class>pni</plan_class>
    <api_version>7.1.0</api_version>
    <file_ref>
        <file_name>sixtrack_win32_4517_pni.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>
<app_version>
    <app_name>sixtrack</app_name>
    <version_num>4630</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>14855145875.554390</flops>
    <plan_class>sse2</plan_class>
    <api_version>7.7.0
API_VERSI</api_version>
    <file_ref>
        <file_name>sixtrack_win32_4630_sse2.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>
<app_version>
    <app_name>sixtrack</app_name>
    <version_num>4630</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>19207728646.985275</flops>
    <plan_class>sse2</plan_class>
    <api_version>7.7.0
API_VERSI</api_version>
    <file_ref>
        <file_name>sixtrack_win64_4630_sse2.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>
<app_version>
    <app_name>sixtracktest</app_name>
    <version_num>4630</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>4480961228.675860</flops>
    <plan_class>sse2</plan_class>
    <api_version>7.7.0
API_VERSI</api_version>
    <file_ref>
        <file_name>sixtracktest_win32_4630_sse2.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>
ID: 32358
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 934
Credit: 6,286,540
RAC: 683
Message 32366 - Posted: 9 Sep 2017, 9:13:15 UTC - in response to Message 32358.  

Hi Jacob,

I too see those strange API_VERSI strings in my client_state.xml,

<app_version>
<app_name>sixtracktest</app_name>
<version_num>4630</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>4823445069.717739</flops>
<plan_class>sse2</plan_class>
<api_version>7.7.0
API_VERSI</api_version>
<file_ref>
<file_name>sixtracktest_win64_4630_sse2.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app_version>
<app_name>sixtracktest</app_name>
<version_num>4630</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>6112078616.967116</flops>
<plan_class>sse2</plan_class>
<api_version>7.7.0
API_VERSI</api_version>
<file_ref>
<file_name>sixtracktest_win32_4630_sse2.exe</file_name>
<main_program/>
</file_ref>
</app_version>


but I don't have an issue with BOINC's Advanced View and sixtracktest tasks are running normally.

08-Sep-2017 13:53:15 CEST [---] Version change (7.7.2 -> 7.8.2)

One probable difference from your setup is that I loaded the tasks before the version change.
ID: 32366
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 1,298,483
RAC: 1,571
Message 32368 - Posted: 9 Sep 2017, 12:59:09 UTC
Last modified: 9 Sep 2017, 13:07:31 UTC

I'm still troubleshooting, and the bug (where it crashes the 7.8.2 BOINC Manager) is a little bit elusive, but...

I think sometimes it'll say:
API_VERSI</api_version>

and sometimes it'll say
API_VERSI*</api_version>

where * is some crazy binary character.

If that binary character gets into a sched_reply or your client_state.xml file, and the client then attempts to load it into memory, things get hosed.
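
A rough way to check for this kind of damage is to pull every <api_version> value out of the file and flag anything that is not a plain dotted version number. The Python sketch below is only illustrative: the helper name `find_bad_api_versions`, the sample text, and the `\x01` byte standing in for the "crazy binary character" are all assumptions, not anything taken from BOINC itself.

```python
import re

# Matches the text between <api_version> and </api_version>, newlines included.
API_VERSION_RE = re.compile(r"<api_version>(.*?)</api_version>", re.DOTALL)
# What a clean value is assumed to look like, e.g. 7.7.0
PLAIN_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def find_bad_api_versions(xml_text):
    """Return every <api_version> value that is not exactly N.N.N."""
    bad = []
    for value in API_VERSION_RE.findall(xml_text):
        if not PLAIN_VERSION_RE.fullmatch(value):
            bad.append(value)
    return bad


# Made-up sample: one clean value, one with an extra line,
# and one with an extra line plus a control character (\x01 is a stand-in).
sample = (
    "<api_version>7.1.0</api_version>\n"
    "<api_version>7.7.0\nAPI_VERSI</api_version>\n"
    "<api_version>7.7.0\nAPI_VERSI\x01</api_version>\n"
)

for value in find_bad_api_versions(sample):
    # repr() makes newlines and unprintable bytes visible as escapes
    print(repr(value))
```

To try it on a real client_state.xml, the text could be read with `open(path, encoding="utf-8", errors="replace")` so that stray bytes do not abort the read.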

Either way, I believe that LHC@Home fixing <api_version> will be the answer.

Can we get it fixed?

PS: You can encapsulate a block in "pre" (for pre-formatted text), to keep the formatting and tab stops.
ID: 32368
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,254,459
RAC: 2,069
Message 32379 - Posted: 10 Sep 2017, 17:50:52 UTC - in response to Message 32368.  

Hello Jacob,

thanks for pointing this out. I have asked our IT experts for better insight on this point. Will keep you posted!
ID: 32379
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 220
Credit: 4,899,621
RAC: 243
Message 32383 - Posted: 11 Sep 2017, 7:42:09 UTC

Jacob, do you see this with applications other than Sixtracktest?

Unlike other applications, Sixtrack and Sixtracktest do not have a version.xml with the application, so I wonder where this comes from.

BOINC 7.8.2 is a recent BOINC client, and we do not see this with BOINC 7.6.22.

If the error is due to garbled info generated on the server side, do you get the same errors on: https://lhcathomedev.cern.ch/lhcathome-dev/ ?

Thanks for filling us in.
ID: 32383
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 1,298,483
RAC: 1,571
Message 32386 - Posted: 11 Sep 2017, 13:11:42 UTC
Last modified: 11 Sep 2017, 13:25:32 UTC

I'm an active BOINC Alpha tester, attached to all of the projects, routinely getting work from about 11 ... and this is the first time I've ever seen this problem.

The only applications I've ever seen this problem with are Sixtrack and Sixtracktest, on BOINC 7.8.2. Note: This new version of BOINC (client AND server) does have many string sprintf changes. It's possible that something isn't terminating a string correctly, or a reader isn't reading correctly.

I'm attached to your dev project too, but my workload doesn't really offer much of a chance to do work for your projects. I'll try to adjust that.

Regarding "how" it might be garbled ... Please read the following, which might help you do some testing to see if you can recreate the problem or find the fault. Richard Haselgrove (another tester who knows much more about the code than I do), said this, to the BOINC Alpha list service:


API version number in <app_version>

BOINC uses a pretty arcane method of populating that field.

When the API library is compiled the BOINC code version number is embedded, even though that number only really relates to the next intended version of the client.

Then, when the project science application is compiled (which may be months or years later) the API library code - including the version number - is linked into the executable.

Then, when the science application is deployed to the BOINC server for distribution (again, possibly months later), the application binary is scanned for an API_VERSION string, and the following number is copied into the <app_version> XML blob.

And finally, that XML is transmitted to the client when the sched_reply first allocates a task to that <app_version>.

So the source of the corrupted API field may be several steps back in the process, and probably happened on the project server.

The next step would be to search the defined application binaries with a hex editor, looking for API_VERSION, and check the following bytes very carefully, checking for unprintable string termination characters (usually 00). If the API version data is corrupted in one of the binaries, look how old that application is, and how long ago it was first distributed. If the answer is 'more than a week', then it's probably nothing to do with v7.8.2

Any more, ask me when I get home next week.
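
The hex-editor check described in the quote above can also be approximated with a few lines of Python. This is a minimal sketch, not a BOINC tool: the function name `api_version_contexts` and the made-up `fake_binary` bytes are assumptions for illustration. It reports each offset where API_VERSION occurs together with the raw bytes that follow, so unprintable characters show up as escapes.

```python
def api_version_contexts(data, marker=b"API_VERSION", width=16):
    """Return (offset, following_bytes) for each occurrence of marker in data."""
    hits = []
    start = data.find(marker)
    while start != -1:
        end = start + len(marker)
        # Keep up to `width` raw bytes after the marker for inspection
        hits.append((start, data[end:end + width]))
        start = data.find(marker, start + 1)
    return hits


# Made-up bytes standing in for a slice of an executable
fake_binary = b"\x00\x00API_VERSION_7.7.0\x00\x00LPAPI_VERSION\x00junk"

for offset, tail in api_version_contexts(fake_binary):
    # Printing a bytes object shows NULs and other unprintables as \x.. escapes
    print(offset, tail)
```

For a real application file, `data` could come from `open(path, "rb").read()`.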


Let us know what you find!
Kind regards,
Jacob Klein
ID: 32386
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 220
Credit: 4,899,621
RAC: 243
Message 32387 - Posted: 11 Sep 2017, 14:26:19 UTC - in response to Message 32386.  

Thanks for this useful information!

In a quick search of the Sixtracktest binaries, I cannot see any odd characters, only API version 7.7:

-bash-4.1$ strings */*|grep API_VERSION
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION
LPAPI_VERSION
API_VERSION_7.7.0
API_VERSION
LPAPI_VERSION
API_VERSION_7.7.0
API_VERSION
LPAPI_VERSION
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0
API_VERSION_7.7.0

I will ask the Sixtrack team for further details.
ID: 32387
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 1,298,483
RAC: 1,571
Message 32388 - Posted: 11 Sep 2017, 14:59:49 UTC - in response to Message 32387.  

Might be useful to search for:
API_VERSI
ID: 32388
Juha

Joined: 22 Mar 17
Posts: 30
Credit: 360,676
RAC: 0
Message 32391 - Posted: 11 Sep 2017, 17:50:04 UTC

Could you check how it looks in sched_reply_lhcathome.cern.ch_lhcathome.xml? You need to have SixTrack tasks assigned in that reply for the app_version to be included.
ID: 32391
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester

Joined: 15 Jul 05
Posts: 220
Credit: 4,899,621
RAC: 243
Message 32395 - Posted: 12 Sep 2017, 6:48:26 UTC - in response to Message 32388.  

A search for API_VERSI gives the same result as for API_VERSION.

I have Sixtrack tasks downloaded, but paused, and my sched_reply_lhcathome.cern.ch_lhcathome.xml does not contain any info related to app versions.

Need to await the next round of Sixtrack tasks.

Could this be an issue with the 7.8.2 client on Windows 10?
ID: 32395
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,254,459
RAC: 2,069
Message 32397 - Posted: 12 Sep 2017, 16:30:57 UTC - in response to Message 32395.  
Last modified: 12 Sep 2017, 16:31:25 UTC

I have installed the BOINC client 7.8.2 on my Windows 7 machine (actually a virtual machine in VB in an Ubuntu environment):
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10481870

And I have successfully crunched some brand new tasks:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10481870

What is the name of the app BLOB .xml file?
ID: 32397
Juha

Joined: 22 Mar 17
Posts: 30
Credit: 360,676
RAC: 0
Message 32398 - Posted: 12 Sep 2017, 16:53:02 UTC - in response to Message 32397.  

I got some SixTrack tasks and this is in sched_reply:

<app_version>
    <app_name>sixtrack</app_name>
    <version_num>4630</version_num>
    <api_version>7.7.0
API_VERSION
LPAPI_VERSION</api_version>
<file_ref>
    <file_name>sixtrack_win64_4630_sse2.exe</file_name>
    <main_program/>
</file_ref>
    <platform>windows_x86_64</platform>
    <plan_class>sse2</plan_class>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>11772905691.719425</flops>
</app_version>


The <api_version> garbage in client_state.xml is truncated because the client stores the API version in a string of at most 16 bytes.
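
The truncation is easy to reproduce, assuming the 16-byte field holds at most 15 characters plus a terminating NUL (the thread only says "max 16 byte string", so the exact mechanics are an assumption). The hypothetical helper below cuts the value seen in sched_reply down the same way:

```python
def store_in_fixed_buffer(value, size=16):
    """Mimic copying a string into a size-byte C buffer: size-1 bytes plus a NUL."""
    return value.encode("ascii")[: size - 1].decode("ascii")


# The <api_version> content as it appears in the sched_reply excerpt
from_server = "7.7.0\nAPI_VERSION\nLPAPI_VERSION"

stored = store_in_fixed_buffer(from_server)
print(repr(stored))  # prints '7.7.0\nAPI_VERSI'
```

Under that assumption the result is the same 7.7.0 / API_VERSI pair that shows up in the client_state.xml excerpts earlier in the thread.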

IIRC, <app_version> is copied as is from the XML blob in the DB. Do you use update_version or something else to add app versions?



Not related to the api_version, but I'll mention it anyway because you are going to get questions about it. The previous SixTrack version was 451.07 and the new version is 46.30, that is, lower than the previous one. The client keeps only the app version that has the highest version number. Ones with lower version numbers are deleted as soon as no task refers to them.

Because of that, the client keeps re-downloading the app's files over and over again if at any moment it runs out of SixTrack tasks. The only way out is to reset the project, so that the client forgets about the 451.07 version, or for you to re-release 46.30 with a higher version number.
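
The ordering problem can be seen by comparing the numbers directly. The sketch below assumes version_num is simply major * 100 + minor, which matches 451.07 -> 45107 and 46.30 -> 4630 in the excerpts above; the helper names are hypothetical, and the "keep the highest" rule is a simplification of what the client does.

```python
def to_version_num(major, minor):
    """Encode major.minor as an integer, e.g. 451.07 -> 45107 (assumed scheme)."""
    return major * 100 + minor


def version_to_keep(version_nums):
    """Simplified rule from the post: only the highest version number survives."""
    return max(version_nums)


old = to_version_num(451, 7)   # the previous release, 451.07
new = to_version_num(46, 30)   # the new release, 46.30

print(old, new)
print("newer release sorts higher:", new > old)
print("version the client keeps:", version_to_keep([old, new]))
```

Because 4630 is less than 45107, the newer release loses the comparison, which is consistent with the repeated downloads described above.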
ID: 32398
Alessio Mereghetti
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 29 Feb 16
Posts: 157
Credit: 2,254,459
RAC: 2,069
Message 32413 - Posted: 13 Sep 2017, 10:00:52 UTC - in response to Message 32398.  
Last modified: 13 Sep 2017, 10:06:20 UTC

Hello Juha,

thanks for the detailed reply. This is what I get in my sched_reply_lhcathome.cern.ch_lhcathome.xml:

<app_version>
    <app_name>sixtrack</app_name>
    <version_num>4630</version_num>
    <api_version>7.7.0
API_VERSION
LPAPI_VERSION</api_version>
<file_ref>
    <file_name>sixtrack_win32_4630_sse2.exe</file_name>
    <main_program/>
</file_ref>
    <platform>windows_intelx86</platform>
    <plan_class>sse2</plan_class>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>9839391853.388186</flops>
</app_version>


Hence no corruption. This is for the 32-bit Windows exe, not the 64-bit one that you got. My sched_reply doesn't show any entry for the 64-bit exe; do you have anything on the 32-bit one?


Thanks also for pointing out the problem with the version number of the exe - I think it explains nicely why, after releasing the new exes, the very first tasks were still executed with the old ones. Though, I guess that, since we declared as deprecated the 45107 exes, then only the 4630 should be distributed, and I see this is happening; so I guess that the issue that you raise is less than a concern, isn't it?
ID: 32413
Jacob Klein

Joined: 26 Jul 13
Posts: 13
Credit: 1,298,483
RAC: 1,571
Message 32414 - Posted: 13 Sep 2017, 10:49:59 UTC - in response to Message 32413.  
Last modified: 13 Sep 2017, 10:50:31 UTC

Wow.

He's not saying that your reply is corrupted. His wasn't.

I think he's saying that BOINC is expecting 16 chars or less for that field, and will have problems if you put so much in there.

Why can't that field just say "7.7.0" ?

<api_version>7.7.0
API_VERSION
LPAPI_VERSION</api_version>
ID: 32414
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 934
Credit: 6,286,540
RAC: 683
Message 32415 - Posted: 13 Sep 2017, 11:31:47 UTC - in response to Message 32413.  

... do you have anything on the 32bit?


This is what was in my sched_reply_lhcathome.cern.ch_lhcathome.xml:

<app_version>
<app_name>sixtrack</app_name>
<version_num>4630</version_num>
<api_version>7.7.0
API_VERSION
LPAPI_VERSION</api_version>
<file_ref>
<file_name>sixtrack_win32_4630_sse2.exe</file_name>
<main_program/>
</file_ref>
<platform>windows_intelx86</platform>
<plan_class>sse2</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>10356896019.129839</flops>
</app_version>
ID: 32415
Juha

Joined: 22 Mar 17
Posts: 30
Credit: 360,676
RAC: 0
Message 32426 - Posted: 14 Sep 2017, 20:02:32 UTC - in response to Message 32413.  

Sorry to have kept you waiting for an answer.

<api_version> is supposed to contain only API version number:

<api_version>7.7.0</api_version>


So it's wrong in 32-bit app version too.

This is now documented in #2121 and fixed in #2122
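
The thread does not say how the server derived the value or what exactly #2122 changed. Purely as an illustration, here is one way extra lines could end up in the field: if a tool took everything after the first API_VERSION_ marker in output like that of the `strings */*|grep API_VERSION` run shown earlier, the later API_VERSION and LPAPI_VERSION lines would be carried along. Both functions below are hypothetical sketches, not BOINC code.

```python
PREFIX = "API_VERSION_"


def naive_api_version(grep_output):
    """Take everything after the first PREFIX; later matching lines come along."""
    start = grep_output.find(PREFIX)
    if start == -1:
        return ""
    return grep_output[start + len(PREFIX):].strip()


def first_line_api_version(grep_output):
    """Take the version from the first line that starts with PREFIX, nothing more."""
    for line in grep_output.splitlines():
        if line.startswith(PREFIX):
            return line[len(PREFIX):].strip()
    return ""


# Three lines as they appear together in the grep output quoted in the thread
grep_output = "API_VERSION_7.7.0\nAPI_VERSION\nLPAPI_VERSION\n"

print(repr(naive_api_version(grep_output)))
print(repr(first_line_api_version(grep_output)))
```

With this input the first function returns the three-line value seen in the sched_reply excerpts, while the second returns only 7.7.0.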

since we declared as deprecated the 45107 exes, then only the 4630 should be distributed, and I see this is happening; so I guess that the issue that you raise is less than a concern, isn't it?


It's not a critical issue. At some point someone is going to notice these in the log:

12-Sep-2017 02:53:45 [LHC@home] Started download of sixtrack_win64_4630_sse2.exe
13-Sep-2017 03:15:26 [LHC@home] Started download of sixtrack_win64_4630_sse2.exe
13-Sep-2017 14:15:32 [LHC@home] Started download of sixtrack_win64_4630_sse2.exe
14-Sep-2017 02:47:09 [LHC@home] Started download of sixtrack_win64_4630_sse2.exe
14-Sep-2017 15:40:46 [LHC@home] Started download of sixtrack_win64_4630_sse2.exe


And then they wonder what's going on. That's all.
ID: 32426



©2020 CERN