Message boards : CMS Application : Boinc getting exit status wrong: thinks some tasks end ok: only 1 was good

Joseph Stateson
Joined: 10 Aug 08
Posts: 15
Credit: 741,917
RAC: 0
Message 41825 - Posted: 5 Mar 2020, 17:26:36 UTC

Except for one CMS task, all finished with a computation error, as shown here.

Unaccountably, BOINC (going by the BoincTasks history file) thinks five CMS tasks finished OK rather than just one. Possibly BOINC Manager (BM) or BoincTasks (BT) is not recognizing the exit code or state reported by the BOINC client. The image below shows the failing tasks in red and the supposedly successful ones in green. The only task that actually succeeded was CMS_2758607_1583327645_654309_0, which had an exit code of "0" and a state of 31.
Note that it ran for 14 hours, which is about right, while the others ran far less, some only 20 minutes.
The codes and states come from the BOINC client and are (I am guessing) not the same as those reported by the app and shown on the project web site.

Is the source code available, so I can track down the exit code and see what the client expects? An exit code of 65536 is actually "0" if the client expects a short (16-bit) integer. The same goes for 13565952. However, some of the tasks with 13565952 are marked as failed, which is correct, so the problem is not just a wrong return type.
Name of LHC app (all CMS)        State  Exit code  Hex
CMS_2758607_1583327645_654309_0     31          0  0
CMS_3283891_1583379179_229515_0     31      65536  10000
CMS_3429646_1583393596_983118_0      3   13565952  CF0000
CMS_3544230_1583405013_479488_0      3   13565952  CF0000
CMS_3556244_1583406215_299149_0      3   13565952  CF0000
CMS_3540875_1583404712_894231_0     31   13565952  CF0000
CMS_3578506_1583408318_580860_0      3   13565952  CF0000
CMS_3556238_1583406215_252558_0     31   13565952  CF0000
CMS_3597441_1583410122_414318_0      3   13565952  CF0000
CMS_3565130_1583407116_164734_0     31   13565952  CF0000
CMS_3623454_1583412526_246354_0      3   13565952  CF0000
CMS_3620757_1583412225_441454_0      3   13565952  CF0000
CMS_3658161_1583416134_728359_0      3   13565952  CF0000
CMS_3676965_1583417951_770383_0      3   13565952  CF0000
CMS_3689514_1583419158_910138_0      3   13565952  CF0000
CMS_3692705_1583419459_290792_0      3   13565952  CF0000
CMS_3687301_1583418858_348740_0     31   13565952  CF0000
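
As a quick check of the truncation idea, here is a minimal Python sketch that splits the exit codes quoted in the question (0, 65536 and 13565952) into their low and high 16 bits. The helper names low16 and high16 are made up for illustration. That the client might keep only the low 16 bits, or that the real value sits in the upper 16 bits, is a guess and not something taken from the BOINC source.

```python
def low16(code: int) -> int:
    """Value left over if only the low 16 bits of the exit code are kept."""
    return code & 0xFFFF


def high16(code: int) -> int:
    """Value stored in bits 16-31 of the exit code."""
    return (code >> 16) & 0xFFFF


# Exit codes quoted in the post; the 16-bit interpretation is an assumption.
for code in (0, 65536, 13565952):
    print(f"{code:>9}  hex {code:>6X}  low16 {low16(code):>3}  high16 {high16(code):>3}")
```

Both nonzero codes have a low half of 0, which fits the short-integer reading. 13565952 is 207 shifted left by 16 bits, and 65536 is 1 shifted left by 16 bits.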




The raw data is here:
https://stateson.net/images/all_lhc_cms.txt
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,962,415
RAC: 136,861
Message 41826 - Posted: 5 Mar 2020, 20:16:45 UTC - in response to Message 41825.  

See this post:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5309&postid=41635
It's better not to run CMS until Ivan posts a "GO".



©2024 CERN