Message boards : Number crunching : EXIT_INIT_FAILURE troubleshooting?
Profile HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29211 - Posted: 12 Mar 2017, 19:47:00 UTC

I am seeing some behaviour that I do not understand, and maybe someone can help me.

Many of my CMS, Theory and LHCb tasks finish as "Error while computing" with exit status "206 (0x000000CE) EXIT_INIT_FAILURE". I understand that this is normal behaviour, because the task did not manage to get any actual work from the LHC@Home server.
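
For reference, the decimal and hexadecimal parts of that exit status are the same number written two ways. A minimal Python sketch, assuming only the value 206 quoted in the task result (the helper name `format_exit_status` is made up for illustration and is not part of BOINC):

```python
# Exit code as reported in the task result quoted above.
EXIT_INIT_FAILURE = 206


def format_exit_status(code: int, name: str) -> str:
    """Render an exit code as decimal, 8-digit hex, and symbolic name."""
    return f"{code} (0x{code:08X}) {name}"


status_line = format_exit_status(EXIT_INIT_FAILURE, "EXIT_INIT_FAILURE")
print(status_line)
```

This makes it easier to match a code seen in one form (for example in a log) against the other form shown on the results page.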

However on one of my machines, 1 Theory task out of 15 manages to get some work. (I did not try CMS and LHCb because those are a bit more RAM hungry).
Theory tasks for that machine: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10420599&offset=0&show_names=0&state=0&appid=13

On the other machine, not one of more than 40 tasks got any work.
Tasks for that other machine: https://lhcathome.cern.ch/lhcathome/results.php?hostid=10416269

I expected to see the same success rate for both machines, but that is quite obviously not the case. Since the "Stderr output" of those tasks is empty, I have no idea where the difference could come from.
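
One rough way to judge how different the two machines really are: assume each task independently gets work at the 1-in-15 rate seen on the first machine, and ask how likely it is to see no success at all on the second. This is only a sketch under stated assumptions: the 1/15 rate is a point estimate from just 15 tasks, and the task count is taken as exactly 40 although the actual number is "more than 40".

```python
def prob_zero_successes(success_rate: float, n_tasks: int) -> float:
    """Probability that none of n_tasks independent tasks gets work."""
    return (1.0 - success_rate) ** n_tasks


# Assumption: per-task success rate estimated from the first machine (1 of 15).
rate = 1 / 15
# Assumption: exactly 40 tasks on the second machine.
p_none = prob_zero_successes(rate, 40)
print(f"Chance of 0 successes in 40 tasks: {p_none:.3f}")
```

Under these assumptions the chance comes out at roughly 6%, so zero out of 40 would be unusual but not impossible even if both machines behaved identically. More tasks on either machine would give a firmer answer.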

Any suggestions, anyone?
We are the product of random evolution.
Profile Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,569,815
RAC: 9,173
Message 29228 - Posted: 13 Mar 2017, 15:19:39 UTC

Hm, I just checked one of your failed WUs, and it looks like something is really going wrong during startup.

As this seems to happen randomly, are you sure that VirtualBox is the only virtualisation system on your PC? Or could it be that Hyper-V (or some other virtualisation software) is still on your system and sometimes blocking or using the VT-x function of the processor?


Supporting BOINC, a great concept !
Profile Yeti
Volunteer moderator
Joined: 2 Sep 04
Posts: 453
Credit: 193,569,815
RAC: 9,173
Message 29229 - Posted: 13 Mar 2017, 15:22:08 UTC

Or did you update your BIOS? This often deactivates the virtualisation settings of the processor.


Supporting BOINC, a great concept !
Profile HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29236 - Posted: 14 Mar 2017, 4:01:02 UTC

Thanks Yeti for your suggestions.
Yeti wrote: "Hm, I just checked one of your failed WUs, and it looks like something is really going wrong during startup."

You are probably referring to the LHCb tasks that failed with exit status STATUS_STACK_BUFFER_OVERRUN, like this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=125539757
I had a few of those yesterday, each with a VirtualBox pop-up error that was blocking the VM itself. I am not too sure what caused those errors.
But the problem I usually have is with tasks that end with EXIT_INIT_FAILURE, like this one: https://lhcathome.cern.ch/lhcathome/result.php?resultid=125539846
Those run for only 2 or 3 minutes and then exit.

Yeti wrote: "Are you sure that VirtualBox is the only virtualisation system on your PC? Or could it be that Hyper-V (or some other virtualisation software) is still on your system and sometimes blocking or using the VT-x function of the processor?"

Yes, quite sure. I recently updated to VirtualBox 5.1.16, but both before and after the update I had no issues running ATLAS tasks.

Yeti wrote: "Or did you update your BIOS? This often deactivates the virtualisation settings of the processor."

Nope. Up till Friday, ATLAS jobs were still running fine, while at the same time I was getting those EXIT_INIT_FAILURE errors for Theory, LHCb and CMS.
We are the product of random evolution.
Profile HerveUAE
Joined: 18 Dec 16
Posts: 123
Credit: 37,495,365
RAC: 0
Message 29237 - Posted: 14 Mar 2017, 4:13:42 UTC

From a few spot checks I did, most people do not have this problem. Also, when I first joined LHC@Home, CMS, LHCb and Theory were running OK.
The main change since then is that I re-installed the graphics card driver to support OpenCL and run Einstein@Home on the GPU. Could that explain my problem?
We are the product of random evolution.