
100% errors



Message boards : ATLAS application : 100% errors

den777
Joined: 19 Oct 13
Posts: 6
Credit: 675,849
RAC: 506
Message 29940 - Posted: 16 Apr 2017, 11:08:01 UTC
Last modified: 16 Apr 2017, 11:15:46 UTC

Why do all tasks finish with errors?
I have checked everything, and I even increased the VM memory for 2 CPUs to 4800MB, but I still cannot get a single successful task.
https://lhcathome.cern.ch/lhcathome/results.php?userid=269482
EVERY task ends with either a computation error or a validation error, not only for me but for my wingmen too. Is this subproject really working?



But wait, a miracle happened: I got 1 (one) successful task. Only one task out of 20. Is this rate OK for this project?

This is my app_config.xml:
<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <avg_ncpus>2.000000</avg_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <cmdline>--memory_size_mb 4800</cmdline>
  </app_version>
</app_config>
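For anyone comparing the settings posted in this thread, a small Python sketch can read an app_config.xml like the one above and report what each app version asks for: the plan class, the CPU count from avg_ncpus, and the VM memory from the --memory_size_mb option in cmdline. The function name summarize is hypothetical, and the sketch only parses the file; it says nothing about how the BOINC client or VirtualBox actually apply these values.

```python
import re
import xml.etree.ElementTree as ET

# den777's app_config.xml, copied from the post above
APP_CONFIG = """<app_config>
  <app>
    <name>ATLAS</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <avg_ncpus>2.000000</avg_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <cmdline>--memory_size_mb 4800</cmdline>
  </app_version>
</app_config>"""


def summarize(xml_text):
    """Return one dict per <app_version> with its CPU count and requested VM memory."""
    root = ET.fromstring(xml_text)
    summary = []
    for av in root.findall("app_version"):
        cmdline = av.findtext("cmdline", default="")
        # Pull the number that follows --memory_size_mb, if present
        match = re.search(r"--memory_size_mb\s+(\d+)", cmdline)
        summary.append({
            "app_name": av.findtext("app_name"),
            "plan_class": av.findtext("plan_class"),
            "avg_ncpus": float(av.findtext("avg_ncpus", default="1")),
            "memory_mb": int(match.group(1)) if match else None,
        })
    return summary


if __name__ == "__main__":
    for entry in summarize(APP_CONFIG):
        print(entry)
```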

Dave Peachey
Joined: 9 May 09
Posts: 17
Credit: 752,075
RAC: 0
Message 29941 - Posted: 16 Apr 2017, 13:05:43 UTC
Last modified: 16 Apr 2017, 13:20:13 UTC

All of your tasks showing Error while computing seem to be victims of the problem being discussed at length in the thread on the "Error -161" message (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4224), which is a recent and recurrent problem.

Whilst this seems to affect people to a greater or lesser extent, a number of people are experiencing this problem with a significant proportion of the WUs they download (as seems to be the case for you). In the absence of any (apparent) investigation or report from the project team (I presume they are short-staffed over the holidays, hence the lack of news), the jury is out on where the problem lies.

I can't speak for the ones showing Validate error but, certainly in your attempts to process them, the result includes a failure to generate the 50MB+ HITS output file ... as seems to be the case for the wingmen who are also failing with those same WUs. Since those WUs mostly seem to validate eventually, this points to a different fault from the Error -161 problem.

However, this seems to be the first time (that I can recall) since the recent consolidation of ATLAS@home into the wider LHC@home that problems of this magnitude have been encountered, so I would hope this is a glitch and not a foreshadowing of future user experience with ATLAS@home.

And no, a 5% success rate for WU processing is very much not OK for this or any other project (a 5% failure rate is generally deemed just about acceptable); it indicates problems at the project end, at your end ... or both!
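To make the two figures concrete: one validated task out of roughly 20 (den777's count) is a 5% success rate, i.e. a 95% failure rate, far from the roughly 5% failure rate described as just about acceptable. A minimal Python sketch of that arithmetic; the 5% threshold is taken from the post, and the helper name failure_rate is hypothetical.

```python
ACCEPTABLE_FAILURE_RATE = 0.05  # "just about acceptable", per the post above


def failure_rate(successes, total):
    """Fraction of tasks that did not complete and validate."""
    return 1 - successes / total


rate = failure_rate(1, 20)  # den777: one good task out of about 20
print(f"failure rate {rate:.0%}, acceptable: {rate <= ACCEPTABLE_FAILURE_RATE}")
```

Run as written, this prints a 95% failure rate that is well above the acceptable level.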

BTW, your app_config file looks OK; mine is similar:
<app_config>
  <app>
    <name>ATLAS</name>
    <fraction_done_exact/>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>ATLAS</app_name>
    <avg_ncpus>4.000000</avg_ncpus>
    <plan_class>vbox64_mt_mcore_atlas</plan_class>
    <cmdline>--memory_size_mb 5700</cmdline>
  </app_version>
</app_config>

but this is for a sixteen-core machine that uses only four cores for ATLAS and has 32GB of RAM.

Given that your machine seems to have only two cores and 6GB of RAM (if I've read the specs page for that computer correctly), it may be underpowered for running ATLAS on two cores, and you should consider cutting back to one core with a commensurate drop in the RAM setting.
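A rough way to see this point is to compare how much RAM each host has left once the ATLAS VM takes its configured memory. This is a minimal Python sketch under simplifying assumptions: 1 GB is taken as 1024 MB, the VM is assumed to hold its full --memory_size_mb allocation, max_concurrent is 1 as in both app_config files, and VirtualBox overhead, the host operating system and other BOINC tasks are not modelled. The function name headroom_mb is hypothetical.

```python
def headroom_mb(host_ram_mb, vm_memory_mb, concurrent_vms=1):
    """RAM left for the host OS and everything else once the VMs are allocated.

    Assumes each VM holds its full configured memory; VirtualBox overhead
    is ignored.
    """
    return host_ram_mb - vm_memory_mb * concurrent_vms


# (host RAM in MB, --memory_size_mb) taken from the posts in this thread
setups = {
    "den777: 2 cores, 6GB host": (6 * 1024, 4800),
    "Dave Peachey: 4 of 16 cores, 32GB host": (32 * 1024, 5700),
}

for label, (host_mb, vm_mb) in setups.items():
    print(f"{label}: {headroom_mb(host_mb, vm_mb)} MB left")
```

Under these assumptions the 6GB host keeps only about 1.3GB for everything else, versus about 26GB on the 32GB machine. That is consistent with the suggestion to run a smaller single-core VM, though this thread does not establish that headroom is what causes the failures.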

den777
Joined: 19 Oct 13
Posts: 6
Credit: 675,849
RAC: 506
Message 29942 - Posted: 16 Apr 2017, 14:21:36 UTC - in response to Message 29941.

> Given that your machine seems to have only two cores and 6GB of RAM (if I've read the specs page for that computer correctly), it may be underpowered for running ATLAS on two cores, and you should consider cutting back to one core with a commensurate drop in the RAM setting.

This is the main problem, I think.
A 2-core VirtualBox machine has a 4200MB memory limit by default, and my computer has 6GB. What's wrong with that? And what is the real memory requirement for such tasks? If 4.2GB is not enough for 2 cores, why are the virtual machines still misconfigured? If 6GB of total memory is not enough for a 2-core VirtualBox machine, why are these tasks still being sent?
It's an old issue, dating back to when ATLAS@home was a separate project. But nobody cares.

I changed avg_ncpus in app_config.xml to 1, so let's see how it works.

Erich56
Joined: 18 Dec 15
Posts: 383
Credit: 3,873,774
RAC: 7,567
Message 29944 - Posted: 16 Apr 2017, 17:04:53 UTC - in response to Message 29941.

> Whilst this seems to affect people to a greater or lesser extent, a number of people are experiencing this problem with a significant proportion of the WUs they download (as seems to be the case for you).

Same is true for me.

Meanwhile, I have noticed that all "long runners" (which, when downloaded, show a remaining time of a few days) error out shortly after starting. The other tasks (showing a remaining time of a few hours) all run fine.

So what I am doing now is aborting any such "long runner" immediately after it is downloaded.
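The manual rule above can be written down as a simple filter: flag any task whose initial remaining-time estimate is measured in days rather than hours. This is only a Python sketch. The 24-hour cut-off, the Task type and the task names are hypothetical, and the code does not talk to the BOINC client, so any flagged task would still have to be aborted by hand in BOINC Manager.

```python
from dataclasses import dataclass

# Hypothetical cut-off separating "a few hours" from "a few days".
LONG_RUNNER_HOURS = 24.0


@dataclass
class Task:
    name: str
    estimated_hours: float  # remaining-time estimate shown right after download


def long_runners(tasks, threshold_hours=LONG_RUNNER_HOURS):
    """Return the names of tasks whose initial estimate looks like a long runner."""
    return [t.name for t in tasks if t.estimated_hours >= threshold_hours]


queue = [Task("task_a", 5.5), Task("task_b", 72.0), Task("task_c", 3.0)]
print(long_runners(queue))  # ['task_b']
```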

HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 6,749,027
RAC: 20,218
Message 29949 - Posted: 17 Apr 2017, 14:32:58 UTC

> So what I am doing now is aborting any such "long runner" immediately after it is downloaded.

Same here, and all the short runners are OK.
____________
We are the product of random evolution.

den777
Joined: 19 Oct 13
Posts: 6
Credit: 675,849
RAC: 506
Message 29957 - Posted: 18 Apr 2017, 13:55:19 UTC
Last modified: 18 Apr 2017, 13:56:08 UTC

OK, with 1 CPU it works fine.
"Bad tasks" fail by themselves after a few minutes, and "good tasks" finish successfully and get validated.

4200-4800MB of memory is not enough for 2 cores (or 6GB is not enough for VirtualBox to run such virtual machines; I'm not sure).
4200MB is enough for single-core VMs :)

Erich56
Joined: 18 Dec 15
Posts: 383
Credit: 3,873,774
RAC: 7,567
Message 29959 - Posted: 18 Apr 2017, 15:52:12 UTC - in response to Message 29957.

> OK, with 1 CPU it works fine.
> "Bad tasks" fail by themselves after a few minutes, and "good tasks" finish successfully and get validated.
>
> 4200-4800MB of memory is not enough for 2 cores (or 6GB is not enough for VirtualBox to run such virtual machines; I'm not sure).
> 4200MB is enough for single-core VMs :)

Do you think the problem of the failing long runners has to do with the memory size?
Well, I have 5000MB in app_config for 2 cores, and they still fail.

HerveUAE
Joined: 18 Dec 16
Posts: 120
Credit: 6,749,027
RAC: 20,218
Message 29967 - Posted: 19 Apr 2017, 2:45:18 UTC - in response to Message 29959.

> Do you think the problem of the failing long runners has to do with the memory size?
> Well, I have 5000MB in app_config for 2 cores, and they still fail.

I have tested 2 cores with 4600MB and 1 core with 5000MB in app_config. The long runners fail in both cases, and the short runners occasionally fail with 4600MB, though only very rarely.
____________
We are the product of random evolution.

den777
Joined: 19 Oct 13
Posts: 6
Credit: 675,849
RAC: 506
Message 29968 - Posted: 19 Apr 2017, 5:07:41 UTC - in response to Message 29959.


> Do you think the problem of the failing long runners has to do with the memory size?

Nope, I didn't say that.
