Message boards : Number crunching : don't get ATLAS work despite lots of unsent tasks

Curly

Joined: 3 Jan 07
Posts: 2
Credit: 4,129,541
RAC: 0
Message 36029 - Posted: 24 Jul 2018, 12:33:21 UTC
Last modified: 24 Jul 2018, 12:35:38 UTC

Starting today, my host gets ATLAS tasks only occasionally.

Almost all requests return:
24.07.2018 14:23:04 | LHC@home | No tasks are available for ATLAS Simulation

The server status page shows 7648 tasks.

Any reason/idea what's wrong?

Edit: Running BOINC 7.8.2 and VirtualBox 5.2.16 on Windows.
ID: 36029
tullio

Joined: 19 Feb 08
Posts: 634
Credit: 3,876,556
RAC: 1,144
Message 36031 - Posted: 24 Jul 2018, 16:25:28 UTC - in response to Message 36029.  
Last modified: 24 Jul 2018, 16:26:25 UTC

It seems to me that your results do not include a HITS file. This also happens to me on a Windows 10 PC with plenty of RAM (22 GB). A Linux host with only 8 GB of RAM produces HITS files.
Tullio
ID: 36031
Erich56

Joined: 18 Dec 15
Posts: 1322
Credit: 24,369,852
RAC: 10,204
Message 36032 - Posted: 24 Jul 2018, 18:59:09 UTC - in response to Message 36031.  

"It seems to me that your results do not include a HITS file."
The stderr seems to be misleading.
Over the past few days, this problem has been discussed in the following thread in the ATLAS section of the forum:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4763
That thread also describes a more reliable way to check for the HITS file (via ATLAS PanDA).
ID: 36032
MAGIC Quantum Mechanic

Joined: 24 Oct 04
Posts: 977
Credit: 41,716,022
RAC: 19,626
Message 36033 - Posted: 24 Jul 2018, 20:05:18 UTC
Last modified: 24 Jul 2018, 20:08:39 UTC

Windows and Linux always produce a different stderr on all these CERN projects.

Edit: By the way, I noticed while testing that the older BOINC version is having problems, but the newest version is working perfectly (on Windows, anyway).
ID: 36033
bronco

Joined: 13 Apr 18
Posts: 443
Credit: 8,438,885
RAC: 0
Message 36034 - Posted: 24 Jul 2018, 21:04:41 UTC - in response to Message 36029.  

"Starting today, my host gets ATLAS tasks only occasionally.
Any reason/idea what's wrong?"

The server isn't broken, it's just overloaded? Because too many volunteers are overloading their hosts, which causes them to return results without HITS files? Such tasks run for about 30 minutes when they should normally run for about 2 hours? So the server is caught in a vicious cycle of sending tasks to hosts that blow the task, ask for more, blow those tasks too, ask for more, and on and on? That is not a problem when task downloads are small, but remember that every ATLAS task requires a 300 MB download. Bandwidth is not infinite.
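
A rough back-of-the-envelope sketch of the bandwidth argument above. It uses only the figures from this post (300 MB per task, about 30 minutes for a task that fails, about 2 hours for a normal one). The function name and the assumption that a host refills its task slot immediately after every result are mine, not something the project has confirmed.

```python
# Figures taken from the post above; everything else is an assumption.
TASK_DOWNLOAD_MB = 300        # download size per ATLAS task
NORMAL_RUNTIME_HOURS = 2.0    # typical runtime of a healthy task
FAILING_RUNTIME_HOURS = 0.5   # runtime of a task that ends without a HITS file


def download_mb_per_hour(runtime_hours: float, task_mb: float = TASK_DOWNLOAD_MB) -> float:
    """Download volume per hour for one task slot.

    Assumes (hypothetically) that the slot is refilled as soon as a task ends.
    """
    return task_mb / runtime_hours


normal_rate = download_mb_per_hour(NORMAL_RUNTIME_HOURS)
failing_rate = download_mb_per_hour(FAILING_RUNTIME_HOURS)

print(f"healthy slot: {normal_rate:.0f} MB/h")
print(f"failing slot: {failing_rate:.0f} MB/h")
print(f"ratio: {failing_rate / normal_rate:.0f}x")
```

Under those assumptions, a slot that keeps failing pulls about four times as much data per hour as a healthy one (600 MB/h versus 150 MB/h).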

Then why has this problem not surfaced before? Maybe the latest batch of ATLAS tasks is more demanding than previous batches and has a higher failure rate?

If this theory is correct then it would seem the easiest way to fix it would be to NOT mark tasks as valid when they don't return a HITS file. That way volunteers would know something is wrong.

Or impose a limit on the number of consecutive "no HITters" (results without a HITS file). If a host exceeds that limit, the server refuses its requests for more ATLAS tasks and sends an email or a PM to the owner informing him of the problem.
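
To make the "consecutive no HITters" idea concrete, here is a minimal sketch of how such a limit might work. It is not how the LHC@home or BOINC server actually behaves. The class, the function names and the threshold of 5 are all hypothetical, and the notification is reduced to a flag rather than a real email or PM.

```python
from dataclasses import dataclass

# Hypothetical threshold; the post does not suggest a specific number.
MAX_CONSECUTIVE_NO_HITS = 5


@dataclass
class HostState:
    """Per-host bookkeeping for the proposed limit (illustrative only)."""
    consecutive_no_hits: int = 0
    blocked: bool = False
    owner_notified: bool = False


def record_result(host: HostState, has_hits_file: bool,
                  limit: int = MAX_CONSECUTIVE_NO_HITS) -> None:
    """Update the host's state after it returns one ATLAS result."""
    if has_hits_file:
        # A good result breaks the streak.
        host.consecutive_no_hits = 0
        return
    host.consecutive_no_hits += 1
    if host.consecutive_no_hits > limit:
        # Limit exceeded: stop sending work and flag the owner for a message.
        host.blocked = True
        host.owner_notified = True


def may_send_atlas_task(host: HostState) -> bool:
    """Whether the scheduler should answer this host's next ATLAS request."""
    return not host.blocked


# Usage: six failures in a row trip a limit of five.
host = HostState()
for _ in range(6):
    record_result(host, has_hits_file=False)
print(may_send_atlas_task(host))
```

How a blocked host gets unblocked (for example, manually by the owner after fixing the setup) is left open here, as it is in the proposal.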

Seems to me the problem with very slow ATLAS downloads mentioned several weeks ago and now this new problem are all part of the same problem which started out small and is getting progressively worse.
ID: 36034
tullio

Joined: 19 Feb 08
Posts: 634
Credit: 3,876,556
RAC: 1,144
Message 36036 - Posted: 25 Jul 2018, 4:30:32 UTC

My ATLAS tasks on a Windows 10 PC take about 700 s and produce no HITS file. On my two Linux boxes they take about 150k s and produce a HITS file.
Tullio
ID: 36036
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 15 Jun 08
Posts: 1608
Credit: 94,642,557
RAC: 98,659
Message 36037 - Posted: 25 Jul 2018, 4:47:22 UTC - in response to Message 36036.  

The well-known "error 65".
Those WUs fail at the scientific level.
It has been discussed many times.
ID: 36037



©2021 CERN