don't get ATLAS work despite lots of unsent tasks
Author | Message |
---|---|
Joined: 3 Jan 07 Posts: 2 Credit: 4,129,669 RAC: 0 |
Starting today my host gets ATLAS tasks only occasionally. Almost all requests return: "24.07.2018 14:23:04 | LHC@home | No tasks are available for ATLAS Simulation". The server status page shows 7648 tasks. Any reason/idea what's wrong? Edit: Running BOINC 7.8.2 and VirtualBox 5.2.16 on Windows. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
It seems to me that your results do not include a HITS file. This also happens to me on a Windows 10 PC with plenty of RAM (22 GB). A Linux host with only 8 GB of RAM produces HITS files. Tullio |
Joined: 18 Dec 15 Posts: 1688 Credit: 103,129,915 RAC: 120,329 |
"It seems to me that your results do not include a HITS file." The stderr seems to be misleading. In the past few days, this problem has been discussed in this thread in the ATLAS section of the forum: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4763. That thread also describes a more reliable way to find out whether a HITS file was produced (via ATLAS PanDA). |
Joined: 24 Oct 04 Posts: 1118 Credit: 49,728,983 RAC: 13,228 |
Windows and Linux always produce a different stderr on all these CERN projects. Edit: btw, I noticed by testing that the older BOINC version is having problems but the newest version is working perfectly (on Windows, anyway). |
Joined: 13 Apr 18 Posts: 443 Credit: 8,438,885 RAC: 0 |
"Starting today my host gets ATLAS tasks only occasionally." The server isn't broken, it's just overloaded? Because too many volunteers are overloading their hosts, which causes them to return results without HITS files? Such tasks run for about 30 minutes when they should normally run for about 2 hours? So the server is caught in a vicious cycle of sending tasks to hosts that blow the task, ask for more, blow those tasks too, ask for more, and on and on? That is not a problem when task downloads are small, but remember that every ATLAS task requires a 300 MB download. Bandwidth is not infinite. Then why has this problem not surfaced before? Maybe the latest batch of ATLAS tasks is more demanding than previous batches and has a higher failure rate? If this theory is correct, the easiest fix would seem to be to NOT mark tasks as valid when they don't return a HITS file. That way volunteers would know something is wrong. Or impose a limit on the number of consecutive "no HITters" (results without a HITS file): if a host exceeds that limit, the server refuses its requests for more ATLAS tasks and sends an email or a PM to the owner informing them of the problem. It seems to me that the very slow ATLAS downloads mentioned several weeks ago and this new problem are both part of the same problem, which started out small and is getting progressively worse. |
Joined: 19 Feb 08 Posts: 708 Credit: 4,336,250 RAC: 0 |
My ATLAS tasks on a Windows 10 PC take about 700 s and produce no HITS file. On my two Linux boxes they take about 150k s and produce a HITS file. Tullio |
Joined: 15 Jun 08 Posts: 2401 Credit: 225,575,887 RAC: 120,945 |
The well-known "error 65". It means the WUs fail at the scientific level. It has been discussed many times. |
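
The limit on consecutive "no HITters" suggested earlier in the thread could be sketched roughly as follows. This is only an illustration of the idea, not how the LHC@home or BOINC server works: the names `HostRecord`, `record_result`, `may_receive_atlas_task` and the threshold `MAX_CONSECUTIVE_NO_HITS` are all hypothetical, and the value 5 is an arbitrary assumption.

```python
from dataclasses import dataclass

# Hypothetical threshold; the thread does not propose a specific number.
MAX_CONSECUTIVE_NO_HITS = 5


@dataclass
class HostRecord:
    """Per-host count of consecutive results returned without a HITS file."""
    consecutive_no_hits: int = 0


def record_result(host: HostRecord, has_hits_file: bool) -> None:
    """Update the counter when a host reports a finished task."""
    if has_hits_file:
        # A good result ends the streak.
        host.consecutive_no_hits = 0
    else:
        host.consecutive_no_hits += 1


def may_receive_atlas_task(host: HostRecord) -> bool:
    """Return False once the host has reached the no-HITS limit."""
    return host.consecutive_no_hits < MAX_CONSECUTIVE_NO_HITS


if __name__ == "__main__":
    host = HostRecord()
    for _ in range(MAX_CONSECUTIVE_NO_HITS):
        record_result(host, has_hits_file=False)
    # At this point a real server would also notify the owner.
    print(may_receive_atlas_task(host))
```

A blocked host in this sketch can never earn its way back, because it no longer receives tasks. A real scheme would need some release path, for example a manual reset by the owner or a timed probation task.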
©2024 CERN