Message boards : Theory Application : New version 263.90
Joined: 30 May 08 Posts: 93 Credit: 5,160,246 RAC: 0
"However, even several hours after you wrote your posting, I got the 'no subtasks' error many times..."
Maybe I'm just lucky or am missing the point, but I have a Theory task that's been running on my 8-core AMD for over 8 hours. If it wasn't getting any jobs, wouldn't it crash eventually?
Joined: 18 Dec 15 Posts: 1686 Credit: 100,483,743 RAC: 104,419
Why is the task bucket being filled up again if there are still/again no jobs?
A few minutes ago, I had another task that failed with the "no subtasks" error: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211171810
And again, I don't understand why Theory is not being stopped until all these permanently recurring problems are solved.
Joined: 18 Dec 15 Posts: 1686 Credit: 100,483,743 RAC: 104,419
"A few minutes ago, I had another task that failed with the 'no subtasks' error: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211171810"
And here the next one, which failed a minute ago: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211172940
Joined: 15 Jun 08 Posts: 2386 Credit: 223,040,502 RAC: 136,849
ATM it seems to be pure luck whether a fresh VM gets a subtask.
Joined: 18 Dec 15 Posts: 1686 Credit: 100,483,743 RAC: 104,419
"ATM it seems to be bare luck to get a subtask for fresh VMs."
And this has become standard procedure now? Seems like it :-(
Joined: 24 Oct 04 Posts: 1114 Credit: 49,504,188 RAC: 3,842
https://lhcathome.cern.ch/lhcathome/results.php?userid=5472
I have well over 100 × "Compute error - EXIT_INIT_FAILURE - Condor exited after 728s without running a job", and would have many more if I hadn't come home and suspended all of them.
Joined: 18 Dec 15 Posts: 1686 Credit: 100,483,743 RAC: 104,419
Here, throughout the night, all tasks on all my machines failed with the "no subtasks" error. For example:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10452404
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10542973
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10544654
This is frustrating and annoying. As I stated before (and back then some co-crunchers accused me of not being respectful enough): either something is going awfully wrong at LHC@home from a technical point of view, or they simply don't have the experts needed to fix problems like these, which have been occurring for weeks now. We crunchers dedicate our equipment, our time, and our electricity - for nothing :-(
Joined: 2 May 07 Posts: 2071 Credit: 156,192,791 RAC: 103,819
"We crunchers dedicate our equipment, our time, and our electricity - for nothing :-("
You can do other BOINC work while this problem lasts.
Edit: Laurence had a thread in the past titled "Respect my limit!" We all hope that they find a solution, but this needs TIME.
Joined: 18 Dec 15 Posts: 1686 Credit: 100,483,743 RAC: 104,419
"We all hope, that they find a solution, but this need TIME."
And until a solution is found, it would make sense to shut down the Theory subproject. What sense does it make to send out thousands of tasks that error out all the time?
Joined: 20 Jun 14 Posts: 372 Credit: 238,712 RAC: 0
In the past the Theory app has been quite stable and, according to MCPlots, returning ~1K CPU hours. The number of jobs in progress was approximately 2K and our queue was ~3K, leaving a 1K job buffer. This morning we had 6976 jobs in progress and so hit our 7K queue limit. This has now been increased to 8K. The issue is that MCPlots is still reporting only ~1K CPU hours returned. Looking at the jobs in progress per host, there does not seem to be any host acting as a black hole and sucking up all the jobs.
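The queue arithmetic described above can be sketched as follows. This is a toy illustration only; the function name and the one-task granularity are assumptions, not project code:

```python
# Toy sketch of the Theory queue accounting described in the post above.

def available_buffer(queue_limit: int, jobs_in_progress: int) -> int:
    """Tasks the server can still hand out before hitting its queue limit."""
    return max(queue_limit - jobs_in_progress, 0)

# Normal situation: ~2K jobs in progress against a ~3K queue -> ~1K buffer.
print(available_buffer(3_000, 2_000))   # 1000

# This morning: 6976 jobs in progress almost exhausted the old 7K limit...
print(available_buffer(7_000, 6_976))   # 24

# ...so the limit was raised to 8K, restoring roughly a 1K buffer.
print(available_buffer(8_000, 6_976))   # 1024
```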
Joined: 15 Jun 08 Posts: 2386 Credit: 223,040,502 RAC: 136,849
Some comments, mainly to see whether I understand the process or not.
Laurence wrote:
This morning we had 6976 jobs in progress and so hit our 7K queue limit. This has now been increased to 8K.
7K was the #tasks limit that can be seen here: https://lhcathome.cern.ch/lhcathome/server_status.php
It is now 8K. Until this limit is reached, a BOINC client that requests a task will get one (or more). This task will start a VM or increase the client's local buffer. Once the limit is reached, the client will get a "No tasks available ..." message.
Laurence wrote:
The issue is MCPlots is still reporting only ~1K CPU hours returned.
VMs that don't process a subtask also don't add CPU hours. Instead they shut down, and the client starts a fresh VM.
Laurence wrote:
There does not seem to be any hosts acting as a black hole and sucking all the jobs.
Not one single host, but all active hosts together: they fight against each other to get the few available subtasks. According to http://mcplots-dev.cern.ch/production.php?view=status&plots=hourly#plots the # of available subtasks seems to be high enough, but I'm curious why the distribution ratio seems to be much too low. I guess that when this ratio rises, the #tasks (from above) will also stabilize at a lower level, as the mean runtime per VM will increase.
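The picture sketched above — many fresh VMs competing for a limited pool of subtasks, with the losers shutting down after contributing zero CPU hours — can be modeled with a toy simulation. All numbers, names, and the one-hour-per-subtask assumption are illustrative only:

```python
def simulate_round(vms: int, subtask_pool: int, hours_per_subtask: float = 1.0):
    """Each VM asks the server for one subtask. VMs that get none shut
    down immediately and contribute zero CPU hours (toy model only)."""
    served = min(vms, subtask_pool)
    starved = vms - served          # these VMs exit; clients start fresh ones
    cpu_hours = served * hours_per_subtask
    return cpu_hours, starved

# Roughly the situation in the thread: ~7K VMs fighting over ~1K subtasks.
cpu_hours, starved = simulate_round(vms=7_000, subtask_pool=1_000)
print(cpu_hours, starved)   # 1000.0 6000 -> ~1K CPU hours despite 7K tasks
```

This also illustrates the guess at the end of the post: if the distribution ratio rises (more subtasks served per round), fewer VMs starve and the mean runtime per VM increases.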
Joined: 28 Sep 04 Posts: 674 Credit: 43,168,451 RAC: 16,096
Could somebody please explain how to read the above MCPlots graphs? For example, what is the "lost ratio", etc.?
Joined: 18 Dec 15 Posts: 1686 Credit: 100,483,743 RAC: 104,419
which is bad enough, and rather frustrating ... :-(
Joined: 20 Jun 14 Posts: 372 Credit: 238,712 RAC: 0
I was looking at our other server that is delivering the subtasks. The numbers are roughly in agreement.
There is a small buffer of tasks but there is no limit.
VMs that don't get a subtask also don't process one, and hence there would be a difference between the number of tasks and subtasks.
No, there are currently 7.5K subtasks 'running'.
Joined: 15 Jun 08 Posts: 2386 Credit: 223,040,502 RAC: 136,849
No limit? So what is meant by the "limit" here?
Laurence wrote:
This morning we had 6976 jobs in progress and so hit our 7K queue limit. This has now been increased to 8K.
now:
Laurence wrote:
There is a small buffer of tasks but there is no limit.
Laurence wrote:
I was looking at our other server that is delivering the sub tasks. The numbers are roughly in agreement.
Sorry to ask again: are these the numbers mentioned as "#tasks in progress" on the server status page?
Laurence wrote:
hence there would be a difference between the number of tasks and subtasks.
Shouldn't there be a significant difference? What about tasks that are sent out but remain unstarted in the clients' work buffers? I would expect the #subtasks to be smaller than the #tasks, or at least different, as it is a multicore app.
Joined: 15 Jun 08 Posts: 2386 Credit: 223,040,502 RAC: 136,849
Laurence wrote:
There does not seem to be any hosts acting as a black hole and sucking all the jobs.
My comments were meant as questions rather than statements. Please be so kind as to read them that way, not as "truth".
Joined: 20 Jun 14 Posts: 372 Credit: 238,712 RAC: 0
"No limit?"
There is a limit on the number of subtasks but not on the number of tasks.
The number of tasks is on the status page, the number of subtasks is only visible internally.
Yes, if there were lots of VMs trying to get subtasks, but that is not what we see.
There were 4K subtasks running that are older than 48 hours. I suspect most have been suspended or disconnected. These 4K are being counted as part of the queue. I am looking at ways to handle this situation.
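The last point — subtasks older than 48 hours still being counted against the queue — amounts to a pruning problem. A minimal sketch of one way to handle it; the function name, data layout, and timestamps are all hypothetical:

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=48)

def prune_stale(start_times: dict, now: datetime) -> dict:
    """Keep only subtasks younger than 48 h; older ones are assumed to be
    on suspended or disconnected hosts and should not count in the queue."""
    return {job_id: started for job_id, started in start_times.items()
            if now - started <= MAX_AGE}

# Hypothetical queue snapshot: one healthy subtask, one stale one.
now = datetime(2018, 11, 15, 12, 0)
queue = {
    "job-1": now - timedelta(hours=3),    # healthy
    "job-2": now - timedelta(hours=50),   # stale: host probably gone
}
print(sorted(prune_stale(queue, now)))    # ['job-1']
```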
Joined: 2 Jan 11 Posts: 23 Credit: 5,986,899 RAC: 0
I upgraded VBox to version 5.2.22 - and now my jobs are running ... is this the solution??!!
Joined: 18 Dec 15 Posts: 1686 Credit: 100,483,743 RAC: 104,419
"I upgraded VBox to Vers. 5.2.22 - and now my JOBs are running . . . is this the solution??!!"
I think this was rather a coincidence :-)
Joined: 15 Jun 08 Posts: 2386 Credit: 223,040,502 RAC: 136,849
"I upgraded VBox to Vers. 5.2.22 - and now my JOBs are running . . . is this the solution??!!"
No. It's not a client-side issue. The project server simply can't satisfy the demand for subtasks. It sounds easy, but in detail it seems to be rather complex to find the right settings.
©2024 CERN