Message boards : Theory Application : New version 263.90

Previous · 1 · 2 · 3 · 4 · 5 · Next

ritterm
Joined: 30 May 08
Posts: 93
Credit: 5,160,246
RAC: 0
Message 37503 - Posted: 3 Dec 2018, 16:59:12 UTC - in response to Message 37498.  

However, even several hours after you wrote your posting, I got the "no subtasks" error many times...

Maybe I'm just lucky or am missing the point, but I have a Theory task that's been running on my 8-core AMD for over 8 hours. If it weren't getting any jobs, wouldn't it have crashed eventually?
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 37504 - Posted: 3 Dec 2018, 17:08:11 UTC

Why is the task bucket being filled up again if there are still/again no jobs?

A few minutes ago, I had another task that failed with the "no subtasks" error: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211171810

And again, I don't understand why Theory is not being stopped until all these constantly recurring problems are solved.
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 37505 - Posted: 3 Dec 2018, 17:26:22 UTC - in response to Message 37504.  

A few minutes ago, I had another task that failed with the "no subtasks" error: https://lhcathome.cern.ch/lhcathome/result.php?resultid=211171810
And here is the next one, which failed a minute ago:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=211172940
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1665
Credit: 98,390,878
RAC: 87,202
Message 37507 - Posted: 3 Dec 2018, 17:37:25 UTC

ATM it seems to be pure luck whether a fresh VM gets a subtask.
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 37513 - Posted: 3 Dec 2018, 21:08:08 UTC - in response to Message 37507.  

ATM it seems to be pure luck whether a fresh VM gets a subtask.
And this has become standard procedure now? Seems like it :-(
MAGIC Quantum Mechanic
Joined: 24 Oct 04
Posts: 983
Credit: 42,902,040
RAC: 29,990
Message 37515 - Posted: 4 Dec 2018, 0:46:31 UTC

https://lhcathome.cern.ch/lhcathome/results.php?userid=5472

I have well over 100 * Compute error - EXIT_INIT_FAILURE - Condor exited after 728s without running a job *

And I would have many more if I hadn't gotten home and suspended all of them.
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 37516 - Posted: 4 Dec 2018, 5:58:47 UTC

Here, throughout the night, all tasks on all my machines failed with the "no subtasks" error.

For example:
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10452404
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10542973
https://lhcathome.cern.ch/lhcathome/results.php?hostid=10544654

This is frustrating and annoying.
As stated before (and some co-crunchers then accused me of not being respectful enough): either something is going awfully wrong at LHC@home from a technical point of view, or they simply don't have the experts needed to fix problems like these, which have been occurring for weeks now.

We crunchers dedicate our equipment, our time, and our electricity - for nothing :-(
maeax

Joined: 2 May 07
Posts: 1095
Credit: 37,019,117
RAC: 17,990
Message 37519 - Posted: 4 Dec 2018, 7:03:23 UTC - in response to Message 37516.  
Last modified: 4 Dec 2018, 7:12:59 UTC

We crunchers dedicate our equipment, our time, and our electricity - for nothing :-(

You can do other BOINC work while this problem persists.

Edit:
Laurence once had a thread titled:
Respect my limit!
We all hope that they find a solution, but
this needs TIME.
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 37520 - Posted: 4 Dec 2018, 7:58:00 UTC - in response to Message 37519.  

We all hope that they find a solution, but this needs TIME.
And until a solution is found, it would make sense to shut down the Theory subproject. What sense does it make to send out thousands of tasks that error out all the time?
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 37521 - Posted: 4 Dec 2018, 9:45:55 UTC - in response to Message 37520.  
Last modified: 4 Dec 2018, 10:52:43 UTC

In the past the Theory app has been quite stable, with MCPlots reporting ~1K CPU hours returned. The number of jobs in progress was approximately 2K and our queue was ~3K, leaving a 1K job buffer. This morning we had 6,976 jobs in progress and so hit our 7K queue limit. This has now been increased to 8K. The issue is that MCPlots is still reporting only ~1K CPU hours returned. Looking at the jobs in progress per host, no host seems to be acting as a black hole and sucking up all the jobs.
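The buffer arithmetic described here can be sketched in a few lines. This is purely an illustration using the numbers from the post; the function name and the check are assumptions, not actual LHC@home server code:

```python
# Hypothetical sketch of the queue arithmetic described above; the
# numbers come from the post, the code itself is illustrative only.

def remaining_buffer(queue_limit: int, jobs_in_progress: int) -> int:
    """How many more tasks can be handed out before the queue limit is hit."""
    return max(queue_limit - jobs_in_progress, 0)

jobs_in_progress = 6976
print(remaining_buffer(7000, jobs_in_progress))  # 24  -> the old 7K limit was nearly exhausted
print(remaining_buffer(8000, jobs_in_progress))  # 1024 -> headroom after raising it to 8K
```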
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1665
Credit: 98,390,878
RAC: 87,202
Message 37523 - Posted: 4 Dec 2018, 11:05:51 UTC - in response to Message 37521.  

Some comments. Mainly to see if I understand the process or not.

This morning we had 6976 jobs in progress and so hit our 7K queue limit. This has now been increased to 8K.

7K was the #tasks limit that can be seen here:
https://lhcathome.cern.ch/lhcathome/server_status.php
It is now 8K.

Until this limit is reached, a BOINC client that requests a task will get one (or more).
These tasks will start a VM or increase the client's local buffer.

If the limit is reached, the client will get a "No tasks available ..." message.


The issue is MCPlots is still reporting only ~1K CPU hours returned.

VMs that don't process a subtask also don't add CPU hours.
Instead they shut down and the client starts a fresh VM.
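That lifecycle can be modelled roughly like this. It is my own illustrative sketch of the behaviour described above, not project code; the subtask structure and all names are invented:

```python
# Illustrative model of the behaviour described above: a task starts a
# VM; a VM that gets no subtask contributes no CPU hours, shuts down,
# and is replaced by a fresh VM. All names here are invented.

def run_vm(fetch_subtask) -> float:
    """Return the CPU hours one VM contributes."""
    subtask = fetch_subtask()
    if subtask is None:
        return 0.0  # no subtask -> VM shuts down without adding CPU hours
    return subtask["cpu_hours"]

# A server queue with a single subtask left: the first VM gets work,
# the following "fresh" VMs come up empty.
queue = [{"cpu_hours": 2.5}]
hours = [run_vm(lambda: queue.pop() if queue else None) for _ in range(3)]
print(hours)  # [2.5, 0.0, 0.0]
```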


There does not seem to be any hosts acting as a black hole and sucking all the jobs.

Not one single host, but all active hosts together.
They compete with each other for the few available subtasks.


According to:
http://mcplots-dev.cern.ch/production.php?view=status&plots=hourly#plots
the # of available subtasks seems to be high enough, but I'm curious why the distribution ratio seems to be much too low.

I guess that when this ratio rises, the #tasks (from above) will also stabilize at a lower level, as the mean runtime per VM will increase.
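One way to make that guess concrete, as my own back-of-envelope rather than project data: each busy host slot has to start roughly 24 h divided by the mean VM lifetime in tasks per day, so longer-running VMs directly mean fewer tasks churning through the queue:

```python
# Back-of-envelope sketch (assumed lifetimes, not project data) of why
# a higher subtask-distribution ratio should lower the task count:
# each busy host slot needs about 24 h / lifetime task starts per day.

def task_starts_per_day(mean_vm_lifetime_h: float) -> float:
    return 24.0 / mean_vm_lifetime_h

print(task_starts_per_day(0.2))   # ~120 starts/day for empty VMs dying after ~12 min
print(task_starts_per_day(12.0))  # 2 starts/day for VMs that actually get subtasks
```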
Harri Liljeroos
Joined: 28 Sep 04
Posts: 499
Credit: 26,475,129
RAC: 13,737
Message 37524 - Posted: 4 Dec 2018, 12:09:45 UTC - in response to Message 37523.  


According to:
http://mcplots-dev.cern.ch/production.php?view=status&plots=hourly#plots
the # of available subtasks seems to be high enough, but I'm curious why the distribution ratio seems to be much too low.

I guess that when this ratio rises, the #tasks (from above) will also stabilize at a lower level, as the mean runtime per VM will increase.

Could somebody please explain how to read the above MCPlots graphs? For example, what is the "lost ratio", etc.?
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 37525 - Posted: 4 Dec 2018, 12:13:15 UTC - in response to Message 37523.  


There does not seem to be any hosts acting as a black hole and sucking all the jobs.

Not one single host, but all active hosts together.
They compete with each other for the few available subtasks.
Which is bad enough, and rather frustrating ... :-(
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 37526 - Posted: 4 Dec 2018, 12:32:59 UTC - in response to Message 37523.  
Last modified: 4 Dec 2018, 12:33:16 UTC


7 k was the #tasks limit that can be seen here:
https://lhcathome.cern.ch/lhcathome/server_status.php
It is now 8 k.

I was looking at our other server, which is delivering the subtasks. The numbers are roughly in agreement.


Until this limit is reached, a BOINC client that requests a task will get one (or more).
These tasks will start a VM or increase the client's local buffer.
If the limit is reached, the client will get a "No tasks available ..." message.

There is a small buffer of tasks but there is no limit.



VMs that don't process a subtask also don't add CPU hours.
Instead they shut down and the client starts a fresh VM.

VMs that don't get a subtask also don't process one, and hence there would be a difference between the number of tasks and the number of subtasks.


Not 1 single host, but all active host together.
They fight against each other to get the few available subtasks.

No, there are currently 7.5K subtasks 'running'
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1665
Credit: 98,390,878
RAC: 87,202
Message 37527 - Posted: 4 Dec 2018, 13:05:46 UTC - in response to Message 37526.  

No limit?
So, what is meant by the "limit" here?
Laurence wrote:
This morning we had 6976 jobs in progress and so hit our 7K queue limit. This has now been increased to 8K.

now:
Laurence wrote:
There is a small buffer of tasks but there is no limit.




Laurence wrote:
I was looking at our other server that is delivering the sub tasks. The numbers are roughly in agreement.

Sorry to ask again.
Are these the numbers mentioned as "#tasks in progress" on the server status page?


Laurence wrote:
hence there would be a difference between the number of tasks and subtasks.
...
No, there are currently 7.5K subtasks 'running'


Shouldn't there be a significant difference?
What about tasks that are sent out but remain unstarted in the clients' work buffers?
I would expect the #subtasks to be smaller than the #tasks,
or at least different, as it is a multicore app.
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1665
Credit: 98,390,878
RAC: 87,202
Message 37528 - Posted: 4 Dec 2018, 13:14:33 UTC - in response to Message 37525.  

There does not seem to be any hosts acting as a black hole and sucking all the jobs.

Not 1 single host, but all active host together.
They fight against each other to get the few available subtasks.

which is bad enough, and rather frustrating ... :-(

My comments were meant as questions rather than statements.
Please be so kind as to read them that way, not as "truth".
Laurence
Project administrator
Project developer
Joined: 20 Jun 14
Posts: 337
Credit: 237,918
RAC: 0
Message 37529 - Posted: 4 Dec 2018, 13:19:44 UTC - in response to Message 37527.  

No limit?
So, what is meant by the "limit" here?


There is a limit on the number of subtasks but not the number of tasks.


Sorry to ask again.
Are these the numbers mentioned as "#tasks in progress" on the server status page?

The number of tasks is on the status page, the number of subtasks is only visible internally.



Shouldn't there be a significant difference?

Yes, if there were lots of VMs trying to get subtasks, but that is not what we see.


What about tasks that are sent out but remain unstarted in the clients' work buffers?
I would expect the #subtasks to be smaller than the #tasks,
or at least different, as it is a multicore app.


There were 4K subtasks 'running' that are older than 48 hours. I suspect most have been suspended or disconnected. These 4K are being counted as part of the queue. I am looking at ways to handle this situation.
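The accounting problem described here can be sketched as follows. This is hypothetical code: the field names, the 48-hour cutoff handling, and the 3.5K/4K split are my assumptions based only on the figures in this thread (7.5K subtasks 'running', 4K of them older than 48 hours):

```python
# Hypothetical sketch of the stale-subtask accounting described above:
# subtasks 'running' for more than 48 h are probably suspended or
# disconnected, yet still count against the queue. Field names invented.

from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=48)

def effective_queue(subtasks, now):
    """Count only subtasks that started within the last 48 h."""
    return sum(1 for s in subtasks if now - s["started"] < STALE_AFTER)

now = datetime(2018, 12, 4, 13, 0)
subtasks = (
      [{"started": now - timedelta(hours=2)}] * 3500   # plausibly still running
    + [{"started": now - timedelta(hours=60)}] * 4000  # likely suspended/dead
)
print(len(subtasks), effective_queue(subtasks, now))  # 7500 3500
```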
peterfilla

Joined: 2 Jan 11
Posts: 23
Credit: 5,986,899
RAC: 0
Message 37531 - Posted: 4 Dec 2018, 17:22:44 UTC

I upgraded VBox to version 5.2.22 - and now my jobs are running . . . is this the solution??!!
Erich56

Joined: 18 Dec 15
Posts: 1343
Credit: 25,070,670
RAC: 22,080
Message 37532 - Posted: 4 Dec 2018, 17:30:49 UTC - in response to Message 37531.  

I upgraded VBox to version 5.2.22 - and now my jobs are running . . . is this the solution??!!
I think this was rather a coincidence :-)
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1665
Credit: 98,390,878
RAC: 87,202
Message 37533 - Posted: 4 Dec 2018, 17:38:13 UTC - in response to Message 37531.  

I upgraded VBox to version 5.2.22 - and now my jobs are running . . . is this the solution??!!

No.
It's not a client-side issue.
The project server simply can't satisfy the demand for subtasks.

It sounds easy, but in detail it seems rather complex to find the right settings.
©2021 CERN