Task creation delayed - database maintenance
Author | Message |
---|---|
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
Due to a database issue last week, task generation is delayed and we need to clean up stuck workunits. The project daemons will be on and off this morning while we try to debug a problem with the BOINC transitioner. |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,914,584 RAC: 128,221 |
Hi Nils, It's not very polite to kick away running tasks, especially those that have runtimes close to or slightly above 12:00 h. :-(( |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
Sorry, if any running tasks have been removed, this is by accident and we apologize for that. We have only removed excess Theory, LHCb and CMS workunits that had been created last week that currently slow down the transitioner. If there are issues with ATLAS tasks, this is being looked at, please check the ATLAS application sub-forum. |
Joined: 26 Jul 12 Posts: 18 Credit: 2,456,826 RAC: 0 |
Well, if I'm slowing down the transitioner by running long-term Theory tasks, this is probably the time for me to switch to some other project. My computer's idle time became my computer's waste time. |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
The tasks that have already been sent out that you guys crunch do not slow down our processes. We have some problems after a lost database connection last week and an internal buffer overflow that we are now cleaning up. Hence the freshly generated tasks do not make it to the scheduler yet. There should be tasks again once the BOINC transitioner has gone through the backlog. |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
We have been running the BOINC transitioner catchup script during the night, but still suffer from backlog and will have another interruption this afternoon. We will also try to re-validate the ATLAS tasks that were lost (ref. this ATLAS application thread). Other results should also validate once our servers are back in order. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,278,447 RAC: 71,589 |
"We have been running the BOINC transitioner catchup script during the night, but still suffer from backlog and will have another interruption this afternoon." Nils - any news? |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
Sorry, this drags on. The transitioner backlog is still too large to ship out new work. We hope to be at the end soon, and also to get more memory on our DB servers. |
Joined: 18 Dec 15 Posts: 1785 Credit: 117,278,447 RAC: 71,589 |
"The transitioner backlog is still too large to ship out new work." Thanks for the information. What makes me wonder, though, is how the transitioner backlog could increase that much within the past 2 days (it's 42.21 hours right now) while no new tasks at all were sent out during this period. How come? |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
It still needs to process tasks from the weekend. We are running several instances of it now on other servers (that's why it is not running on our main server). |
Joined: 28 Sep 04 Posts: 722 Credit: 48,342,058 RAC: 29,814 |
A bunch of resend sixtrack tasks just downloaded to my hosts (over 100 tasks which were timed out on other hosts). |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
Our database server got more memory and we are back in normal operation since yesterday evening. There should be more tasks during the day. |
Joined: 6 Sep 08 Posts: 118 Credit: 12,555,611 RAC: 2,687 |
.... we are back in normal operation since yesterday evening.... Thanks, Nils... but I've still got 25 tasks waiting for validation, and the database says "can't find workunit". Have these been lost? From a previous post: "Sorry, if any running tasks have been removed, this is by accident and we apologize for that." |
Joined: 6 Sep 08 Posts: 118 Credit: 12,555,611 RAC: 2,687 |
.... we are back in normal operation since yesterday evening.... This represents a bit over 146 hours running time and I can't be the only one. I hope that more care will be taken in future to preserve volunteers' work. Not best pleased. |
Joined: 15 Jul 05 Posts: 247 Credit: 5,974,599 RAC: 0 |
Indeed sorry again that some workunits that had been sent have been removed by mistake. While our validators have been able to re-validate a majority of tasks, validation of the results for the lost VM application workunits could not be handled by the standard validation process, so we have tried to give them all credit based on CPU time. Thanks again to you all for your crunching and valuable contributions to LHC@home. |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,914,584 RAC: 128,221 |
While our validators have been able to re-validate a majority of tasks ... Hi Nils, are you sure that process is finished? The following tasks are still shown as "validation pending". There may be an inconsistency as the link to their WU leads to a "non existent WU". https://lhcathome.cern.ch/lhcathome/result.php?resultid=176705219 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176705221 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176705117 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176705124 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176722219 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176720223 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176720275 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176720277 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176705473 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176705364 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176705365 The following task wasn't late although the webpage states it. At least the DB seems to be in a consistent state. https://lhcathome.cern.ch/lhcathome/result.php?resultid=174142645 |
Joined: 26 Jan 18 Posts: 2 Credit: 1,066,678 RAC: 0 |
I have the same issue with my task and I can get no new WU since a week approximately. The event log says no tasks were sent (it says it could send ATI/AMD tasks, but I have an Intel/Nvidia setup). So I'm not downloading any new work units (I also have VirtualBox). |
Joined: 15 Jun 08 Posts: 2520 Credit: 251,914,584 RAC: 128,221 |
"I have the same issue with my task and I can get no new WU since a week approximately." SixTrack has no tasks at the moment (and ATLAS only very few). See: https://lhcathome.cern.ch/lhcathome/server_status.php Your vbox tasks fail because of "ERR_CPU_VM_EXTENSIONS_DISABLED". You may work through Yeti's checklist, especially No. 4 and No. 5. |
Joined: 6 Sep 08 Posts: 118 Credit: 12,555,611 RAC: 2,687 |
While our validators have been able to re-validate a majority of tasks ... https://lhcathome.cern.ch/lhcathome/result.php?resultid=176729959 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176732019 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176766341 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176728806 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176732966 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176732292 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176731924 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176732452 https://lhcathome.cern.ch/lhcathome/result.php?resultid=176733077 querying the database produces "can't find workunit" |
Joined: 26 Jan 18 Posts: 2 Credit: 1,066,678 RAC: 0 |
"Your vbox tasks fail because of "ERR_CPU_VM_EXTENSIONS_DISABLED"." Thanks A LOT! I enabled VT-x in the BIOS, did the other fixes (in the xml file) and vbox64 is now running again (I updated it and also installed the extension pack). Thanks again. |
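
For anyone else who runs into "ERR_CPU_VM_EXTENSIONS_DISABLED", a quick first check on a Linux host is whether the CPU advertises hardware virtualization at all. The sketch below is not part of BOINC or VirtualBox; the function name `find_virt_extensions` is made up for illustration, and it assumes a Linux system that exposes `/proc/cpuinfo`. It only looks for the `vmx` (Intel VT-x) and `svm` (AMD-V) CPU flags. Treat the result as a hint: a listed flag does not by itself prove the extension is enabled in the firmware, so the BIOS/UEFI setting still has to be checked as described above.

```python
from pathlib import Path

# CPU flags that advertise hardware virtualization on Linux:
# "vmx" for Intel VT-x, "svm" for AMD-V.
VIRT_FLAGS = {"vmx": "Intel VT-x", "svm": "AMD-V"}


def find_virt_extensions(cpuinfo_text):
    """Return the set of virtualization extensions listed in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            found.update(name for flag, name in VIRT_FLAGS.items() if flag in flags)
    return found


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if not cpuinfo.exists():
        print("No /proc/cpuinfo here; this sketch only covers Linux hosts.")
    else:
        extensions = find_virt_extensions(cpuinfo.read_text())
        if extensions:
            print("CPU advertises:", ", ".join(sorted(extensions)))
        else:
            print("No vmx/svm flag listed; check the CPU model and firmware settings.")
```

On Windows or macOS hosts this script does not apply; there the firmware setup screen remains the place to look.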
©2024 CERN