Message boards : News : Task creation delayed - database maintenance

Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 172
Credit: 2,690,842
RAC: 7,067
Message 34282 - Posted: 5 Feb 2018, 8:16:44 UTC

Due to a database issue last week, task generation is delayed and we need to clean up stuck workunits. The project daemons will be on and off this morning while we try to debug a problem with the BOINC transitioner.
ID: 34282
computezrmle
Joined: 15 Jun 08
Posts: 768
Credit: 9,534,733
RAC: 29,955
Message 34286 - Posted: 5 Feb 2018, 9:10:54 UTC - in response to Message 34282.  

Hi Nils,

It's not very polite to kick away running tasks, especially those that have runtimes close to or slightly above 12:00 h.
:-((
ID: 34286
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 172
Credit: 2,690,842
RAC: 7,067
Message 34287 - Posted: 5 Feb 2018, 10:05:57 UTC - in response to Message 34286.  

Sorry, if any running tasks have been removed, this is by accident and we apologize for that.

We have only removed excess Theory, LHCb and CMS workunits that were created last week and are currently slowing down the transitioner.

If there are issues with ATLAS tasks, they are being looked at; please check the ATLAS application sub-forum.
ID: 34287
Sid
Joined: 26 Jul 12
Posts: 16
Credit: 803,697
RAC: 0
Message 34288 - Posted: 5 Feb 2018, 11:32:14 UTC - in response to Message 34287.  

Well, if I'm slowing down the transitioner by running long-term Theory tasks, this is probably the time for me to switch to some other project.
My computer's idle time became my computer's waste time.
ID: 34288
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 172
Credit: 2,690,842
RAC: 7,067
Message 34290 - Posted: 5 Feb 2018, 12:43:01 UTC - in response to Message 34288.  

The tasks that have already been sent out and that you guys are crunching do not slow down our processes. We have some problems following a lost database connection last week and an internal buffer overflow, which we are now cleaning up. Hence, the freshly generated tasks are not making it to the scheduler yet.

There should be tasks again once the BOINC transitioner has gone through the backlog.
ID: 34290
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 172
Credit: 2,690,842
RAC: 7,067
Message 34299 - Posted: 6 Feb 2018, 10:08:20 UTC

We have been running the BOINC transitioner catchup script during the night, but still suffer from backlog and will have another interruption this afternoon.

We will also try to re-validate the ATLAS tasks that were lost (see this ATLAS application thread). Other results should also validate once our servers are back in order.
ID: 34299
Erich56
Joined: 18 Dec 15
Posts: 870
Credit: 6,509,170
RAC: 10,994
Message 34301 - Posted: 7 Feb 2018, 11:04:52 UTC - in response to Message 34299.  
Last modified: 7 Feb 2018, 11:05:03 UTC

We have been running the BOINC transitioner catchup script during the night, but still suffer from backlog and will have another interruption this afternoon.
Nils - any news?
ID: 34301
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 172
Credit: 2,690,842
RAC: 7,067
Message 34302 - Posted: 7 Feb 2018, 14:45:07 UTC - in response to Message 34301.  

Sorry, this drags on. The transitioner backlog is still too large to ship out new work. We hope to be at the end soon, and also to get more memory on our DB servers.
ID: 34302
Erich56
Joined: 18 Dec 15
Posts: 870
Credit: 6,509,170
RAC: 10,994
Message 34303 - Posted: 7 Feb 2018, 14:50:13 UTC - in response to Message 34302.  

The transitioner backlog is still too large to ship out new work.
Thanks for the information.
What makes me wonder, though, is how the transitioner backlog could increase that much within the past 2 days (it's 42.21 hours right now) while no new tasks at all were sent out during this period. How come?
ID: 34303
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 172
Credit: 2,690,842
RAC: 7,067
Message 34304 - Posted: 7 Feb 2018, 15:04:40 UTC - in response to Message 34303.  
Last modified: 7 Feb 2018, 15:55:20 UTC

It still needs to process tasks from the weekend. We are running several instances of it now on other servers (that's why it is not running on our main server).
ID: 34304
Harri Liljeroos
Joined: 28 Sep 04
Posts: 300
Credit: 10,773,884
RAC: 15,053
Message 34305 - Posted: 7 Feb 2018, 16:21:47 UTC

A bunch of resent SixTrack tasks just downloaded to my hosts (over 100 tasks that had timed out on other hosts).
ID: 34305
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 172
Credit: 2,690,842
RAC: 7,067
Message 34307 - Posted: 8 Feb 2018, 7:39:31 UTC

Our database server got more memory and we are back in normal operation since yesterday evening. There should be more tasks during the day.
ID: 34307
m
Joined: 6 Sep 08
Posts: 98
Credit: 4,587,541
RAC: 2,118
Message 34308 - Posted: 8 Feb 2018, 9:55:55 UTC - in response to Message 34307.  
Last modified: 8 Feb 2018, 10:12:11 UTC

.... we are back in normal operation since yesterday evening....

Thanks, Nils... but..
I've still got 25 tasks waiting for validation, but the database says "can't find workunit". Have these been lost?

From a previous post: "Sorry, if any running tasks have been removed, this is by accident and we apologize for that."
ID: 34308
m
Joined: 6 Sep 08
Posts: 98
Credit: 4,587,541
RAC: 2,118
Message 34329 - Posted: 9 Feb 2018, 11:12:31 UTC - in response to Message 34308.  

.... we are back in normal operation since yesterday evening....

Thanks, Nils... but..
I've still got 25 tasks waiting for validation, but the database says "can't find workunit". Have these been lost?

From a previous post: "Sorry, if any running tasks have been removed, this is by accident and we apologize for that."

This represents a bit over 146 hours running time and I can't be the only one. I hope that more care will be taken in future to preserve volunteers' work. Not best pleased.
ID: 34329
Nils Høimyr
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 15 Jul 05
Posts: 172
Credit: 2,690,842
RAC: 7,067
Message 34338 - Posted: 9 Feb 2018, 15:33:02 UTC

Indeed, sorry again that some workunits that had already been sent out were removed by mistake.

While our validators have been able to re-validate a majority of tasks, the results for the lost VM application workunits could not be handled by the standard validation process, so we have tried to give them all credit based on CPU time.

Thanks again to you all for your crunching and valuable contributions to LHC@home.
ID: 34338
computezrmle
Joined: 15 Jun 08
Posts: 768
Credit: 9,534,733
RAC: 29,955
Message 34339 - Posted: 9 Feb 2018, 16:07:41 UTC - in response to Message 34338.  

ID: 34339
Falkra
Joined: 26 Jan 18
Posts: 2
Credit: 419,733
RAC: 1,854
Message 34342 - Posted: 9 Feb 2018, 17:16:52 UTC

I have the same issue with my task, and I have not been able to get any new WUs for approximately a week.
Event log says no tasks sent (it says it could send ATI/AMD tasks, but I have an Intel/Nvidia setup).

So I'm not downloading any new work units (I also have virtualbox).
ID: 34342
computezrmle
Joined: 15 Jun 08
Posts: 768
Credit: 9,534,733
RAC: 29,955
Message 34343 - Posted: 9 Feb 2018, 17:46:31 UTC - in response to Message 34342.  
Last modified: 9 Feb 2018, 17:49:11 UTC

I have the same issue with my task, and I have not been able to get any new WUs for approximately a week.
Event log says no tasks sent (it says it could send ATI/AMD tasks, but I have an Intel/Nvidia setup).

So I'm not downloading any new work units (I also have virtualbox).

SixTrack has no task at the moment (and ATLAS only very few).
See: https://lhcathome.cern.ch/lhcathome/server_status.php

Your vbox tasks fail because of "ERR_CPU_VM_EXTENSIONS_DISABLED".
You may work through Yeti's checklist, especially No 4 and No 5.
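
For hosts running Linux, a rough first check before going into the BIOS could look like the sketch below. It is only an illustration, not part of Yeti's checklist: the function name `virtualization_flags` is made up here, and it relies on the fact that `/proc/cpuinfo` lists `vmx` (Intel VT-x) or `svm` (AMD-V) among the CPU flags. A listed flag only means the CPU advertises the extension; it may still be switched off in the firmware, and on Windows hosts this file does not exist at all.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in cpuinfo-style text.

    Hypothetical helper for illustration: looks for 'vmx' (Intel VT-x)
    and 'svm' (AMD-V) on the 'flags' lines of /proc/cpuinfo content.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            flags = virtualization_flags(cpuinfo.read())
    except OSError:
        # Not a Linux host (or the file is unreadable): nothing to report.
        print("Could not read /proc/cpuinfo on this system.")
    else:
        if flags:
            print("CPU advertises:", ", ".join(sorted(flags)))
        else:
            print("No vmx/svm flag listed; check the BIOS/UEFI settings.")
```

If no flag is listed, or VirtualBox still reports the error, the BIOS/UEFI setting remains the place to look.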
ID: 34343
m
Joined: 6 Sep 08
Posts: 98
Credit: 4,587,541
RAC: 2,118
Message 34344 - Posted: 9 Feb 2018, 18:02:12 UTC - in response to Message 34339.  
Last modified: 9 Feb 2018, 18:21:24 UTC

ID: 34344
Falkra
Joined: 26 Jan 18
Posts: 2
Credit: 419,733
RAC: 1,854
Message 34345 - Posted: 9 Feb 2018, 18:40:05 UTC - in response to Message 34343.  

Your vbox tasks fail because of "ERR_CPU_VM_EXTENSIONS_DISABLED".
You may work through Yeti's checklist, especially No 4 and No 5.

Thanks A LOT! I enabled VT-x in the BIOS, did the other fixes (in the XML file), and vbox64 is now running again (I updated it and installed the extension pack as well).

Thanks again.
ID: 34345

©2018 CERN