Message boards : News : CMS@Home accidentally shut down -- Please set No New Tasks

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 648
Credit: 4,921,863
RAC: 1,445
Message 41511 - Posted: 10 Feb 2020, 15:33:39 UTC

We need to upgrade the CMS@Home WMAgent before Thursday, so I tried to set the workflows to drain down. Unfortunately, I misunderstood the batch states and killed off most of them instead. :-( There's one still left with about 200 jobs, so that won't last long.
Please set your CMS projects to No New Tasks to avoid getting lots of computation errors. I'll let you know when the upgrade is done and jobs are flowing again.
ID: 41511
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1345
Credit: 67,206,591
RAC: 95,803
Message 41512 - Posted: 10 Feb 2020, 15:39:10 UTC - in response to Message 41511.  

Ah, this explains why most of my running CMS tasks crashed within a few minutes, a short while ago.
Thought it was caused by the nasty storm.

Thanks for the post.
ID: 41512
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 648
Credit: 4,921,863
RAC: 1,445
Message 41513 - Posted: 10 Feb 2020, 15:42:35 UTC - in response to Message 41512.  

Ah, this explains why most of my running CMS tasks crashed within a few minutes, a short while ago.
Thought it was caused by the nasty storm.

Thanks for the post.

Yes, sorry 'bout that. I didn't expect it to affect running jobs.
ID: 41513
Peter Hucker

Joined: 12 Aug 06
Posts: 80
Credit: 294,674
RAC: 882
Message 41522 - Posted: 11 Feb 2020, 10:04:02 UTC - in response to Message 41511.  

We need to upgrade the CMS@Home WMAgent before Thursday, so I tried to set the workflows to drain down. Unfortunately, I misunderstood the batch states and killed off most of them instead. :-( There's one still left with about 200 jobs, so that won't last long.
Please set your CMS projects to No New Tasks to avoid getting lots of computation errors. I'll let you know when the upgrade is done and jobs are flowing again.


I'm not sure I understand this. If you're shutting the server down, surely my client just won't be able to get any tasks anyway? Why do we have to tell it not to fetch? Shouldn't I just get "no work available" back from the server, or perhaps no response?
ID: 41522
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 893
Credit: 6,239,710
RAC: 2,231
Message 41524 - Posted: 11 Feb 2020, 10:48:09 UTC - in response to Message 41522.  

I'm not sure I understand this. If you're shutting the server down, surely my client just won't be able to get any tasks anyway? Why do we have to tell it not to fetch? Shouldn't I just get "no work available" back from the server, or perhaps no response?
Your client requests a task from the BOINC server. That task creates a CMS-VM on your machine, and that VM requests CMS-jobs from another server. It's that second server that is out of jobs or unreachable.
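To make the two stages concrete, here is a minimal Python sketch. It is only an illustration: `JobServer` and `run_boinc_task` are made-up names, not part of BOINC or the CMS software, and the real VM does far more than a single fetch. It shows why a task can be handed out normally by the BOINC server and still end in a computation error.

```python
from dataclasses import dataclass


@dataclass
class JobServer:
    """Hypothetical stand-in for the second server that feeds CMS jobs to the VM."""
    jobs_left: int
    reachable: bool = True

    def fetch_job(self) -> bool:
        # No job is handed out if the server is down or its queue is empty.
        if not self.reachable or self.jobs_left <= 0:
            return False
        self.jobs_left -= 1
        return True


def run_boinc_task(job_server: JobServer) -> str:
    """Model one BOINC task after the BOINC server has already delivered it.

    Stage 1 (BOINC server -> client) is assumed to have succeeded.
    Stage 2 is the VM asking the job server for actual work.
    """
    if job_server.fetch_job():
        return "success"
    return "computation error"


if __name__ == "__main__":
    print(run_boinc_task(JobServer(jobs_left=1)))                   # a job is available
    print(run_boinc_task(JobServer(jobs_left=0)))                   # queue is empty
    print(run_boinc_task(JobServer(jobs_left=5, reachable=False)))  # server is down
```

The BOINC server never appears in the failure path here, which is the point: it has no direct view of whether the job queue behind the VM is empty.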
ID: 41524
Peter Hucker

Joined: 12 Aug 06
Posts: 80
Credit: 294,674
RAC: 882
Message 41529 - Posted: 11 Feb 2020, 13:09:34 UTC - in response to Message 41524.  

Your client requests a task from the BOINC server. That task creates a CMS-VM on your machine, and that VM requests CMS-jobs from another server. It's that second server that is out of jobs or unreachable.


Ah that makes sense. Although couldn't the Boinc server have been told not to hand out any CMS?
ID: 41529
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1345
Credit: 67,206,591
RAC: 95,803
Message 41530 - Posted: 11 Feb 2020, 13:25:46 UTC - in response to Message 41529.  

Although couldn't the Boinc server have been told not to hand out any CMS?

It sometimes happens that the BOINC server doesn't immediately stop sending task envelopes, especially when the subtask shortage is not very long.
In this case Ivan sends out an NNT (No New Tasks) notice.
This ensures a faster CMS restart when fresh work is available.
ID: 41530
Peter Hucker

Joined: 12 Aug 06
Posts: 80
Credit: 294,674
RAC: 882
Message 41535 - Posted: 11 Feb 2020, 19:11:41 UTC - in response to Message 41530.  

Although couldn't the Boinc server have been told not to hand out any CMS?

It sometimes happens that the BOINC server doesn't immediately stop sending task envelopes, especially when the subtask shortage is not very long.
In this case Ivan sends out an NNT (No New Tasks) notice.
This ensures a faster CMS restart when fresh work is available.


Ok, although chances are most people won't see that message for a day or so. I usually don't even notice them at all.
ID: 41535
gouflo

Joined: 21 Mar 11
Posts: 1
Credit: 12,138
RAC: 29
Message 41538 - Posted: 12 Feb 2020, 7:15:14 UTC

My LHC@home has stopped working since the server failed. So it doesn't make a difference. :-/
ID: 41538
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 648
Credit: 4,921,863
RAC: 1,445
Message 41543 - Posted: 12 Feb 2020, 11:09:46 UTC - in response to Message 41535.  

Although couldn't the Boinc server have been told not to hand out any CMS?

It sometimes happens that the BOINC server doesn't immediately stop sending task envelopes, especially when the subtask shortage is not very long.
In this case Ivan sends out an NNT (No New Tasks) notice.
This ensures a faster CMS restart when fresh work is available.


Ok, although chances are most people won't see that message for a day or so. I usually don't even notice them at all.

Yeah, sure, but I have to do my best. The problem is that when no jobs are available, then the BOINC task will return an error. Your BOINC manager will flag that as an error, and reduce the quota of tasks you can request per day, until this gets down to unity. The quota will only increase when you run a successful task. So if you don't set NNT then your machine gets reduced to one task request per day, until new jobs are available, allowing a task to complete, thus increasing the per-day quota. So, it takes several days to ramp up to full production again if you allow your machine to deplete its quota.
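As a rough illustration of that quota behaviour, here is a toy Python model. The specific rules are assumptions made for the sketch, not BOINC's actual scheduler code: each failed task lowers the daily quota by one (never below 1), each successful task doubles it up to a ceiling, and the ceiling of 8 is an arbitrary example value. `HostQuota`, `simulate_outage` and `successes_to_recover` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Toy model of a per-host daily task quota (assumed rules, not BOINC code)."""
    max_quota: int = 8   # assumed ceiling for tasks per day
    quota: int = 8       # tasks this host may currently request per day

    def report(self, success: bool) -> None:
        if success:
            # Assumed recovery rule: double the quota, capped at the ceiling.
            self.quota = min(self.max_quota, self.quota * 2)
        else:
            # Assumed penalty rule: drop by one, but never below one task per day.
            self.quota = max(1, self.quota - 1)


def simulate_outage(failures: int, max_quota: int = 8) -> int:
    """Return the quota left after `failures` consecutive failed tasks."""
    host = HostQuota(max_quota=max_quota, quota=max_quota)
    for _ in range(failures):
        host.report(success=False)
    return host.quota


def successes_to_recover(start_quota: int, max_quota: int = 8) -> int:
    """Count the successful tasks needed to climb back to the ceiling."""
    host = HostQuota(max_quota=max_quota, quota=start_quota)
    count = 0
    while host.quota < host.max_quota:
        host.report(success=True)
        count += 1
    return count


if __name__ == "__main__":
    depleted = simulate_outage(failures=20)
    print("quota after the outage:", depleted)
    print("successful tasks needed to recover:", successes_to_recover(depleted))
```

Whatever the exact rules are, the shape is the same: errors during an outage push the quota to the floor quickly, and it only climbs back as tasks actually finish successfully. With NNT set, no tasks are fetched during the outage, so nothing fails and the quota is untouched.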
ID: 41543
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 648
Credit: 4,921,863
RAC: 1,445
Message 41544 - Posted: 12 Feb 2020, 11:10:04 UTC - in response to Message 41511.  

We're up again.
ID: 41544
Jim1348

Joined: 15 Nov 14
Posts: 394
Credit: 11,773,203
RAC: 6,517
Message 41546 - Posted: 12 Feb 2020, 11:18:34 UTC - in response to Message 41543.  

So, it takes several days to ramp up to full production again if you allow your machine to deplete its quota.

Thanks. It makes sense now.
ID: 41546
Peter Hucker

Joined: 12 Aug 06
Posts: 80
Credit: 294,674
RAC: 882
Message 41548 - Posted: 12 Feb 2020, 17:54:56 UTC - in response to Message 41543.  
Last modified: 12 Feb 2020, 17:56:31 UTC

Although couldn't the Boinc server have been told not to hand out any CMS?

It sometimes happens that the BOINC server doesn't immediately stop sending task envelopes, especially when the subtask shortage is not very long.
In this case Ivan sends out an NNT (No New Tasks) notice.
This ensures a faster CMS restart when fresh work is available.


Ok, although chances are most people won't see that message for a day or so. I usually don't even notice them at all.

Yeah, sure, but I have to do my best. The problem is that when no jobs are available, then the BOINC task will return an error. Your BOINC manager will flag that as an error, and reduce the quota of tasks you can request per day, until this gets down to unity. The quota will only increase when you run a successful task. So if you don't set NNT then your machine gets reduced to one task request per day, until new jobs are available, allowing a task to complete, thus increasing the per-day quota. So, it takes several days to ramp up to full production again if you allow your machine to deplete its quota.


Ok, thanks for explaining in detail. I take it some people run only CMS on their clients? I have Androids that only do SixTrack (that's all that gets given to them; I assume Androids can't do the others), but my main desktop does any LHC task, and rarely gets CMS, presumably since Theory and ATLAS have a lot more tasks available. My smaller three desktops produce a lot of errors when running virtual machine LHC (they're only 8GB RAM with old processors), so I don't use LHC on those.

You could email everyone if there's something urgent - we all have the option to tick (or not) the thing that says it's ok for admins to email us. Then we'd see it quicker.
ID: 41548
MaylinhBoettcherAt10YearsOld

Joined: 16 Jan 20
Posts: 3
Credit: 7,502
RAC: 4
Message 41551 - Posted: 12 Feb 2020, 20:32:17 UTC - in response to Message 41548.  

"My smaller three desktops produce a lot of errors when running virtual machine LHC (they're only 8GB RAM with old processors)" 4 GB RAM on a 9-year-old CPU is still working fine for me with LHC.
ID: 41551
Peter Hucker

Joined: 12 Aug 06
Posts: 80
Credit: 294,674
RAC: 882
Message 41552 - Posted: 12 Feb 2020, 20:40:46 UTC - in response to Message 41551.  

"My smaller three desktops produce a lot of errors when running virtual machine LHC (they're only 8GB RAM with old processors)" 4 GB RAM on a 9-year-old CPU is still working fine for me with LHC.


Which subprojects does it run? I didn't bother selecting, I let them do anything. I got computation errors on about 75% of them, so thought it a waste of time.

And what CPUs do you have? One of mine for example is a Q8400 with 8GB DDR2.

And what OS? Mine all run Windows 10.
ID: 41552
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1345
Credit: 67,206,591
RAC: 95,803
Message 41566 - Posted: 14 Feb 2020, 14:32:18 UTC - in response to Message 41548.  

You could email everyone if there's something urgent - we all have the option to tick (or not) the thing that says it's ok for admins to email us. Then we'd see it quicker.

Are you aware that you can subscribe to MB threads or get them via RSS feed?
The way this works can be configured on the MB pages and on this page:
https://lhcathome.cern.ch/lhcathome/edit_forum_preferences_form.php
It does far more than simply allowing an admin to send you a mail.
ID: 41566
Peter Hucker

Joined: 12 Aug 06
Posts: 80
Credit: 294,674
RAC: 882
Message 41567 - Posted: 14 Feb 2020, 15:40:28 UTC - in response to Message 41566.  
Last modified: 14 Feb 2020, 15:41:29 UTC

You could email everyone if there's something urgent - we all have the option to tick (or not) the thing that says it's ok for admins to email us. Then we'd see it quicker.

Are you aware that you can subscribe to MB threads or get them via RSS feed?
The way this works can be configured on the MB pages and on this page:
https://lhcathome.cern.ch/lhcathome/edit_forum_preferences_form.php
It does far more than simply allowing an admin to send you a mail.


Sorry, I have no idea what the point of RSS is and have never felt the need to use it. Everything I use emails me when something needs my attention (e.g. when you replied in this conversation). I really don't want yet another source to check and another program to run. In my life I have four inputs: phone calls, texts, emails, and physical post (mail). I don't want five.
ID: 41567



©2020 CERN