Message boards : CMS Application : no new WUs available

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45456 - Posted: 18 Oct 2021, 17:50:39 UTC - in response to Message 45445.  

It looks like I was a bit optimistic last night...
However, our WMAgent is now fully operational again, and tasks/jobs are available. Thanks for your patience during the outage.
ID: 45456
maeax
Joined: 2 May 07
Posts: 2101
Credit: 159,817,517
RAC: 132,770
Message 45463 - Posted: 18 Oct 2021, 23:04:34 UTC - in response to Message 45441.  

no CMS tasks in the download queue

Is back again.
ID: 45463
Toby Broom
Volunteer moderator
Joined: 27 Sep 08
Posts: 807
Credit: 652,444,676
RAC: 280,103
Message 45480 - Posted: 20 Oct 2021, 17:20:25 UTC - in response to Message 45463.  

No issue for me, I see 166 and I have ca 800 jobs in my queue.
ID: 45480
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45484 - Posted: 20 Oct 2021, 19:53:02 UTC - in response to Message 45480.  

No issue for me, I see 166 and I have ca 800 jobs in my queue.

Yes, the post puzzled me too.
BTW, as far as terminology goes, you have tasks in your queue. As I tried to explain elsewhere tonight, a CMS@Home BOINC task is an instantiation of our VM which then attempts to run CMSSW jobs from the condor pool.
ID: 45484
maeax
Joined: 2 May 07
Posts: 2101
Credit: 159,817,517
RAC: 132,770
Message 45628 - Posted: 7 Nov 2021, 6:52:36 UTC - in response to Message 45484.  

no new WUs available
ID: 45628
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45717 - Posted: 18 Nov 2021, 8:33:59 UTC

I've just noticed that the number of running jobs seems to be falling. No obvious reasons why from my preliminary checks. I'll be in my office in an hour or so, where I can do more detailed investigations.
ID: 45717
maeax
Joined: 2 May 07
Posts: 2101
Credit: 159,817,517
RAC: 132,770
Message 45718 - Posted: 18 Nov 2021, 9:56:07 UTC - in response to Message 45717.  

At the moment the flow of new tasks is OK.
Some volunteers have problems running a CMS task (for example, short runs of less than 10 seconds).
This means that many CMS tasks do not restart in normal use.

Ivan, we hope you have a wingman for injecting new tasks in the near future.
Thank you for your work!
ID: 45718
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45719 - Posted: 18 Nov 2021, 10:44:33 UTC - in response to Message 45718.  

Yeah, things are looking OK at the moment, but the delay in some monitoring reports means I can't be totally sure that we are still on track. I'll keep an eye on it.
I do have a wingperson, but at the moment she is not actively involved in workflow injection. One good thing that's emerged in the last month is that I misremembered my contract details, and I should still be here until March 2023, health permitting (I'm now booster jabbed, and had the 'flu shot as well, so hopefully I'll get through this winter OK 🤞).
ID: 45719
maeax
Joined: 2 May 07
Posts: 2101
Credit: 159,817,517
RAC: 132,770
Message 45720 - Posted: 18 Nov 2021, 11:02:44 UTC - in response to Message 45719.  

All the best for you, Ivan :-)).
ID: 45720
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45722 - Posted: 18 Nov 2021, 18:52:55 UTC - in response to Message 45717.  

OK, the glitch seems to have been temporary, but it may have affected post-production jobs that run on the T3_CH_CMSAtHome cluster at CERN. I'll check with Laurence tomorrow if the situation there still seems iffy.
ID: 45722
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2413
Credit: 226,454,837
RAC: 132,412
Message 45759 - Posted: 28 Nov 2021, 12:46:40 UTC

Computers are getting tasks but the subtask queue seems to be empty since 11:00 UTC this morning.
ID: 45759
Crystal Pellet
Volunteer moderator
Volunteer tester
Joined: 14 Jan 10
Posts: 1280
Credit: 8,493,530
RAC: 2,126
Message 45760 - Posted: 28 Nov 2021, 13:23:36 UTC - in response to Message 45759.  

Computers are getting tasks but the subtask queue seems to be empty since 11:00 UTC this morning.

The mechanism to stop creating BOINC workunits when no sub-tasks are available for the VMs seems to work: Unsent 0
ID: 45760
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45761 - Posted: 28 Nov 2021, 19:40:00 UTC - in response to Message 45760.  
Last modified: 28 Nov 2021, 19:53:26 UTC

Computers are getting tasks but the subtask queue seems to be empty since 11:00 UTC this morning.

The mechanism to stop creating BOINC workunits when no sub-tasks are available for the VMs seems to work: Unsent 0

Oops, sorry 'bout that! I must have forgotten to check the status of the queues this morning... I had some connectivity problems today, had to reset my WiFi modem a few times; guess I forgot which sites I hadn't been able to visit. New workflow injected, jobs should become available again "Real Soon Now" (© Jerry Pournelle).
ID: 45761
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2413
Credit: 226,454,837
RAC: 132,412
Message 45778 - Posted: 3 Dec 2021, 9:25:50 UTC

The CMS subtask queue seems to be empty since 8:24 UTC.
May be a result of another major outage last night.
ID: 45778
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45779 - Posted: 3 Dec 2021, 10:18:52 UTC - in response to Message 45778.  

The CMS subtask queue seems to be empty since 8:24 UTC.
May be a result of another major outage last night.

Yes, there is a problem with the WMAgent and no new jobs are being created. I've e-mailed the people who are able to restart it.
There does seem to have been a network problem last night, similar to what we had last Friday night. There was a large spike in the number of jobs failing with error 8002, which usually means they haven't been able to contact any of the frontier servers (i.e. conditions database).
ID: 45779
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45780 - Posted: 3 Dec 2021, 10:41:52 UTC - in response to Message 45779.  

The agent has been restarted and jobs are available again.
I'm told there was a network problem at CERN last night but, correlating different sources, I think our agent problem actually started several hours before that.
ID: 45780
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45831 - Posted: 8 Dec 2021, 22:19:11 UTC

I'm having problems submitting a new batch of jobs tonight, and I can't make much sense of the error message.
I estimate that we have about 12 hours before we start to run out of jobs, so I'll try again in the morning.
ID: 45831
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45906 - Posted: 20 Dec 2021, 21:50:03 UTC

OK, THAT problem was eventually solved. Now we are having problems going forward. A change has unexpectedly meant that volunteer machines are not getting production jobs, but instead post-production jobs that are meant to go to a dedicated VM cluster at CERN. We suspect some sort of typo/misconfiguration, but it's hard to track down the right people at this time of year. Setting No New Tasks or switching to other projects is, as always, recommended in these cases.
ID: 45906
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,224
RAC: 353
Message 45913 - Posted: 21 Dec 2021, 11:19:43 UTC - in response to Message 45906.  

We have found the source of the problem, and jobs are available again.
ID: 45913
Erich56
Joined: 18 Dec 15
Posts: 1688
Credit: 103,858,099
RAC: 121,542
Message 45973 - Posted: 2 Jan 2022, 6:57:38 UTC

Happy New Year to everybody :-)

Unfortunately, the new year does not begin very well for CMS: NO NEW TASKS :-(
ID: 45973