Message boards : CMS Application : "No jobs were available to run" since this morning.

ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 874
Credit: 5,805,106
RAC: 264
Message 31039 - Posted: 25 Jun 2017, 11:34:21 UTC - in response to Message 31037.  
Last modified: 25 Jun 2017, 11:42:59 UTC

The counter for CMS tasks also went to ZERO. (500 at the moment)

Sorry, where are you seeing that? Everything looks pretty normal to me. In fact we have an up-tick at the moment, possibly picking up overspill from another app that's not sending tasks.

[Edit] Strange, we seem to be picking up tasks at the expense of LHCb, although they have a lot in the queue. ATLAS was down to zero tasks available, but some are coming through now. [/Edit]
ID: 31039
Erich56

Joined: 18 Dec 15
Posts: 1511
Credit: 42,332,999
RAC: 41,168
Message 31042 - Posted: 25 Jun 2017, 12:49:50 UTC - in response to Message 31039.  

... ATLAS was down to zero tasks available, but some are coming through now.

Ivan, where do you see ATLAS Tasks coming through?
ID: 31042
maeax

Joined: 2 May 07
Posts: 1508
Credit: 48,162,987
RAC: 117,292
Message 31043 - Posted: 25 Jun 2017, 13:27:56 UTC - in response to Message 31039.  
Last modified: 25 Jun 2017, 13:33:00 UTC

Sorry, where are you seeing that?

Server-Status page at LHCatHome.
ATLAS gives back unresolved tasks which are still running. Since Friday there has been no ATLAS work in the pipe.
Hopefully the Server Status page does not reflect the real situation.

Edit: There was CGI testing all day Friday; see the -dev forum under News.
ID: 31043
Erich56

Joined: 18 Dec 15
Posts: 1511
Credit: 42,332,999
RAC: 41,168
Message 31044 - Posted: 25 Jun 2017, 13:41:22 UTC - in response to Message 31043.  

Sorry, where are you seeing that?

Server-Status page at LHCatHome.
ATLAS gives back unresolved tasks which are still running. Since Friday there has been no ATLAS work in the pipe.

That's exactly where I have been looking all the time. And since Friday, I have seen only zero under "unsent".
Plus, whenever the BOINC Manager tries to download ATLAS tasks, it says "no tasks available".
ID: 31044
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 874
Credit: 5,805,106
RAC: 264
Message 31045 - Posted: 25 Jun 2017, 15:12:31 UTC - in response to Message 31042.  
Last modified: 25 Jun 2017, 15:14:57 UTC

... ATLAS was down to zero tasks available, but some are coming through now.

Ivan, where do you see ATLAS Tasks coming through?

When I looked, there were 24 tasks queued. After I posted, there were none. :-(
[Edit] It's showing one just now! [/Edit]
ID: 31045
maeax

Joined: 2 May 07
Posts: 1508
Credit: 48,162,987
RAC: 117,292
Message 31046 - Posted: 25 Jun 2017, 15:19:42 UTC
Last modified: 25 Jun 2017, 15:20:18 UTC

CMS is down to 281 tasks at the moment.
This may not be enough to last until Monday morning.
ID: 31046
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 874
Credit: 5,805,106
RAC: 264
Message 31077 - Posted: 26 Jun 2017, 11:18:17 UTC - in response to Message 31046.  

CMS is down to 281 tasks at the moment. This may not be enough to last until Monday morning.

We're surviving. The queue of jobs on the Condor server is down from its normal 700 but that's not totally unusual. We are running more jobs than normal at present, which may be putting some pressure on the pipeline. I think we are a fair way from being critical yet.
ID: 31077
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 874
Credit: 5,805,106
RAC: 264
Message 31093 - Posted: 26 Jun 2017, 15:56:33 UTC - in response to Message 30950.  

I'll be sure to give you as much warning as I can when the WMAgent update is scheduled.

Thanks a lot, Ivan

We're running down the queue overnight to update WMAgent tomorrow. Best set No New Tasks as soon as practicable.
ID: 31093
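
For hosts managed from the command line rather than the BOINC Manager, setting No New Tasks (and later allowing work again) can be sketched roughly as below. This is only an illustration: it assumes `boinccmd` is installed and on the PATH, and that the project is attached under the URL shown, which should be checked against the local client. The helper names are hypothetical. Note that No New Tasks applies to the whole project, not just the CMS application.

```python
import shlex
import subprocess
from typing import List

# Assumption: the master URL under which the project is attached locally.
# Check the actual value with `boinccmd --get_project_status`.
PROJECT_URL = "https://lhcathome.cern.ch/lhcathome/"

# boinccmd project operations used here:
#   nomorework    -> "No New Tasks"
#   allowmorework -> "Allow New Tasks"
ALLOWED_OPERATIONS = {"nomorework", "allowmorework"}


def project_op_command(operation: str, project_url: str = PROJECT_URL) -> List[str]:
    """Build the boinccmd argument list for a project operation (hypothetical helper)."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    return ["boinccmd", "--project", project_url, operation]


def apply_operation(operation: str, dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    cmd = project_op_command(operation)
    if not dry_run:
        # Raises CalledProcessError if boinccmd reports a failure.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run: print the commands instead of touching the client.
    print(apply_operation("nomorework"))
    print(apply_operation("allowmorework"))
```

With `dry_run=True` nothing is executed, so the commands can be inspected first. Running tasks are not affected by No New Tasks; the client simply stops requesting further work from the project until new tasks are allowed again.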
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Joined: 15 Jun 08
Posts: 1965
Credit: 139,395,670
RAC: 86,771
Message 31096 - Posted: 26 Jun 2017, 16:16:27 UTC - in response to Message 31093.  

Best set No New Tasks as soon as practicable.

Done.
Thank you Ivan.
ID: 31096
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 874
Credit: 5,805,106
RAC: 264
Message 31100 - Posted: 26 Jun 2017, 18:01:01 UTC - in response to Message 31096.  

Alan tells me he has an appointment tomorrow morning, so the intervention will take place after lunch (CERN time). Hopefully we'll have jobs again later in the afternoon.
ID: 31100
Erich56

Joined: 18 Dec 15
Posts: 1511
Credit: 42,332,999
RAC: 41,168
Message 31135 - Posted: 27 Jun 2017, 16:26:38 UTC - in response to Message 31100.  

Alan tells me he has an appointment tomorrow morning, so the intervention will take place after lunch (CERN time). Hopefully we'll have jobs again later in the afternoon.

How does it look, Ivan?
New software implemented successfully?
So far, no new tasks ready for download yet.
ID: 31135
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 874
Credit: 5,805,106
RAC: 264
Message 31137 - Posted: 27 Jun 2017, 16:53:01 UTC - in response to Message 31135.  

Alan tells me he has an appointment tomorrow morning, so the intervention will take place after lunch (CERN time). Hopefully we'll have jobs again later in the afternoon.

How does it look, Ivan?
New software implemented successfully?
So far, no new tasks ready for download yet.

Yes, Alan finished the update about an hour ago. I submitted a new batch 45 minutes ago and the queue on the Condor server is back up to its usual 700. You can turn on the taps again.
ID: 31137
Erich56

Joined: 18 Dec 15
Posts: 1511
Credit: 42,332,999
RAC: 41,168
Message 31140 - Posted: 27 Jun 2017, 20:04:32 UTC - in response to Message 31137.  

All 3 of my hosts downloaded new tasks, so it seems to work fine :-)
ID: 31140
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 874
Credit: 5,805,106
RAC: 264
Message 31142 - Posted: 27 Jun 2017, 21:06:19 UTC

Yes, although we're ramping up a bit more slowly than I expected. This is possibly partly because Laurence has cut his cluster down to just 10 cores while he tries some experiments -- he had considerably more than that before. Also, some people may not have seen the news announcements and their machines have entered quota back-off. And, as there was some evidence a week or so back, some hosts may have started running other apps as backfill and we need to wait for those tasks to finish before they come back to CMS.
ID: 31142
Erich56

Joined: 18 Dec 15
Posts: 1511
Credit: 42,332,999
RAC: 41,168
Message 31147 - Posted: 28 Jun 2017, 4:56:38 UTC - in response to Message 31142.  

Yes, although we're ramping up a bit more slowly than I expected.

Which can be seen clearly here:

https://lhcathomedev.cern.ch/lhcathome-dev/cms_job.php

But I guess it will be back to normal soon.
ID: 31147
Erich56

Joined: 18 Dec 15
Posts: 1511
Credit: 42,332,999
RAC: 41,168
Message 31159 - Posted: 28 Jun 2017, 16:04:04 UTC - in response to Message 31147.  

Hm, for some reason the number of running jobs is stagnating at slightly above 400, as seen here:

https://lhcathomedev.cern.ch/lhcathome-dev/cms_job.php

What might be the reason?
ID: 31159
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 874
Credit: 5,805,106
RAC: 264
Message 31161 - Posted: 28 Jun 2017, 17:38:36 UTC - in response to Message 31159.  

Hm, for some reason the number of running jobs is stagnating at slightly above 400, as seen here:

https://lhcathomedev.cern.ch/lhcathome-dev/cms_job.php

What might be the reason?

For a start, Laurence has been doing tests with his cluster, so that's a few hundred cores lost. Also, we don't have as many active users as normal. This is possibly due to three groups: people who heeded my warning that a drought was coming and have not yet restarted; people who didn't see my warning and whose machines are still in a quota back-off; and people whose hosts were set to switch to other apps when CMS had no jobs, and have yet to fully switch back. I'm not panicking yet.
ID: 31161
Erich56

Joined: 18 Dec 15
Posts: 1511
Credit: 42,332,999
RAC: 41,168
Message 31173 - Posted: 29 Jun 2017, 8:40:13 UTC

For a few hours now, I have not been able to download CMS tasks ("no tasks available"). BTW, no ATLAS tasks either.

Anything wrong with the newly installed WMAgent?
ID: 31173
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1093
Credit: 6,827,780
RAC: 791
Message 31177 - Posted: 29 Jun 2017, 9:19:27 UTC - in response to Message 31173.  

For a few hours now, I have not been able to download CMS tasks ("no tasks available"). BTW, no ATLAS tasks either.

Anything wrong with the newly installed WMAgent?

See my post in the ATLAS application thread: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4331&postid=31176
ID: 31177


©2022 CERN