Message boards : CMS Application : EXIT_NO_SUB_TASKS

Dark Angel
Joined: 7 Aug 11
Posts: 62
Credit: 21,010,648
RAC: 8,927
Message 41797 - Posted: 2 Mar 2020, 3:14:11 UTC

Sorry to be the bearer of bad news, but I'm getting these as of today as well.
Unit after unit failing with "207 (0x000000CF) EXIT_NO_SUB_TASKS"
ID: 41797
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,890,650
RAC: 138,245
Message 41798 - Posted: 2 Mar 2020, 6:53:42 UTC - in response to Message 41797.  

ID: 41798
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 41799 - Posted: 2 Mar 2020, 7:55:07 UTC - in response to Message 41798.  

The case is still open.
:-( :-( :-(
ID: 41799
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 41804 - Posted: 2 Mar 2020, 12:19:43 UTC - in response to Message 41799.  

Actually, that is quite encouraging. It means they are doing a ground-up fix.
No more empty work units.
Even a native version.
ID: 41804
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 41819 - Posted: 4 Mar 2020, 18:53:01 UTC - in response to Message 41804.  
Last modified: 4 Mar 2020, 19:02:36 UTC

CMS is back and working, for the moment.
ID: 41819
Jim1348

Joined: 15 Nov 14
Posts: 602
Credit: 24,371,321
RAC: 0
Message 41822 - Posted: 5 Mar 2020, 15:29:33 UTC - in response to Message 41819.  

for the moment.

The moment has passed.
ID: 41822
the Kris

Joined: 29 Mar 10
Posts: 2
Credit: 1,183,960
RAC: 0
Message 42157 - Posted: 13 Apr 2020, 8:37:37 UTC - in response to Message 41822.  

Again for 2 days already: "Exit status 207 (0x000000CF) EXIT_NO_SUB_TASKS"
ID: 42157
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 42161 - Posted: 13 Apr 2020, 14:43:52 UTC - in response to Message 42157.  

I think that as long as Ivan does not give his "go ahead", crunching CMS tasks does not make a whole lot of sense.
There still seems to be a problem (or more than one) that has not been resolved :-(
ID: 42161
NOGOOD

Joined: 18 Nov 17
Posts: 119
Credit: 51,284,804
RAC: 20,875
Message 42895 - Posted: 20 Jun 2020, 13:21:51 UTC

Hello.

Is CMS running fine now?
ID: 42895
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 42898 - Posted: 21 Jun 2020, 4:50:52 UTC - in response to Message 42895.  

NOGOOD wrote: "Hello. Is CMS running fine now?"

Yes, here it's been running fine for several days now. So let's keep our fingers crossed :-)
ID: 42898
computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2386
Credit: 222,890,650
RAC: 138,245
Message 43231 - Posted: 19 Aug 2020, 10:51:22 UTC

Just got a couple of errors indicating that the CMS subtask queue might be empty.
ID: 43231
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 43232 - Posted: 19 Aug 2020, 14:46:25 UTC - in response to Message 43231.  

Meanwhile, the task queue is empty, which indicates that the automatic stop of task downloads when the job queue is empty works :-)

Nevertheless, it would be nice to receive more CMS work again :-)
ID: 43232
maeax

Joined: 2 May 07
Posts: 2071
Credit: 156,084,038
RAC: 105,553
Message 43233 - Posted: 19 Aug 2020, 15:14:31 UTC - in response to Message 43232.  

ID: 43233
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 43234 - Posted: 19 Aug 2020, 18:44:39 UTC - in response to Message 43233.  

https://lhcathome.cern.ch/lhcathome/server_status.php
This is exactly what I was referring to. Too bad. We'll see when new tasks become available again. Hopefully soon.
ID: 43234
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 43235 - Posted: 20 Aug 2020, 9:14:09 UTC - in response to Message 43234.  
Last modified: 20 Aug 2020, 9:17:31 UTC

Yes, sorry, we seem to be having a WMAgent problem again. It might be related to an "intervention" (i.e. code update) that occurred yesterday -- unfortunately I only ever get notified of these post facto. Our agent needs some manual tweaks when it's restarted and these may need to be re-applied. I mailed several people who can do a kickstart on the agent last night, but no response yet. It's August; I know at least one of them is on holiday.

At least the automatic stoppage of the queue worked. The WMStats agent monitor reckoned that jobs were still running, but with this and other indications I now realise that this is false; it just took me a while to look at other indicators. Sorry for the delay, I await developments.
ID: 43235
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 43236 - Posted: 20 Aug 2020, 9:54:31 UTC - in response to Message 43235.  

Actually, on looking more closely, I may have been a bit too harsh on the WMCore developers. This suggests the problem pre-dates the intervention:
agent last updated: 2020/8/18 (Tue) 16:01:14 UTC : 41 h 51 m
There was a general failure of monitoring software across the CERN network on Tuesday; it may have played some part in the problem.
ID: 43236
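
A quick arithmetic check of the reasoning in the post above, as a minimal Python sketch. The timestamps are the ones quoted in the thread. Reading "yesterday" as 19 Aug 2020 (UTC) is an assumption, since the exact time of the intervention is not given, and the variable names are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Values quoted from the monitor line in the post above.
last_update = datetime(2020, 8, 18, 16, 1, 14, tzinfo=timezone.utc)
reported_age = timedelta(hours=41, minutes=51)

# Time the post itself was made, taken from the forum header.
posted_at = datetime(2020, 8, 20, 9, 54, 31, tzinfo=timezone.utc)

# The monitor page must have been read at last_update + reported_age.
page_read_at = last_update + reported_age

# Assumption: the "intervention" happened some time on 19 Aug 2020 (UTC),
# so the earliest it could have started is midnight of that day.
earliest_intervention = datetime(2020, 8, 19, tzinfo=timezone.utc)

# If the agent's last update is older than that, the stall pre-dates it.
stalled_before_intervention = last_update < earliest_intervention

print("monitor page read at:", page_read_at.isoformat())
print("gap between reading and posting:", posted_at - page_read_at)
print("stall pre-dates intervention:", stalled_before_intervention)
```

Under these assumptions the quoted age is consistent with the post time, and the agent's last update falls on the day before the intervention.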
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 43237 - Posted: 20 Aug 2020, 11:51:00 UTC - in response to Message 43236.  

OK, looks like an upgrade incompatibility. If you can follow it:
https://github.com/dmwm/WMCore/issues/9876
ID: 43237
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 43238 - Posted: 20 Aug 2020, 15:20:23 UTC - in response to Message 43237.  

One of the capables restarted the agent. The monitor still insists there is a problem, but other indications are that jobs are available again. I'm keeping an eye on it as much as I can.
ID: 43238
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 997
Credit: 6,264,307
RAC: 71
Message 43239 - Posted: 20 Aug 2020, 15:53:33 UTC - in response to Message 43238.  

Monitors are making sense now. I have a bad network connection so it's hard to keep right up-to-date.
ID: 43239
Erich56

Joined: 18 Dec 15
Posts: 1686
Credit: 100,337,324
RAC: 102,062
Message 43242 - Posted: 21 Aug 2020, 7:26:55 UTC

All seems to work well now :-)
ID: 43242


©2024 CERN