Message boards :
CMS Application :
EXIT_NO_SUB_TASKS
Author | Message |
---|---|
Joined: 24 Oct 04 Posts: 1052 Credit: 48,840,376 RAC: 1,618 |
So far I only see one Valid, with a run time of 7 hours 32 min 42 sec. I am starting up a new one just to see if it even gets to the HTCondor ping in <13 mins. But if you still have those running, that pretty much means we may be back to running Valids again (I hope). |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 7 |
Ah, I had an override switch in my submission script that still pointed to the testbed server. Changed that and the submission went through. Now I'm waiting to see if the workflow actually shows up on the production server. OK, I see the workflow on the production server now, but I may have to go home before it starts farming out any jobs. Digits cruciate... |
Joined: 24 Oct 04 Posts: 1052 Credit: 48,840,376 RAC: 1,618 |
Suspend time again. Failing here and over at -dev. |
Joined: 24 Oct 04 Posts: 1052 Credit: 48,840,376 RAC: 1,618 |
No guarantee yet, but I started up a new batch and so far so good. I will check back in a few hours and see if they stay running this time (3:33am right now). |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 7 |
The new batch on the production server doesn't seem to have created any jobs, so obviously it's not sending any out. We are running down the existing queues on the testbed server at reduced efficiency (i.e. not every job request is met within ten minutes). I'll send out more messages, but it is Sunday... |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,068,188 RAC: 72,595 |
"... I'll send out more messages, but it is Sunday..." Ivan, any news? |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,068,188 RAC: 72,595 |
A few hours ago I took the risk and downloaded several CMS tasks, and then I left for a while. After coming back about 3 hours later, I noticed that 3 of them had failed with 207 (0x000000CF) EXIT_NO_SUB_TASKS, and 2 of them had failed with 1 (0x00000001) Unknown error code after almost 3 hours: https://lhcathome.cern.ch/lhcathome/result.php?resultid=252857843 Unfortunately, everything was a waste of CPU time :-( What's going wrong with CMS? |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,068,188 RAC: 72,595 |
Here are the next ones, which failed almost exactly 2:40 hrs after start with 1 (0x00000001) Unknown error code: https://lhcathome.cern.ch/lhcathome/result.php?resultid=252855754 https://lhcathome.cern.ch/lhcathome/result.php?resultid=252858062 Why has CMS become so unstable lately? |
Joined: 24 Oct 04 Posts: 1052 Credit: 48,840,376 RAC: 1,618 |
Yeah, don't waste time running these, Erich, until we get the server end of this taken care of. I tested one and it ran 5 hours and then crashed. These CMS VB tasks are famous for tricking us into thinking they will run Valids, and then this happens. Just wait for the *Ivan Report*. I see the CERN Service Portal is not updated either. (I did expect them to be up and running here today.) |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 7 |
Apparently a database copy went wrong, complicated by the network problems Thursday night, people working in different time-zones, and the weekend -- plus some lack of communication which led to effort being wasted over the weekend. I'm still waiting on further information from the service ticket (unfortunately this one is CERN internal so most of you won't be able to read it). |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,068,188 RAC: 72,595 |
Thanks, Ivan, for the information. So I/we will wait until further word from you. |
Joined: 15 Nov 14 Posts: 602 Credit: 24,371,321 RAC: 0 |
"...the North American contingent may sort it out overnight." Our Thanksgiving is coming up Thursday. Everyone will be off Friday. Good luck. |
Joined: 18 Dec 15 Posts: 1599 Credit: 78,068,188 RAC: 72,595 |
"the North American contingent may sort it out overnight." So let's keep our fingers crossed :-) |
Joined: 29 Aug 05 Posts: 956 Credit: 6,231,115 RAC: 7 |
Big problem, I'm afraid -- the database tables appear to be empty! The problem cannot be fixed quickly. If the tables are empty now, the only option is to repeat the import. And this will take a few days, as the amount of data to be copied is huge and the tables do not have partitions, so it is impossible to parallelise the work. Also the IO subsystem for the integration databases is not as fast as the production databases..... An alternative to all this import/export would be needed in the near future.... ...and Thursday and Friday are US holidays... |
Joined: 15 Jun 08 Posts: 2244 Credit: 199,043,137 RAC: 125,669 |
Oops! Calm down and don't forget to breathe. ... and Happy Thanksgiving ... <edit> Forgot to ask: Can the CMS tasks be stopped at the BOINC server? </edit> |
Joined: 24 Oct 04 Posts: 1052 Credit: 48,840,376 RAC: 1,618 |
Thanks Ivan, now I can unplug my satellite modem for the night so I don't lose the only high-speed I have left for the month. And yes, Happy Thanksgiving (eating turkey and watching 9+ hours of NFL football). |
©2023 CERN