Message boards : CMS Application : EXIT_NO_SUB_TASKS
Joined: 18 Dec 15 · Posts: 1687 · Credit: 102,944,310 · RAC: 125,493
Thanks, Ivan, as always, for passing the (not too good) information on to us. So we will wait and see what happens next week. What should be done in the meantime, though, I guess, is to stop tasks from being downloaded.
Joined: 29 Aug 05 · Posts: 1004 · Credit: 6,268,761 · RAC: 316
OK, thanks to great efforts by the CMS & CERN IT teams, a workaround is in place and we are able to run jobs again! I've submitted a small batch and have jobs running on my boxen. I'll submit a larger batch later, and take the opportunity to increase the job size, as the average run-time is less than I would prefer. This should increase our efficiency.
Joined: 15 Jun 08 · Posts: 2401 · Credit: 225,356,455 · RAC: 123,009
Thanks. Got 1 task that started fine. What factor do you expect regarding the runtime increase per job?
Joined: 29 Aug 05 · Posts: 1004 · Credit: 6,268,761 · RAC: 316
Thanks. I've gone from 5,000 to 10,000 events per job. Given the startup overhead, it should be less than a factor of two (the result file should be approximately twice as big, too). Let me know if it causes any problems. It'll take a while for them to show up, as there are 1,000 jobs of the previous size to get through first.
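For intuition, the "less than a factor of two" estimate follows from a fixed-overhead model of job runtime; a minimal sketch, where the overhead and per-event times are illustrative assumptions rather than measured CMS values:

```python
# Fixed-overhead model: total runtime = startup overhead + events * per-event time.
# overhead_s and per_event_s are illustrative assumptions, not measured CMS values.
def runtime_s(events, overhead_s=600.0, per_event_s=1.0):
    return overhead_s + events * per_event_s

old = runtime_s(5_000)    # previous job size
new = runtime_s(10_000)   # doubled job size
ratio = new / old         # stays below 2 because the overhead is paid only once

print(f"runtime ratio: {ratio:.2f}")
```

The larger the fixed overhead is relative to the per-event work, the further below 2 the ratio falls, which is exactly why bigger jobs improve efficiency.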
Joined: 18 Nov 17 · Posts: 119 · Credit: 51,810,545 · RAC: 22,348
Hello. All my CMS tasks now end with error -203 (0xFFFFFF35) ERR_NO_NETWORK_CONNECTION. Of course, the internet connection is fine.
Joined: 15 Jun 08 · Posts: 2401 · Credit: 225,356,455 · RAC: 123,009
Checked a couple of your logfiles. All of them show the same error:

2019-12-01 22:38:45 (16792): Guest Log: [DEBUG] Testing network connection to cern.ch on port 80
2019-12-01 22:39:05 (16792): Guest Log: [DEBUG] nc: getaddrinfo: Temporary failure in name resolution
2019-12-01 22:39:05 (16792): Guest Log: [DEBUG] 1
2019-12-01 22:39:05 (16792): Guest Log: [ERROR] Could not connect to cern.ch on port 80

That's why the VMs shut down. Since DNS name resolution works on my internet connection, you may want to check your nameservers or switch to public ones like 1.1.1.1 (Cloudflare) or 8.8.8.8 (Google).
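The VM's probe (name resolution first, then a TCP connect to port 80) can be reproduced on the host to tell the two failure modes apart; a minimal Python sketch, where the default host and port simply mirror the log above:

```python
import socket

def probe(host="cern.ch", port=80, timeout=10.0):
    """Mimic the VM's check: resolve the name, then try a TCP connection.

    Returns "dns" if name resolution fails (the error seen in the log),
    "tcp" if resolution works but no connection succeeds, "ok" otherwise.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return "dns"   # e.g. "Temporary failure in name resolution"
    for *_ignored, sockaddr in infos:
        try:
            with socket.create_connection(sockaddr[:2], timeout=timeout):
                return "ok"
        except OSError:
            continue
    return "tcp"
```

A "dns" result on an otherwise working connection points at the nameservers, and switching the OS or router to public resolvers such as 1.1.1.1 or 8.8.8.8 is the fix suggested above.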
Joined: 18 Nov 17 · Posts: 119 · Credit: 51,810,545 · RAC: 22,348
Unfortunately, I do not know how to do that. And there was no such problem before...
Joined: 18 Nov 17 · Posts: 119 · Credit: 51,810,545 · RAC: 22,348
And I stopped receiving ATLAS tasks altogether several days ago... Maybe the reason is the same...
Joined: 18 Nov 17 · Posts: 119 · Credit: 51,810,545 · RAC: 22,348
It looks like only SixTrack is available for me now. But I did not change my preferences. No ATLAS tasks, no Theory tasks, and CMS tasks crash.
Joined: 15 Jun 08 · Posts: 2401 · Credit: 225,356,455 · RAC: 123,009
@Ivan Just noticed on the Grafana pages that the number of running CMS jobs has doubled since Sunday afternoon. We might need a new batch earlier than expected.
Joined: 29 Aug 05 · Posts: 1004 · Credit: 6,268,761 · RAC: 316
@Ivan Yeah, I've seen that too. I have a batch in the pipeline that's not showing up in WMStats yet. Federica submitted two small tasks last week that appear to have run according to WMStats, but I can't find any output in store -- ah, the unmerged result files are on DataBridge; I must be looking in the wrong place on EOS. I've just put in another batch that's not showing up yet either, even though the submission is reported as successful. I'll have to double-check my input parameters.
Joined: 29 Aug 05 · Posts: 1004 · Credit: 6,268,761 · RAC: 316
Ah, I think I've found the reason. I'd been playing around with priorities to try to get around the problem we had with condor requests timing out, so all my recent jobs have been submitted with priority 1000. Federica's batches were submitted with the original template value of 600000(!). I submitted another batch at priority 100000 and it's appeared on WMStats, so it looks like the others I have sent are not being acted upon while the current batch is still running at the same priority.
Joined: 15 Jun 08 · Posts: 2401 · Credit: 225,356,455 · RAC: 123,009
Just a reminder: there are again no SixTrack WUs, which results in a significantly higher number of CMS tasks being processed. => CMS may need fresh work earlier than expected.
Joined: 15 Jun 08 · Posts: 2401 · Credit: 225,356,455 · RAC: 123,009
Looks like there are no subtasks in the queue any more due to the many hosts that switched over from SixTrack. Is anybody from the project team aware of this?
Joined: 18 Dec 15 · Posts: 1687 · Credit: 102,944,310 · RAC: 125,493
"Looks like there are no subtasks in the queue any more..." Once again this raises the question of whether the automatic stop of the task-download queue when no jobs are available, which used to be in place, is no longer working.
Joined: 18 Nov 17 · Posts: 119 · Credit: 51,810,545 · RAC: 22,348
Yes. This is a very important question.
Joined: 18 Dec 15 · Posts: 1687 · Credit: 102,944,310 · RAC: 125,493
Again, there have been no subtasks available for the past few hours:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=259890562
https://lhcathome.cern.ch/lhcathome/result.php?resultid=259894933
https://lhcathome.cern.ch/lhcathome/result.php?resultid=259853027
Joined: 15 Nov 14 · Posts: 602 · Credit: 24,371,321 · RAC: 0
I am picking up a whole string of them too. Since they are short, I wouldn't mind so much if there were a few good ones to work on. But when they are all bad, maybe I should work on WCG instead.
Joined: 11 Jan 20 · Posts: 1 · Credit: 279,839 · RAC: 0
100% failure here too:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=259953906
Joined: 18 Dec 15 · Posts: 1687 · Credit: 102,944,310 · RAC: 125,493
Still, although mentioned here before, two questions remain unanswered:
1) Why is the mechanism no longer working that should stop the task-download queue as soon as there are no sub-tasks available?
2) Is Ivan no longer on board? Before, when problems like the current one came up, he was always very helpful in solving these and other problems concerning CMS. Now, obviously, this is no longer the case :-(
©2024 CERN