Thread 'tasks now running unusual long time without CPU usage'

Erich56 · Joined: 18 Dec 15 · Posts: 1977 · Credit: 159,915,847 · RAC: 50,477
Message 49427 - Posted: 6 Feb 2024, 18:30:00 UTC - in response to Message 49426.
> Task has been running for 25 minutes, and cmsRun inside the VM for 13 minutes now (95% CPU)
Still running and using CPU?

Crystal Pellet · Volunteer moderator · Volunteer tester · Joined: 14 Jan 10 · Posts: 1547 · Credit: 10,062,820 · RAC: 1,586
Message 49428 - Posted: 6 Feb 2024, 18:32:00 UTC - in response to Message 49427.
> Task has been running for 25 minutes, and cmsRun inside the VM for 13 minutes now (95% CPU). Still running and using CPU?
Yes, indeed ;)

maeax · Joined: 2 May 07 · Posts: 2291 · Credit: 178,882,996 · RAC: 2,768
Message 49429 - Posted: 6 Feb 2024, 18:36:30 UTC - in response to Message 49428. Last modified: 6 Feb 2024, 18:38:32 UTC
My task, started at 8:30, has now been running for 10 hours.
2024-02-06 12:51 1.4M
2024-02-06 14:13 1.4M
2024-02-06 15:46 1.4M
2024-02-06 17:07 1.4M
Four jobs inside have finished, the last one at around 16:00 UTC. I am waiting for the stop at 11 hours or at 18 hours. If a CMS task is killed before the hard stop, does CMS rerun these jobs in another CMS task?

Crystal Pellet · Volunteer moderator · Volunteer tester · Joined: 14 Jan 10 · Posts: 1547 · Credit: 10,062,820 · RAC: 1,586
Message 49430 - Posted: 6 Feb 2024, 18:43:43 UTC - in response to Message 49429.
> Four jobs inside have finished, the last one at around 16:00 UTC. I am waiting for the stop at 11 hours or at 18 hours. If a CMS task is killed before the hard stop, does CMS rerun these jobs in another CMS task?
The four jobs have been uploaded to the CMS server and are safe. No resend is needed. Killing the task means no credit.

Erich56 · Joined: 18 Dec 15 · Posts: 1977 · Credit: 159,915,847 · RAC: 50,477
Message 49432 - Posted: 6 Feb 2024, 19:56:19 UTC - in response to Message 49426.
> Task has been running for 25 minutes, and cmsRun inside the VM for 13 minutes now (95% CPU)
You were luckier than I was: the task I started did not error out like the earlier ones, but obviously there were no jobs available. The task finished after 29 minutes with only about 1 minute of CPU time: https://lhcathome.cern.ch/lhcathome/result.php?resultid=405220625

Erich56 · Joined: 18 Dec 15 · Posts: 1977 · Credit: 159,915,847 · RAC: 50,477
Message 49433 - Posted: 6 Feb 2024, 21:08:34 UTC - in response to Message 49432.
> You were luckier than I was: the task I started did not error out like the earlier ones, but obviously there were no jobs available. The task finished after 29 minutes with only about 1 minute of CPU time: https://lhcathome.cern.ch/lhcathome/result.php?resultid=405220625
In view of the fact that obviously no jobs are available, wouldn't it make sense to stop distributing tasks?

Magic Quantum Mechanic · Joined: 24 Oct 04 · Posts: 1298 · Credit: 95,423,499 · RAC: 20,878
Message 49434 - Posted: 7 Feb 2024, 0:21:56 UTC - in response to Message 49405. Last modified: 7 Feb 2024, 0:23:28 UTC
Starting last night, mine started getting stuck at (here and at -dev) and ending up like this: https://lhcathome.cern.ch/lhcathome/result.php?resultid=405216500

maeax · Joined: 2 May 07 · Posts: 2291 · Credit: 178,882,996 · RAC: 2,768
Message 49436 - Posted: 7 Feb 2024, 2:34:43 UTC - in response to Message 49429.
> I am waiting for the stop at 11 hours or at 18 hours.
OK, it was 18 hours. Run time: 18 hours 0 min 56 sec. CPU time: 5 hours 7 min 59 sec. Validation state: Valid. Credit: 213.55

Erich56 · Joined: 18 Dec 15 · Posts: 1977 · Credit: 159,915,847 · RAC: 50,477
Message 49438 - Posted: 7 Feb 2024, 6:20:44 UTC - in response to Message 49433.
> In view of the fact that obviously no jobs are available, wouldn't it make sense to stop distributing tasks?
I tested again this morning, and the tasks still finished after about 25 minutes because no jobs could be downloaded. What puzzles me is that this time the automatic stop of task submission does not work.

maeax · Joined: 2 May 07 · Posts: 2291 · Credit: 178,882,996 · RAC: 2,768
Message 49439 - Posted: 7 Feb 2024, 7:28:12 UTC
We have to wait for an answer. I also ran a test; it ended after 25 minutes with 4 cobblestones.

Erich56 · Joined: 18 Dec 15 · Posts: 1977 · Credit: 159,915,847 · RAC: 50,477
Message 49446 - Posted: 8 Feb 2024, 6:45:34 UTC - in response to Message 49439.
As a test 10 minutes ago shows, there are still no jobs available, but tasks are still being distributed; the automatic stop of task distribution again does not work. The result looks like this: https://lhcathome.cern.ch/lhcathome/result.php?resultid=405292196
On the project status page one can see 345 users within the past 24 hours (and almost 13,000 tasks being processed), so that many users have been crunching CMS tasks which run only in the "envelope" and finish after some 25 minutes, with NO VALUE at all for the science :-(
Hopefully this nonsense will be stopped ASAP.

maeax · Joined: 2 May 07 · Posts: 2291 · Credit: 178,882,996 · RAC: 2,768
Message 49447 - Posted: 8 Feb 2024, 7:15:44 UTC - in response to Message 49446. Last modified: 8 Feb 2024, 7:21:39 UTC
> Hopefully this nonsense will be stopped ASAP.
What if these jobs are testing the functionality of the tasks? Do you know that? This is the thread title: tasks now running unusual long time without CPU usage

Erich56 · Joined: 18 Dec 15 · Posts: 1977 · Credit: 159,915,847 · RAC: 50,477
Message 49448 - Posted: 8 Feb 2024, 7:29:49 UTC - in response to Message 49447.
> This is the thread title: tasks now running unusual long time without CPU usage
The currently downloaded tasks do NOT run unusually long. They run for only about 25 minutes, using almost no CPU, because they do not receive jobs. I posted my comment from earlier this morning here only because the current problems with CMS have been discussed in this thread over the past few days, so I did not want to open a new thread.

maeax · Joined: 2 May 07 · Posts: 2291 · Credit: 178,882,996 · RAC: 2,768
Message 49449 - Posted: 8 Feb 2024, 7:51:23 UTC - in response to Message 49439.
> We have to wait for an answer. I also ran a test; it ended after 25 minutes with 4 cobblestones.

ivan · Volunteer moderator · Project tester · Volunteer developer · Volunteer tester · Project scientist · Joined: 29 Aug 05 · Posts: 1153 · Credit: 11,734,920 · RAC: 220
Message 49456 - Posted: 8 Feb 2024, 11:07:44 UTC - in response to Message 49449.
OK, we're running again, after a few glitches (and one error in a script: memory given in GiB where MiB was expected!). I think I'll resist any attempt at further tests until after the weekend...

maeax · Joined: 2 May 07 · Posts: 2291 · Credit: 178,882,996 · RAC: 2,768
Message 49458 - Posted: 8 Feb 2024, 11:21:34 UTC - in response to Message 49456. Last modified: 8 Feb 2024, 11:23:11 UTC
Thank you, CMS team.
This line is in the log: "For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/", but the link leads to the HTCondor Software Suite at: https://research.cs.wisc.edu/htcondor/news/plan-to-replace-gst-in-htcss/
The first job inside the task is starting.

Erich56 · Joined: 18 Dec 15 · Posts: 1977 · Credit: 159,915,847 · RAC: 50,477
Message 49459 - Posted: 8 Feb 2024, 12:58:00 UTC - in response to Message 49456.
> OK, we're running again, after a few glitches (and one error in a script: memory given in GiB where MiB was expected!). I think I'll resist any attempt at further tests until after the weekend...
Thanks, Ivan, for the information. The tasks now work fine. Could you please make sure that the automatic "tasks-sending-stop-tool" (sorry for this strange name I invented for it) is also working again, just in case the next problem with job submission comes up some time?

ivan · Volunteer moderator · Project tester · Volunteer developer · Volunteer tester · Project scientist · Joined: 29 Aug 05 · Posts: 1153 · Credit: 11,734,920 · RAC: 220
Message 49461 - Posted: 8 Feb 2024, 14:23:05 UTC - in response to Message 49459.
> Could you please make sure that the automatic "tasks-sending-stop-tool" (sorry for this strange name I invented for it) is also working again, just in case the next problem with job submission comes up some time?
OK, I'll mention that to Laurence; I didn't notice that it had in fact continued sending jobs when the pool was exhausted.

maeax · Joined: 2 May 07 · Posts: 2291 · Credit: 178,882,996 · RAC: 2,768
Message 49462 - Posted: 8 Feb 2024, 15:22:24 UTC - in response to Message 49458.
> The first job inside the task is starting.
02/08/24 16:11:04 (pid:16707) condor_read(): Socket closed abnormally when trying to read 5 bytes from collector vocms0840.cern.ch, errno=104 Connection reset by peer
02/08/24 16:11:04 (pid:16707) CCBListener: failed to receive message from CCB server vocms0840.cern.ch
Only one job ran inside the task, and it finished two hours ago.

Erich56 · Joined: 18 Dec 15 · Posts: 1977 · Credit: 159,915,847 · RAC: 50,477
Message 49464 - Posted: 8 Feb 2024, 15:44:57 UTC
Again I am noticing the same phenomenon as described in the title of this thread: the new tasks have now been running longer than the ones from before (several days ago), have still not finished, and have not used any CPU for quite some time. Is this a coincidence, or is the problem from the day before yesterday back?