tasks now running unusual long time without CPU usage
Author | Message |
---|---|
Joined: 18 Dec 15 Posts: 1690 Credit: 104,072,581 RAC: 122,144 |
Quote: "Task is running 25 minutes and cmsRun inside VM 13 minutes now (95% Cpu)" Still running and using CPU? |
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374 |
Quote: "still running and using CPU ?" Yes indeed ;) |
Joined: 2 May 07 Posts: 2104 Credit: 159,819,191 RAC: 123,837 |
My task, started at 8:30, has now been running for 10 hours. Listing: 2024-02-06 12:51 1.4M; 2024-02-06 14:13 1.4M; 2024-02-06 15:46 1.4M; 2024-02-06 17:07 1.4M. Four jobs inside have finished, the last one at 16 UTC. Waiting for the stop at 11 hours or 18 hours. When a CMS task is killed before the hard stop, does CMS restart these jobs in another CMS task? |
Joined: 14 Jan 10 Posts: 1280 Credit: 8,496,817 RAC: 2,374 |
Quote: "Four jobs inside have finished, last one at 16 UTC. Waiting for the stop 11 hour or 18 hour." The four jobs have been uploaded to the CMS server and are safe. No resend is needed. Killing the task means no credits. |
Joined: 18 Dec 15 Posts: 1690 Credit: 104,072,581 RAC: 122,144 |
Quote: "Task is running 25 minutes and cmsRun inside VM 13 minutes now (95% Cpu)" You were luckier than I was: the task I started did not error out like the others before, but obviously there were no jobs available. The task finished after 29 minutes with just about 1 minute of CPU time: https://lhcathome.cern.ch/lhcathome/result.php?resultid=405220625 |
Joined: 18 Dec 15 Posts: 1690 Credit: 104,072,581 RAC: 122,144 |
Quote: "You were luckier than I was: the task I started did not error out like the others before, but obviously there were no jobs available. The task finished after 29 minutes with just about 1 minute of CPU time:" In view of the fact that obviously no jobs are available, wouldn't it make sense to stop the distribution of tasks? |
Joined: 24 Oct 04 Posts: 1128 Credit: 49,757,934 RAC: 8,198 |
Since last night, mine have been getting stuck at (here and at -dev) and end up https://lhcathome.cern.ch/lhcathome/result.php?resultid=405216500 |
Joined: 2 May 07 Posts: 2104 Credit: 159,819,191 RAC: 123,837 |
Quote: "Waiting for the stop 11 hour or 18 hour." OK, it was 18 hours. Run time: 18 hours 0 min 56 sec. CPU time: 5 hours 7 min 59 sec. Validation status: Valid. Credit: 213.55 |
Joined: 18 Dec 15 Posts: 1690 Credit: 104,072,581 RAC: 122,144 |
Quote: "In view of the fact that obviously no jobs are available, wouldn't it make sense to stop the distribution of tasks?" I tested again this morning, and the tasks still finished after about 25 minutes because no jobs could be downloaded. What puzzles me is that this time the automatic stop of task submission does not work. |
Joined: 2 May 07 Posts: 2104 Credit: 159,819,191 RAC: 123,837 |
We have to wait for an answer. I also ran a test; it ended after 25 minutes with 4 cobblestones. |
Joined: 18 Dec 15 Posts: 1690 Credit: 104,072,581 RAC: 122,144 |
As a test 10 minutes ago shows, there are still no jobs available, but tasks are still being distributed; the automatic stop of task distribution again does not work. The result looks like this: https://lhcathome.cern.ch/lhcathome/result.php?resultid=405292196 On the project status page one can see 345 users within the past 24 hours (and almost 13,000 tasks being processed), so many users have been crunching CMS tasks which run only in the "envelope" and finish after some 25 minutes, with NO VALUE at all for the science :-( Hopefully this nonsense will be stopped ASAP. |
Joined: 2 May 07 Posts: 2104 Credit: 159,819,191 RAC: 123,837 |
Quote: "Hopefully this nonsense will be stopped ASAP." What if these jobs are testing the functionality of the tasks? Do you know that? This is the thread title: "tasks now running unusual long time without CPU usage" |
Joined: 18 Dec 15 Posts: 1690 Credit: 104,072,581 RAC: 122,144 |
Quote: "This is the header:" The currently downloaded tasks do NOT run unusually long. They run for only about 25 minutes, using almost no CPU, because they do not receive jobs. I put my comment from earlier this morning here only because the current problems with CMS have been discussed in this thread over the past few days, so I did not want to open a new thread. |
Joined: 2 May 07 Posts: 2104 Credit: 159,819,191 RAC: 123,837 |
We have to wait for an answer, had also made a test, ended after 25 min with 4 cobblestones. |
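Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,232 RAC: 315 |
OK, we're running again, after a few glitches (and one error in a script, memory given in GiB where MiB was expected!). I think I'll resist any attempt at further tests until after the weekend... |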
Joined: 2 May 07 Posts: 2104 Credit: 159,819,191 RAC: 123,837 |
Thank you, CMS team. This line is in the log: "For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/", but that link shows the HTCondor Software Suite page at https://research.cs.wisc.edu/htcondor/news/plan-to-replace-gst-in-htcss/ The first job inside the task is starting. |
Joined: 18 Dec 15 Posts: 1690 Credit: 104,072,581 RAC: 122,144 |
Quote: "OK, we're running again, after a few glitches (and one error in a script, memory given in GiB where MiB was expected!). I think I'll resist any attempt at further tests until after the weekend..." Thanks, Ivan, for the information. The tasks now work fine. Could you please make sure that the automatic "tasks-sending-stop-tool" (sorry for this strange name I invented for it) is also working again, just in case the next problem with job submission comes up some time. |
Joined: 29 Aug 05 Posts: 1006 Credit: 6,272,232 RAC: 315 |
Quote: "thanks, Ivan, for the information. The tasks now work fine." OK, I'll mention that to Laurence; I hadn't noticed that it had in fact continued sending jobs when the pool was exhausted. |
Joined: 2 May 07 Posts: 2104 Credit: 159,819,191 RAC: 123,837 |
Thank you CMS-Team. 02/08/24 16:11:04 (pid:16707) condor_read(): Socket closed abnormally when trying to read 5 bytes from collector vocms0840.cern.ch, errno=104 Connection reset by peer 02/08/24 16:11:04 (pid:16707) CCBListener: failed to receive message from CCB server vocms0840.cern.ch Only one job inside the task, which finished two hours ago. |
Joined: 18 Dec 15 Posts: 1690 Credit: 104,072,581 RAC: 122,144 |
Again I am noticing the same phenomenon as referred to in the title of this thread: the new tasks have now been running longer than the ones from before (several days ago), are still not finished, and have not used any CPU for quite some time. Is this a coincidence, or is the same problem from the day before yesterday back? |
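
The symptom this thread keeps returning to (long run time with little CPU time) can be expressed as the ratio of CPU time to run time. Below is a minimal Python sketch using the two sets of figures reported above: 18 h 0 min 56 s run time with 5 h 7 min 59 s CPU time, and 29 minutes run time with about 1 minute CPU time. The function name `cpu_efficiency` and the 50% threshold are illustrative assumptions, not anything the project itself uses.

```python
from datetime import timedelta


def cpu_efficiency(cpu: timedelta, wall: timedelta) -> float:
    """Return CPU time as a fraction of wall-clock run time."""
    if wall.total_seconds() <= 0:
        raise ValueError("wall-clock time must be positive")
    return cpu.total_seconds() / wall.total_seconds()


# Figures quoted in the thread; the 1-minute CPU value is approximate.
long_task = cpu_efficiency(
    cpu=timedelta(hours=5, minutes=7, seconds=59),
    wall=timedelta(hours=18, minutes=0, seconds=56),
)
empty_task = cpu_efficiency(
    cpu=timedelta(minutes=1),
    wall=timedelta(minutes=29),
)

# Hypothetical threshold, chosen only for illustration.
IDLE_THRESHOLD = 0.5

for name, ratio in (("18-hour task", long_task), ("29-minute task", empty_task)):
    flag = "mostly idle" if ratio < IDLE_THRESHOLD else "busy"
    print(f"{name}: {ratio:.1%} CPU efficiency ({flag})")
```

Both cases fall well below the illustrative threshold, which matches the observations in the thread: the 18-hour task sat idle after its jobs finished, and the 29-minute task never received a job.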
©2024 CERN