Message boards : CMS Application : tasks now running unusual long time without CPU usage

Erich56

Joined: 18 Dec 15
Posts: 1690
Credit: 104,018,771
RAC: 122,246
Message 49427 - Posted: 6 Feb 2024, 18:30:00 UTC - in response to Message 49426.  

Task has been running for 25 minutes and cmsRun inside the VM for 13 minutes now (95% CPU)
Still running and using CPU?
ID: 49427
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,496,817
RAC: 2,374
Message 49428 - Posted: 6 Feb 2024, 18:32:00 UTC - in response to Message 49427.  

Task has been running for 25 minutes and cmsRun inside the VM for 13 minutes now (95% CPU)
Still running and using CPU?
Yes, indeed ;)
ID: 49428
maeax

Joined: 2 May 07
Posts: 2104
Credit: 159,819,191
RAC: 123,837
Message 49429 - Posted: 6 Feb 2024, 18:36:30 UTC - in response to Message 49428.  
Last modified: 6 Feb 2024, 18:38:32 UTC

My task, started at 8:30, has now been running for 10 hours.
2024-02-06 12:51 1.4M
2024-02-06 14:13 1.4M
2024-02-06 15:46 1.4M
2024-02-06 17:07 1.4M
Four jobs inside have finished, the last one at 16:00 UTC. Now waiting for the stop at 11 or 18 hours.
If a CMS task is killed before the hard stop, will CMS restart these jobs in another CMS task?
ID: 49429
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1280
Credit: 8,496,817
RAC: 2,374
Message 49430 - Posted: 6 Feb 2024, 18:43:43 UTC - in response to Message 49429.  

Four jobs inside have finished, the last one at 16:00 UTC. Now waiting for the stop at 11 or 18 hours.
If a CMS task is killed before the hard stop, will CMS restart these jobs in another CMS task?
The four jobs have been uploaded to the CMS server and are safe. No resend is needed. Killing the task means no credit.
ID: 49430
Erich56

Joined: 18 Dec 15
Posts: 1690
Credit: 104,018,771
RAC: 122,246
Message 49432 - Posted: 6 Feb 2024, 19:56:19 UTC - in response to Message 49426.  

Task has been running for 25 minutes and cmsRun inside the VM for 13 minutes now (95% CPU)
You were luckier than I was: the task I started did not error out like the earlier ones, but obviously there were no jobs available. The task finished after 29 minutes with only about 1 minute of CPU time:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=405220625
ID: 49432
Erich56

Joined: 18 Dec 15
Posts: 1690
Credit: 104,018,771
RAC: 122,246
Message 49433 - Posted: 6 Feb 2024, 21:08:34 UTC - in response to Message 49432.  

Task has been running for 25 minutes and cmsRun inside the VM for 13 minutes now (95% CPU)
You were luckier than I was: the task I started did not error out like the earlier ones, but obviously there were no jobs available. The task finished after 29 minutes with only about 1 minute of CPU time:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=405220625
Given that obviously no jobs are available, wouldn't it make sense to stop distributing tasks?
ID: 49433
Magic Quantum Mechanic
Joined: 24 Oct 04
Posts: 1128
Credit: 49,756,349
RAC: 8,313
Message 49434 - Posted: 7 Feb 2024, 0:21:56 UTC - in response to Message 49405.  
Last modified: 7 Feb 2024, 0:23:28 UTC

Starting last night, mine began getting stuck at (here and at -dev)

and end up like this: https://lhcathome.cern.ch/lhcathome/result.php?resultid=405216500
ID: 49434
maeax

Joined: 2 May 07
Posts: 2104
Credit: 159,819,191
RAC: 123,837
Message 49436 - Posted: 7 Feb 2024, 2:34:43 UTC - in response to Message 49429.  

Waiting for the stop 11 hour or 18 hour.

OK, it was 18 hours.
Run time 18 hours 0 min 56 sec
CPU time 5 hours 7 min 59 sec
Validation state Valid
Credit 213.55
ID: 49436
Erich56

Joined: 18 Dec 15
Posts: 1690
Credit: 104,018,771
RAC: 122,246
Message 49438 - Posted: 7 Feb 2024, 6:20:44 UTC - in response to Message 49433.  

Task has been running for 25 minutes and cmsRun inside the VM for 13 minutes now (95% CPU)
You were luckier than I was: the task I started did not error out like the earlier ones, but obviously there were no jobs available. The task finished after 29 minutes with only about 1 minute of CPU time:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=405220625
Given that obviously no jobs are available, wouldn't it make sense to stop distributing tasks?
I tested again this morning, and the tasks still finished after about 25 minutes because no jobs could be downloaded. What puzzles me is that this time the automatic stop of task submission does not work.
ID: 49438
maeax

Joined: 2 May 07
Posts: 2104
Credit: 159,819,191
RAC: 123,837
Message 49439 - Posted: 7 Feb 2024, 7:28:12 UTC

We have to wait for an answer. I also ran a test; it ended after 25 min with 4 cobblestones.
ID: 49439
Erich56

Joined: 18 Dec 15
Posts: 1690
Credit: 104,018,771
RAC: 122,246
Message 49446 - Posted: 8 Feb 2024, 6:45:34 UTC - in response to Message 49439.  

As a test 10 minutes ago shows, there are still no jobs available, but tasks are still being distributed; once again, the automatic stop of task distribution does not work.

The result looks like this:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=405292196

On the project status page one can see 345 users within the past 24 hours (and almost 13,000 tasks in progress), so many users have been crunching CMS tasks that run only in the "envelope" and finish after some 25 minutes, with NO VALUE at all for the science :-(

Hopefully this nonsense will be stopped ASAP.
ID: 49446
maeax

Joined: 2 May 07
Posts: 2104
Credit: 159,819,191
RAC: 123,837
Message 49447 - Posted: 8 Feb 2024, 7:15:44 UTC - in response to Message 49446.  
Last modified: 8 Feb 2024, 7:21:39 UTC

Hopefully this nonsense will be stopped ASAP.

What if these jobs are testing the functionality of the tasks?
Do you know that?
This is the header:
tasks now running unusual long time without CPU usage
ID: 49447
Erich56

Joined: 18 Dec 15
Posts: 1690
Credit: 104,018,771
RAC: 122,246
Message 49448 - Posted: 8 Feb 2024, 7:29:49 UTC - in response to Message 49447.  

This is the header:
tasks now running unusual long time without CPU usage
The currently downloaded tasks do NOT run unusually long. They run for only about 25 minutes, using almost no CPU, because they do not receive jobs.
I posted my comment from earlier this morning here only because the current problems with CMS have been discussed in this thread over the past few days, so I did not want to open a new one.
ID: 49448
maeax

Joined: 2 May 07
Posts: 2104
Credit: 159,819,191
RAC: 123,837
Message 49449 - Posted: 8 Feb 2024, 7:51:23 UTC - in response to Message 49439.  

We have to wait for an answer. I also ran a test; it ended after 25 min with 4 cobblestones.
ID: 49449
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,232
RAC: 315
Message 49456 - Posted: 8 Feb 2024, 11:07:44 UTC - in response to Message 49449.  

OK, we're running again, after a few glitches (and one error in a script, memory given in GiB where MiB was expected!). I think I'll resist any attempt at further tests until after the weekend...
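To put the script error into numbers: if a memory size meant in GiB is read as MiB, the value that arrives is 1024 times too small. The sketch below is purely illustrative; the 2 GiB figure and the `to_mib` helper are hypothetical assumptions, not the actual CMS submission script.

```python
# Hypothetical illustration of a GiB-vs-MiB mix-up; the helper name and the
# 2 GiB value are assumptions, not taken from the real CMS script.
MIB_PER_GIB = 1024


def to_mib(value: float, unit: str) -> float:
    """Convert a memory amount to MiB, the unit the (hypothetical) consumer expects."""
    if unit == "MiB":
        return value
    if unit == "GiB":
        return value * MIB_PER_GIB
    raise ValueError(f"unknown unit: {unit}")


intended_mib = to_mib(2, "GiB")  # what was meant: 2 GiB expressed in MiB
misread_mib = 2                  # the bare number 2 taken as MiB
print(intended_mib, misread_mib, intended_mib / misread_mib)  # prints: 2048 2 1024.0
```

Passing the unit explicitly, rather than a bare number, is one way to make this kind of mismatch fail loudly instead of silently.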
ID: 49456
maeax

Joined: 2 May 07
Posts: 2104
Credit: 159,819,191
RAC: 123,837
Message 49458 - Posted: 8 Feb 2024, 11:21:34 UTC - in response to Message 49456.  
Last modified: 8 Feb 2024, 11:23:11 UTC

Thank you, CMS team.
This line is in the log:
For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/
but this is the URL that actually shows the HTCondor Software Suite page:
https://research.cs.wisc.edu/htcondor/news/plan-to-replace-gst-in-htcss/

The first job inside the task is starting.
ID: 49458
Erich56

Joined: 18 Dec 15
Posts: 1690
Credit: 104,018,771
RAC: 122,246
Message 49459 - Posted: 8 Feb 2024, 12:58:00 UTC - in response to Message 49456.  

OK, we're running again, after a few glitches (and one error in a script, memory given in GiB where MiB was expected!). I think I'll resist any attempt at further tests until after the weekend...
Thanks, Ivan, for the information. The tasks now work fine.
Could you please make sure that the automatic "tasks-sending-stop-tool" (sorry for the strange name I invented for it) is also working again, just in case another problem with job submission comes up some time.
ID: 49459
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1006
Credit: 6,272,232
RAC: 315
Message 49461 - Posted: 8 Feb 2024, 14:23:05 UTC - in response to Message 49459.  

OK, we're running again, after a few glitches (and one error in a script, memory given in GiB where MiB was expected!). I think I'll resist any attempt at further tests until after the weekend...
Thanks, Ivan, for the information. The tasks now work fine.
Could you please make sure that the automatic "tasks-sending-stop-tool" (sorry for the strange name I invented for it) is also working again, just in case another problem with job submission comes up some time.

OK, I'll mention that to Laurence; I didn't notice that it had in fact continued sending jobs when the pool was exhausted.
ID: 49461
maeax

Joined: 2 May 07
Posts: 2104
Credit: 159,819,191
RAC: 123,837
Message 49462 - Posted: 8 Feb 2024, 15:22:24 UTC - in response to Message 49458.  

Thank you, CMS team.
This line is in the log:
For details, see https://htcondor.org/news/plan-to-replace-gst-in-htcss/
but this is the URL that actually shows the HTCondor Software Suite page:
https://research.cs.wisc.edu/htcondor/news/plan-to-replace-gst-in-htcss/

The first job inside the task is starting.


02/08/24 16:11:04 (pid:16707) condor_read(): Socket closed abnormally when trying to read 5 bytes from collector vocms0840.cern.ch, errno=104 Connection reset by peer
02/08/24 16:11:04 (pid:16707) CCBListener: failed to receive message from CCB server vocms0840.cern.ch

Only one job ran inside the task; it finished two hours ago.
ID: 49462
Erich56

Joined: 18 Dec 15
Posts: 1690
Credit: 104,018,771
RAC: 122,246
Message 49464 - Posted: 8 Feb 2024, 15:44:57 UTC

Again I am noticing the phenomenon referred to in the title of this thread:
the new tasks have now been running longer than the ones from several days ago; they are still not finished and have not been using CPU for quite some time.
Is this a coincidence, or is the problem from the day before yesterday back?
ID: 49464


©2024 CERN