Message boards : CMS Application : Larger jobs in the pipeline

maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,825,946
RAC: 126,738
Message 46260 - Posted: 17 Feb 2022, 9:39:37 UTC - in response to Message 46215.  

I have 8 CMS tasks running under Windows, now for 9 hours. The last inside job finished at 1 UTC today
and no new inside job has started since, but every CMS task is still using 6% of the CPU.
Is it possible to reduce CPU usage while a task is waiting for a new inside job?
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,877
RAC: 404
Message 46261 - Posted: 17 Feb 2022, 9:49:19 UTC - in response to Message 46215.  

Multiple WMAgent logs are now visible in the log directory for each job ("Show graphics" button).
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,877
RAC: 404
Message 46263 - Posted: 17 Feb 2022, 10:15:44 UTC - in response to Message 46260.  

I have 8 CMS tasks running under Windows, now for 9 hours. The last inside job finished at 1 UTC today
and no new inside job has started since, but every CMS task is still using 6% of the CPU.
Is it possible to reduce CPU usage while a task is waiting for a new inside job?

I think I've seen this behaviour before, and a colleague is just reporting a similar case. In her case she has access to more information than I do. It seems that there was a disconnection between the glidein that runs the job and the condor pool. The pool thought that the job had been aborted and offered it to another host. When the glidein re-established contact with the pool it couldn't continue with the same job and entered a zombie state which will persist until the glidein times out. ...or something like that...
We are trying to investigate ways around this situation.
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,825,946
RAC: 126,738
Message 46264 - Posted: 17 Feb 2022, 10:28:02 UTC - in response to Message 46261.  

Yes, but since 1 UTC today the only message is:
Running job output should appear here.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,877
RAC: 404
Message 46265 - Posted: 17 Feb 2022, 11:12:50 UTC - in response to Message 46264.  

Yes, but since 1 UTC today the only message is:
Running job output should appear here.

That's the default initial finished_0.log. Do you have a wmagentJob.log?
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,825,946
RAC: 126,738
Message 46266 - Posted: 17 Feb 2022, 11:34:36 UTC - in response to Message 46265.  
Last modified: 17 Feb 2022, 11:36:53 UTC

Yes, it contains the finished job from 1 UTC (1.4 MByte).
The second wmagent.log has this message:
Running job output should appear here.
This is the case for all 8 CMS tasks. Running time is 11:30 at the moment.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,877
RAC: 404
Message 46269 - Posted: 17 Feb 2022, 15:09:40 UTC - in response to Message 46266.  
Last modified: 17 Feb 2022, 15:10:27 UTC

If a task is still running, or if it happens again, could you please send us the contents of MasterLog, StartdLog and StarterLog from the log directory?
ivan (dot) reid (at) brunel (dot) ac (dot) uk
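
For anyone gathering these files, here is a minimal Python sketch of one way to bundle them for an email. It assumes the three logs have already been saved from the log directory into a local folder; the folder path, the archive name and the function `bundle_logs` are hypothetical and are not part of any BOINC or CMS tooling.

```python
from pathlib import Path
import zipfile

# Log names taken from the request above.
LOG_NAMES = ("MasterLog", "StartdLog", "StarterLog")


def bundle_logs(log_dir, archive_path, names=LOG_NAMES):
    """Zip the requested logs found in log_dir.

    Returns two lists: the names that were added and the names that
    were not found, so a missing file is visible before sending.
    """
    log_dir = Path(log_dir)
    found, missing = [], []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            path = log_dir / name
            if path.is_file():
                # Store under the bare name, without the local folder path.
                archive.write(path, arcname=name)
                found.append(name)
            else:
                missing.append(name)
    return found, missing
```

Calling `bundle_logs("saved_logs", "cms_logs.zip")` would write the archive and report which of the three logs were missing from the (assumed) `saved_logs` folder.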
maeax

Joined: 2 May 07
Posts: 2090
Credit: 158,825,946
RAC: 126,738
Message 46270 - Posted: 17 Feb 2022, 17:53:16 UTC - in response to Message 46269.  

OK,
26 CMS tasks are running for the moment (because WCG has a 14-day interruption).
When I see this situation again, I will send you the information.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1005
Credit: 6,269,877
RAC: 404
Message 46271 - Posted: 17 Feb 2022, 21:07:07 UTC - in response to Message 46269.  

If a task is still running, or if it happens again, could you please send us the contents of MasterLog, StartdLog and StarterLog from the log directory?
ivan (dot) reid (at) brunel (dot) ac (dot) uk

I should add that this request also applies to anyone else who sees this behaviour, of BOINC tasks apparently idling without actually running a CMS job. Thanks, ivan and ff.
Crystal Pellet
Volunteer moderator
Volunteer tester

Joined: 14 Jan 10
Posts: 1273
Credit: 8,480,147
RAC: 2,155
Message 46272 - Posted: 17 Feb 2022, 21:36:18 UTC - in response to Message 46261.  

Multiple WMAgent logs are now visible in the log directory for each job ("Show graphics" button).

Thanks Ivan and Laurence.
Here is an example, where the file without a numeric extension is the currently running job.
wmagentJob.log		2022-02-17 22:30	5.9K	 
wmagentJob_1.log	2022-02-17 14:30	20K	 
wmagentJob_2.log	2022-02-17 15:30	20K	 
wmagentJob_3.log	2022-02-17 16:38	20K	 
wmagentJob_4.log	2022-02-17 17:51	20K	 
wmagentJob_5.log	2022-02-17 19:08	20K	 
wmagentJob_6.log	2022-02-17 20:14	20K	 
wmagentJob_7.log	2022-02-17 21:21	20K	 
wmagentJob_8.log	2022-02-17 22:30	20K
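
The naming scheme in this listing can be sketched in Python as follows. This is only an illustration: the function `split_wmagent_logs` and the regular expression are hypothetical, and the ordering of finished jobs by suffix is an assumption based on the timestamps shown above, where higher suffixes carry later times.

```python
import re

# Matches "wmagentJob.log" (no suffix) and "wmagentJob_<n>.log".
LOG_PATTERN = re.compile(r"^wmagentJob(?:_(\d+))?\.log$")


def split_wmagent_logs(filenames):
    """Separate the current job log from the numbered logs of earlier jobs.

    Returns (current, finished): current is the unnumbered log name or
    None, and finished lists the numbered logs sorted by numeric suffix.
    """
    current = None
    finished = []
    for name in filenames:
        match = LOG_PATTERN.match(name)
        if match is None:
            continue  # not a WMAgent job log, e.g. finished_0.log
        if match.group(1) is None:
            current = name
        else:
            finished.append((int(match.group(1)), name))
    finished.sort()  # numeric sort, so _10 comes after _2
    return current, [name for _, name in finished]
```

Sorting on the integer suffix rather than on the file name keeps `wmagentJob_10.log` after `wmagentJob_9.log` once a task has run more than nine jobs.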


©2024 CERN