Larger jobs in the pipeline
Author | Message |
---|---|
Joined: 2 May 07 Posts: 2245 Credit: 174,006,243 RAC: 8,727 |
I have had 8 CMS tasks running under Windows for 9 hours now. The last inside job finished at 1 UTC today and no new inside job has arrived since then, but every CMS task is still using 6% of the CPU. Is it possible to reduce the CPU usage while a task is waiting for a new inside job? |
Joined: 29 Aug 05 Posts: 1065 Credit: 7,924,926 RAC: 13,791 |
"Have 8 CMS under Windows running now 9 hours, last finished inside job at 1 UTC today," I think I've seen this behaviour before, and a colleague is just reporting a similar case. In her case she has access to more information than I do. It seems that there was a disconnection between the glidein that runs the job and the condor pool. The pool thought that the job had been aborted and offered it to another host. When the glidein re-established contact with the pool, it couldn't continue with the same job and entered a zombie state, which will persist until the glidein times out. ...or something like that... We are trying to investigate ways around this situation. |
Joined: 2 May 07 Posts: 2245 Credit: 174,006,243 RAC: 8,727 |
Yes, but the message is: "Running job output should appear here." It has been like this since 1 UTC today. |
Joined: 29 Aug 05 Posts: 1065 Credit: 7,924,926 RAC: 13,791 |
|
Joined: 2 May 07 Posts: 2245 Credit: 174,006,243 RAC: 8,727 |
Yes, with the finished job from 1 UTC (1.4 MByte). The second wmagent.log has this message: "Running job output should appear here." This is the case for all 8 CMS tasks. The running time is 11:30 at the moment. |
Joined: 29 Aug 05 Posts: 1065 Credit: 7,924,926 RAC: 13,791 |
|
Joined: 2 May 07 Posts: 2245 Credit: 174,006,243 RAC: 8,727 |
OK, 26 CMS tasks are running for the moment (because WCG has a 14-day interruption). When I see this situation again, I will send you the information. |
Joined: 29 Aug 05 Posts: 1065 Credit: 7,924,926 RAC: 13,791 |
If a task is still running, or if it happens again, could you please send us the contents of MasterLog, StartdLog and StarterLog from the log directory? I should add that this request also applies to anyone else who sees this behaviour, where BOINC tasks are apparently idling without actually running a CMS job. Thanks, ivan and ff. |
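For anyone preparing such a report, the following is a minimal sketch of bundling the three requested logs into one archive. It assumes the log files have already been saved from the task's log directory into a local folder; the function name `bundle_logs` and both path arguments are hypothetical and not part of BOINC or the CMS application.

```python
import zipfile
from pathlib import Path

# File names requested in the post above.
REQUESTED_LOGS = ("MasterLog", "StartdLog", "StarterLog")


def bundle_logs(log_dir, archive_path):
    """Zip whichever requested logs exist in log_dir.

    log_dir and archive_path are hypothetical local paths chosen by the
    user. Returns the list of log names that were actually added.
    """
    log_dir = Path(log_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in REQUESTED_LOGS:
            candidate = log_dir / name
            if candidate.is_file():
                # Store only the bare file name, not the full local path.
                archive.write(candidate, arcname=name)
                added.append(name)
    return added
```

Missing files are skipped rather than treated as errors, so the returned list also shows which of the three logs could not be found.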
Joined: 14 Jan 10 Posts: 1429 Credit: 9,528,400 RAC: 4,037 |
Multiple WMAgent logs are now visible in the log directory for each job ("Show graphics" button). Thanks Ivan and Laurence. Here is an example, where the file without a numeric extension belongs to the currently running job:
wmagentJob.log    2022-02-17 22:30  5.9K
wmagentJob_1.log  2022-02-17 14:30  20K
wmagentJob_2.log  2022-02-17 15:30  20K
wmagentJob_3.log  2022-02-17 16:38  20K
wmagentJob_4.log  2022-02-17 17:51  20K
wmagentJob_5.log  2022-02-17 19:08  20K
wmagentJob_6.log  2022-02-17 20:14  20K
wmagentJob_7.log  2022-02-17 21:21  20K
wmagentJob_8.log  2022-02-17 22:30  20K |
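The naming scheme in this listing can be used to tell the current log apart from the finished ones. The sketch below is only an illustration under stated assumptions: the helper names `split_logs` and `looks_idle` are hypothetical, the file name pattern is inferred from the listing above, and treating a log that holds nothing but the placeholder message quoted earlier in this thread as a sign of an idle task is a heuristic, not documented behaviour.

```python
import re

# Placeholder text quoted earlier in the thread.
PLACEHOLDER = "Running job output should appear here."

# Pattern inferred from the listing: wmagentJob.log or wmagentJob_<n>.log
NAME_PATTERN = re.compile(r"^wmagentJob(?:_(\d+))?\.log$")


def split_logs(names):
    """Return (current, finished), with finished sorted by job number."""
    current = None
    finished = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # not a WMAgent job log
        if match.group(1) is None:
            current = name  # no numeric suffix: the running job
        else:
            finished.append((int(match.group(1)), name))
    return current, [name for _, name in sorted(finished)]


def looks_idle(log_text):
    """Heuristic: True when the log holds only the placeholder line."""
    return log_text.strip() == PLACEHOLDER
```

Sorting on the parsed number rather than on the file name keeps `wmagentJob_10.log` after `wmagentJob_9.log`, which a plain string sort would not.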
©2025 CERN