CMS jobs are becoming available again
Author | Message |
---|---|
Joined: 18 Dec 15 Posts: 1687 Credit: 102,945,988 RAC: 125,510 |
now the queue is empty again - any new problems? |
Joined: 14 Jan 10 Posts: 1272 Credit: 8,479,164 RAC: 2,361 |
> now the queue is empty again - any new problems?

No sub-jobs available. 2781 succeeded, 230 aborted, 162 app failed, 0 pending and 17 running |
Joined: 29 Aug 05 Posts: 1004 Credit: 6,268,761 RAC: 316 |
|
Joined: 13 Jul 05 Posts: 167 Credit: 14,938,551 RAC: 191 |
> What do you mean by "CPU load"? Do you mean top's %cpu or do you mean the ratio of CPU time to run time? The only CMS task I've completed since the last update reported by Ivan showed a fairly constant ~99 %cpu in top, but the ratio of CPU time to run time is 45,764.33/64,179.97 = 71%.

But I think that your 71% is also a reflection of the wall-clock time taken up by the VM filling its CVMFS cache and so on, i.e. network traffic, rather than just CPU efficiency in the number-crunching phase proper. Then again, all my CMS tasks failed on a heartbeat error :( even though the same machines run Theory and ATLAS VMs quite happily. |
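For anyone wanting to repeat the comparison on their own tasks, here is a minimal Python sketch of the CPU time to run time ratio discussed above. The function name `cpu_efficiency` is just an illustrative choice; the two numbers are the ones quoted in the post.

```python
def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Return CPU time as a fraction of elapsed (run) time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Figures quoted in the post: 45,764.33 s CPU time, 64,179.97 s run time.
ratio = cpu_efficiency(45_764.33, 64_179.97)
print(f"{ratio:.1%}")  # prints "71.3%"
```

Note that this ratio covers the whole task, including VM start-up and cache filling, so it will normally come out lower than the %cpu that top shows during the crunching phase.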
Joined: 18 Dec 15 Posts: 1687 Credit: 102,945,988 RAC: 125,510 |
Although the Server Status page shows 197 tasks available for download, none of my machines can download any. BOINC says "no tasks available for CMS simulation". BTW: same is true for Theory. What's going on? |
Joined: 29 Aug 05 Posts: 1004 Credit: 6,268,761 RAC: 316 |
> Although the Server Status page shows 197 tasks available for download, none of my machines can download any.

Do you already have tasks running? CMS won't download tasks unless there is a BOINC slot for them (i.e. you shouldn't get any tasks in the "Waiting to run" state). You can turn on extra logging in your cc_config.xml file if you want to probe more deeply -- https://boinc.berkeley.edu/wiki/Client_configuration. |
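As a rough illustration of the extra logging mentioned above, the Python sketch below builds a cc_config.xml that switches on two log flags, `work_fetch_debug` and `sched_op_debug`. Which flags are most useful here is my assumption, and the helper name `build_cc_config` is made up for the example; check the linked wiki page for the full list. The file goes in the BOINC data directory, whose location depends on the installation.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Assumed choice of flags: work-fetch decisions and scheduler operations.
xml_text = build_cc_config(["work_fetch_debug", "sched_op_debug"])
print(xml_text)
```

The client only picks up the change after it re-reads its config files or is restarted.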
Joined: 27 Sep 08 Posts: 801 Credit: 649,848,958 RAC: 240,644 |
I think something changed server side. I see my 44-core machines draining their queues with the message "This computer has reached a limit on tasks in progress". In the past they would buffer 50 WUs, so they were fully stocked and had 6 spare. See logging:

4457 LHC@home 03/18/19 22:34:25 [work_fetch] REC 33006.218 prio -1.282 can request work
4460 03/18/19 22:34:25 [work_fetch] --- state for CPU ---
4461 03/18/19 22:34:25 [work_fetch] shortfall 1453962.91 nidle 18.00 saturated 0.00 busy 0.00
4467 LHC@home 03/18/19 22:34:25 [work_fetch] share 1.000
4471 LHC@home 03/18/19 22:34:25 [work_fetch] set_request() for CPU: ninst 44 nused_total 26.00 nidle_now 18.00 fetch share 1.00 req_inst 18.00 req_secs 1453962.91
4472 LHC@home 03/18/19 22:34:25 [work_fetch] set_request() for AMD/ATI GPU: ninst 1 nused_total 0.00 nidle_now 1.00 fetch share 1.00 req_inst 1.00 req_secs 44064.00
4473 LHC@home 03/18/19 22:34:25 [work_fetch] request: CPU (1453962.91 sec, 18.00 inst) AMD/ATI GPU (44064.00 sec, 1.00 inst)
4474 LHC@home 03/18/19 22:34:25 Sending scheduler request: To fetch work.
4475 LHC@home 03/18/19 22:34:25 Requesting new tasks for CPU and AMD/ATI GPU
4476 LHC@home 03/18/19 22:34:26 Scheduler request completed: got 0 new tasks
4477 LHC@home 03/18/19 22:34:26 No tasks sent
4478 LHC@home 03/18/19 22:34:26 No tasks are available for SixTrack
4479 LHC@home 03/18/19 22:34:26 No tasks are available for sixtracktest
4480 LHC@home 03/18/19 22:34:26 No tasks are available for CMS Simulation
4481 LHC@home 03/18/19 22:34:26 No tasks are available for Theory Simulation
4482 LHC@home 03/18/19 22:34:26 This computer has reached a limit on tasks in progress

It looks like it requests 18 WUs, but there is a limit, so it doesn't get anything. Not sure which flag shows the server responses? Maybe there is some limit on how many jobs you can run per day? This computer took 55 tasks today. |
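The "requests 18" reading can be checked directly from the `set_request()` line in the log. The Python sketch below pulls out the instance counts with a regular expression; the pattern is written against the one line shown in this post and is an assumption about the format, not something taken from the BOINC source.

```python
import re

# Pattern fitted to the set_request() line quoted in the post (assumed format).
SET_REQUEST = re.compile(
    r"set_request\(\) for (?P<resource>.+?): "
    r"ninst (?P<ninst>\d+) nused_total (?P<nused>[\d.]+) "
    r"nidle_now (?P<nidle>[\d.]+)"
)


def parse_set_request(line):
    """Return the resource name and instance counts, or None if no match."""
    m = SET_REQUEST.search(line)
    if m is None:
        return None
    return {
        "resource": m["resource"],
        "ninst": int(m["ninst"]),
        "nused": float(m["nused"]),
        "nidle": float(m["nidle"]),
    }


line = (
    "4471 LHC@home 03/18/19 22:34:25 [work_fetch] set_request() for CPU: "
    "ninst 44 nused_total 26.00 nidle_now 18.00 fetch share 1.00 "
    "req_inst 18.00 req_secs 1453962.91"
)
info = parse_set_request(line)
print(info["ninst"] - info["nused"], info["nidle"])  # both are 18.0
```

So with 44 instances and 26 in use, the client sees 18 idle and asks for work for exactly those, which matches `req_inst 18.00`.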
Joined: 29 Aug 05 Posts: 1004 Credit: 6,268,761 RAC: 316 |
> I think something changed server side, I see my 44 core machines draining their queues down with the message that "This computer has reached a limit on tasks in progress". In the past they would buffer 50 WU so they were fully stocked and had 6 spare.

There is a limit to how many tasks you can queue, but I'm not sure how it's implemented in LHC@Home -- whether it's a limit per project or an overall limit. I know at SETI@Home the limit is 100 per PC (regardless of how many CPUs) plus 100 per GPU. There is also a daily quota, usually too large to be noticed. Errored or aborted tasks will incrementally reduce your quota, down to one task per day; conversely, valid tasks increase your quota, up to the machine limit. There is usually an output message that the machine has reached its quota of N per day when this happens (I got that when I aborted a bunch of tasks the other day, and it took a little while before I could get more tasks). |
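To make the quota behaviour described above concrete, here is a toy Python model. It is only a sketch of the description in this post: the step size of one task per result and the name `update_quota` are assumptions for illustration, and the real server logic may adjust the quota differently.

```python
def update_quota(quota: int, outcome: str, machine_limit: int) -> int:
    """Toy model: valid results raise the daily quota, failures lower it.

    The +1 / -1 step is an assumption made for illustration only.
    """
    if outcome == "valid":
        return min(quota + 1, machine_limit)
    if outcome in ("error", "aborted"):
        return max(quota - 1, 1)  # never below one task per day
    raise ValueError(f"unknown outcome: {outcome}")


quota = 5
for outcome in ["aborted"] * 10:  # abort a bunch of tasks
    quota = update_quota(quota, outcome, machine_limit=50)
print(quota)  # 1: the floor of one task per day

for outcome in ["valid"] * 3:  # quota recovers as valid results come in
    quota = update_quota(quota, outcome, machine_limit=50)
print(quota)  # 4
```

This only illustrates the floor at one task per day and the cap at the machine limit; it says nothing about the separate limit on tasks in progress.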
Joined: 18 Dec 15 Posts: 1687 Credit: 102,945,988 RAC: 125,510 |
There were definitely some changes made server-side. Until now, I was able to run as many subproject jobs as my 6+6 (HT) core CPU allowed (32 GB RAM was never a problem, except for ATLAS). So, for example, I had 8 Theory or 8 CMS tasks running simultaneously, plus a few more in the waiting queue. Currently, only 1 CMS + 1 Theory task are running. Although my web settings say "max. number of tasks: 6", and my app_config.xml is set to run 3 CMS and 3 Theory tasks, I cannot download any more CMS or Theory tasks. The BOINC log says:

19.03.2019 06:50:37 | LHC@home | update requested by user
19.03.2019 06:50:42 | LHC@home | Sending scheduler request: Requested by user.
19.03.2019 06:50:42 | LHC@home | Requesting new tasks for CPU
19.03.2019 06:50:43 | LHC@home | Scheduler request completed: got 0 new tasks
19.03.2019 06:50:43 | LHC@home | No tasks sent
19.03.2019 06:50:43 | LHC@home | No tasks are available for CMS Simulation
19.03.2019 06:50:43 | LHC@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
19.03.2019 06:50:43 | LHC@home | This computer has reached a limit on tasks in progress
19.03.2019 06:50:54 | LHC@home | Sending scheduler request: To fetch work.
19.03.2019 06:50:54 | LHC@home | Requesting new tasks for CPU
19.03.2019 06:50:55 | LHC@home | Scheduler request completed: got 0 new tasks
19.03.2019 06:50:55 | LHC@home | No tasks sent
19.03.2019 06:50:55 | LHC@home | No tasks are available for CMS Simulation
19.03.2019 06:50:55 | LHC@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
19.03.2019 06:50:55 | LHC@home | This computer has reached a limit on tasks in progress
19.03.2019 07:01:26 | LHC@home | update requested by user
19.03.2019 07:01:28 | LHC@home | Sending scheduler request: Requested by user.
19.03.2019 07:01:28 | LHC@home | Requesting new tasks for CPU
19.03.2019 07:01:29 | LHC@home | Scheduler request completed: got 0 new tasks
19.03.2019 07:01:29 | LHC@home | No tasks sent
19.03.2019 07:01:29 | LHC@home | No tasks are available for Theory Simulation
19.03.2019 07:01:29 | LHC@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
19.03.2019 07:01:29 | LHC@home | This computer has reached a limit on tasks in progress
19.03.2019 07:01:39 | LHC@home | Sending scheduler request: To fetch work.
19.03.2019 07:01:39 | LHC@home | Requesting new tasks for CPU
19.03.2019 07:01:40 | LHC@home | Scheduler request completed: got 0 new tasks
19.03.2019 07:01:40 | LHC@home | No tasks sent
19.03.2019 07:01:40 | LHC@home | No tasks are available for Theory Simulation
19.03.2019 07:01:40 | LHC@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
19.03.2019 07:01:40 | LHC@home | This computer has reached a limit on tasks in progress

What irritates me is what Toby Broom already mentioned: "This computer has reached a limit on tasks in progress" - how come? Who sets this limit on tasks in progress? All my settings are "6", on the webpage as well as in the app_config.xml! So something has definitely gone wrong somewhere server-side, all of a sudden, since yesterday. Ivan, could you please look into this? |
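When a log repeats like the one above, it can help to count how often each scheduler message appears. The Python sketch below does that for event-log text in the `date | project | message` layout shown in this post; the function name `count_messages` is hypothetical and the sample is a short excerpt of the lines above.

```python
from collections import Counter


def count_messages(log_text: str) -> Counter:
    """Count message texts in 'timestamp | project | message' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        # Split on the first two pipes only, in case a message contains one.
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) == 3 and parts[2]:
            counts[parts[2]] += 1
    return counts


sample = """\
19.03.2019 06:50:43 | LHC@home | No tasks sent
19.03.2019 06:50:43 | LHC@home | This computer has reached a limit on tasks in progress
19.03.2019 06:50:55 | LHC@home | No tasks sent
19.03.2019 06:50:55 | LHC@home | This computer has reached a limit on tasks in progress
"""
for message, n in count_messages(sample).most_common():
    print(n, message)
```

Run over the full log, this makes it obvious that every request ends with the same "limit on tasks in progress" reply, whichever subproject is asked for.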
Joined: 27 Sep 08 Posts: 801 Credit: 649,848,958 RAC: 240,644 |
This morning the PC is down to 4 tasks. I think that unlimited now = 1 job; this has been configured in the past, so I assume the introduction of native Theory has reset this config. Since I run ATLAS in a separate BOINC session, it seems that this project is not affected; it is still running 12 at once. Looking at Erich's results, it would appear the Job system is totally broke, as he has forced 6 jobs and only gets 1. Lawrence should take a look at the settings. |
Joined: 18 Dec 15 Posts: 1687 Credit: 102,945,988 RAC: 125,510 |
> it would appear the Job system is totally broke

this is most probably the case :-( |
Joined: 29 Aug 05 Posts: 1004 Credit: 6,268,761 RAC: 316 |
|
Joined: 27 Sep 08 Posts: 801 Credit: 649,848,958 RAC: 240,644 |
My PC finished the remaining WUs; now it has got 1 task for CMS and one for Theory. I imagine Lawrence set it to one for the new project so that, if there are problems, it doesn't cause too much damage, but it has also changed the CMS and regular Theory projects. |
Joined: 14 Jan 10 Posts: 1272 Credit: 8,479,164 RAC: 2,361 |
When I set Max # jobs: No limit and Max # CPUs: No limit, I get a max of 2 tasks / core (tested with Theory Native), but I think this is not wanted for the multi-core applications CMS and ATLAS. |
Joined: 18 Dec 15 Posts: 1687 Credit: 102,945,988 RAC: 125,510 |
I am curious when this problem will be repaired |
Joined: 27 Sep 08 Posts: 801 Credit: 649,848,958 RAC: 240,644 |
I'm not sure where the 50-job limit came from; I just worked that out from testing. I assume there aren't many people who run 50 tasks at once.

My settings are: Max # jobs: No limit, Max # CPUs: 1. I think one task/job per core is a fine limit. I changed my settings to match yours, and now I get more tasks/jobs.

The reason I use Max # CPUs: 1 is that the RAM calculation from BOINC is not correct when it is set to No limit. E.g. a No-limit Theory task takes 32 cores with a RAM usage of 3.06 GB, whereas a 1-core Theory task takes a working set of 0.74 GB. A 32-core WU is nonsense, I imagine?

Option #1: I can dial back the number of cores with app_config, but then the working set is wrong by 4x, so if I have 44 cores BOINC thinks I need 134 GB of RAM and will not run 44 tasks/jobs, whereas in reality I use 33 GB to run 44 cores.

Option #2: I have to use a lower Max # CPUs setting, which, as we know, at 1 limits the number of jobs to 1. I could set it to 8, which would give me maybe 16 WUs.

Since there is no multi-core CMS, it runs fine now with the No limit settings. |
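The RAM figures in this post can be reproduced with a few lines of Python. This is just the arithmetic from the post (44 tasks, 3.06 GB versus 0.74 GB per task); the function name is an illustrative choice.

```python
def estimated_ram_gb(tasks: int, working_set_gb: float) -> float:
    """Total RAM needed if every task uses the given working set."""
    return tasks * working_set_gb


cores = 44
boinc_estimate = estimated_ram_gb(cores, 3.06)  # working set of the "No limit" task
actual_usage = estimated_ram_gb(cores, 0.74)    # working set of a 1-core task
overestimate = 3.06 / 0.74

print(f"BOINC thinks: {boinc_estimate:.0f} GB")  # about 135 GB (134.64)
print(f"Really used:  {actual_usage:.0f} GB")    # about 33 GB (32.56)
print(f"Factor:       {overestimate:.1f}x")      # about 4.1x
```

The 134 GB and 33 GB quoted in the post are these two products, and the "wrong by 4x" is the ratio of the two working sets.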
Joined: 29 Aug 05 Posts: 1004 Credit: 6,268,761 RAC: 316 |
|
Joined: 29 Aug 05 Posts: 1004 Credit: 6,268,761 RAC: 316 |
|
Joined: 28 Sep 04 Posts: 675 Credit: 43,524,182 RAC: 15,592 |
The CMS jobs graphs are failing both here and at dev. |
Joined: 29 Aug 05 Posts: 1004 Credit: 6,268,761 RAC: 316 |
|
©2024 CERN