New CMS job graphs
Author | Message |
---|---|
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
I'm trying to work out how to let everyone see the new Dashboard (Grafana) job plots. If you don't have CERN credentials, it appears that you can obtain limited credentials if you are a member of a specified list of organisations or certain public services such as Facebook and Google. Try to access the plots via my test page: https://www.brunel.ac.uk/~eesridr/cms_job.php. If you get a Grafana log-in page, select the CERN SSO option (single sign-on) and see if you can create your own permissions. Let me know if it works. If it's successful, I'll pass it on to Laurence to put on our web-site. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
"Let me know if it works." Thanks. I used a Google account and it works as described. Among lots of other information, it shows how many jobs are completed or currently running. Does it also show how many are waiting in the queue ("pending" seems to have a different meaning)? |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
The given links to Grafana set a fixed timeframe, e.g. from 2019-07-31 14:55:16 to 2019-08-07 14:43:16: from=1564577716023&to=1565181796023. That part of the links could be changed to show (e.g.) the last week up to "now": from=now-7d&to=now |
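For anyone who wants to rewrite such links by script, here is a minimal Python sketch of the change suggested above. The dashboard URL in it is a made-up placeholder (not the real CMS dashboard address), and the function names are hypothetical; only the `from`/`to` values are taken from the post. The small helper also shows that the `from`/`to` numbers are Unix timestamps in milliseconds. It prints UTC, which is two hours behind the times quoted above, presumably because those are shown in local (CEST) time.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ms_to_utc(ms):
    """Convert a millisecond Unix timestamp (as used in from=/to=) to UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def use_relative_range(url, start="now-7d", end="now"):
    """Return url with its from/to query parameters replaced by relative values."""
    parts = urlsplit(url)
    # Keep every parameter except the fixed time range.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k not in ("from", "to")]
    params += [("from", start), ("to", end)]
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder URL: host and path are assumptions for illustration only.
example = ("https://grafana.example.org/d/abc/cms-jobs"
           "?orgId=1&from=1564577716023&to=1565181796023")

print(ms_to_utc(1564577716023))     # start of the fixed range, in UTC
print(use_relative_range(example))  # same link, now showing the last 7 days
```

Other relative ranges follow the same pattern, e.g. `use_relative_range(example, start="now-24h")`.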
Send message Joined: 24 Oct 04 Posts: 1176 Credit: 54,887,670 RAC: 5,761 |
( I think some seti critters have taken over my Hughes satellite so the speed is like a 1995 dialup right now) You would think that after almost 60 years we would have a satellite system that was a bit faster than this snail from space Hughes has as their top of the line Gen5 Well imagine that.... |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
"The given links to Grafana set a fixed timeframe, e.g. ..." Ah, thanks for catching that. I'll change it later; I have to go and arrange the payment of my next six months' rent just now... |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
"Among lots of other information it shows how many jobs are completed/currently running. ..." I'll have to dig around to see if that's available -- as you say, Grafana seems to have a different definition of "pending" to what I see in WMStats. |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
|
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
Looking at the different Grafana graphs here, I'm a bit confused. The number of completed jobs seems to be stable at a bit more than 50/h. The number of running jobs seems to be stable at around 150/h. The average failure rate is far less than 10%. What happened to the other roughly 100 jobs/h? |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
"Looking at the different Grafana graphs here, I'm a bit confused. ..." I changed the job duration a couple of weeks ago, from 40,000 to 100,000 events, as the average CPU time was under 1 hour (I couldn't see this easily until the new Dashboard was introduced). Hopefully this will improve efficiency -- more CPU for the same amount of "downtime" during jobs. You'll see the average CPU time is now pushing 1.5 hours (with lots of variation from hour to hour). I'm still looking for a "queued" or "pending" graph that matches what I see on WMStats. |
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
I agree: The "Average CPU time" graph shows values between 1 and 2 h. That's not what I meant. The "Running jobs" graph shows a (more or less) stable rate of 130-178 jobs/h between 2019-08-08 0:00 and 2019-08-10 12:00. If all of them needed 2 h to complete, I would expect the "Completed jobs" graph to ramp up until roughly the same rate of 130-178 jobs/h is reached between 2019-08-08 2:00 and 2019-08-10 14:00. Unfortunately, "Completed jobs" shows only 38-78 jobs/h. The timeframe of 2.5 d seems long enough to cover short-term delays. Unlike BOINC tasks, Condor jobs are usually not buffered at the clients. So, why is the "Completed jobs" rate far below the "Running jobs" rate? Do I misunderstand the definitions, e.g. "running" or "completed"? |
Send message Joined: 29 Aug 05 Posts: 1061 Credit: 7,737,455 RAC: 298 |
"I agree: The 'Average CPU time' graph shows values between 1 and 2 h. ..." I think you might, slightly. You should also be able to access the Grafana summary page, which gives definitions for the graphs (under the "i for information" icon at the top left). For running jobs it says, "Total number of running jobs in a given time bucket," while for completed jobs it's, "Number of jobs that reached completion in a given time bucket." So, if a job runs for two hours it will count as running in up to three [one-hour] time buckets, but it will only count as completed in one bucket. Basically, the number of completed jobs per bucket is roughly the number of running jobs per bucket divided by the average time per job measured in buckets. You can see this more starkly if you change to the 12-minute binning. Then the running jobs still stay around 150 (per time bin), but the number of jobs completed per time bin drops to about 12. Conversely, if you change to one-day binning the running jobs stay the same (less for incomplete days) but the completed jobs go up to nearly 1,500/day. |
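The counting rule described above can be tried out with a small Python sketch. It is only an illustration under assumptions: the half-open bucket boundaries and the function name `bucket_counts` are made up here, and this is not how Grafana or the CMS dashboard actually computes its plots. Jobs are given as (start, end) times in hours.

```python
def bucket_counts(jobs, width, n_buckets):
    """Count running and completed jobs per time bucket.

    jobs:  list of (start, end) times in hours
    width: bucket width in hours; bucket i covers [i*width, (i+1)*width)
    A job counts as running in every bucket it overlaps, but as
    completed only in the single bucket that contains its end time.
    """
    running = [0] * n_buckets
    completed = [0] * n_buckets
    for start, end in jobs:
        for i in range(n_buckets):
            lo, hi = i * width, (i + 1) * width
            if start < hi and end > lo:
                running[i] += 1
            if lo <= end < hi:
                completed[i] += 1
    return running, completed


# One two-hour job that starts half-way through the first one-hour bucket.
running, completed = bucket_counts([(0.5, 2.5)], width=1.0, n_buckets=4)
print(running)    # [1, 1, 1, 0]  -> counted as running in three buckets
print(completed)  # [0, 0, 1, 0]  -> counted as completed in only one
```

Summed over all buckets, the single job contributes three "running" counts but only one "completed" count, which is why the running curve sits well above the completed curve.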
Send message Joined: 15 Jun 08 Posts: 2541 Credit: 254,608,838 RAC: 34,609 |
"... if a job runs for two hours it will count as running in up to three [one-hour] time buckets." Yes. I think exactly this was my mistake. I simply ignored the fact that a single job counts multiple times. Thanks for explaining. |
Send message Joined: 28 Sep 04 Posts: 732 Credit: 49,367,266 RAC: 17,281 |
I think that the CMS job graphs should be made available to all, so you don't need to log in to Grafana (CERN). The SSO option for login does not work for Google or Windows Live; it just gives the error 'There was a problem accessing the site.', so I cannot see any of the graphs. |