Message boards : CMS Application : Grafana Errors

computezrmle
Volunteer moderator
Volunteer developer
Volunteer tester
Help desk expert
Joined: 15 Jun 08
Posts: 2685
Credit: 286,926,427
RAC: 57,109
Message 40077 - Posted: 4 Oct 2019, 5:48:35 UTC

This morning at 02:00 UTC, the CMS Grafana job monitoring dropped from 140 running tasks to 0.
I suspect Grafana isn't getting the correct numbers, since my computers are crunching as usual.
ivan
Volunteer moderator
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 29 Aug 05
Posts: 1110
Credit: 9,453,285
RAC: 9,028
Message 40078 - Posted: 4 Oct 2019, 8:41:44 UTC - in response to Message 40077.  

computezrmle wrote:
"This morning at 02:00 UTC, the CMS Grafana job monitoring dropped from 140 running tasks to 0.
I suspect Grafana isn't getting the correct numbers, since my computers are crunching as usual."

Yes, there appears to have been a glitch in the monitoring -- it affected all of CMS. If you open the "Site" filter and choose "All", you can see the monitor for all of CMS. Something like this...
computezrmle
Message 43230 - Posted: 18 Aug 2020, 8:37:26 UTC

Grafana has shown 0 running jobs since this morning.
Some backend services may need help.
ivan
Message 43240 - Posted: 20 Aug 2020, 18:05:31 UTC - in response to Message 43230.  

There was a site-wide monitoring problem on Tuesday that seemed to affect Grafana. Yesterday's problems were something different that seems to be fixed now.
ivan
Message 43294 - Posted: 28 Aug 2020, 11:05:36 UTC

We now have a new set of monit/grafana job graphs, because CMS has updated their monitoring. They still show the same things, but they are now available on this LHC@Home site as well as at LHC@Home-dev.
computezrmle
Message 43791 - Posted: 5 Dec 2020, 8:57:39 UTC

ivan
Message 43802 - Posted: 7 Dec 2020, 15:55:47 UTC - in response to Message 43791.  

It looks like that was temporary. :-( We're still trying to prevent our rogue volunteer from running bad jobs. Unfortunately my latest message from Laurence got caught up in our spam trap and delayed by a week.
