[ClusterLabs] Grafana with ClusterLabs HA dashboard
Pinkesh Valdria
pinkesh.valdria at oracle.com
Sun Feb 7 09:34:21 EST 2021
This is my first attempt to use Grafana with the ClusterLabs HA dashboard. I got Grafana, Prometheus, and the Prometheus node_exporter to work, and I am able to see those metrics. The next step was to get the “ClusterLabs HA Cluster details” Grafana dashboard working, but I have been unable to make it work. I would appreciate it if you could point me in the right direction.
https://grafana.com/grafana/dashboards/12229
I am running Grafana on the default port 3000, and likewise using the defaults for Prometheus. Prometheus node_exporter is using port 9100.
I installed “ha_cluster_exporter” on all NFS-HA nodes (node1, node2, and the corosync quorum (qdevice) node). I see it uses port 9664.
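A quick way to confirm an exporter is reachable from the monitoring node (a minimal check, assuming the default port 9664 and that curl is available) is to fetch its metrics endpoint and look for the ha_cluster_ prefix:

# repeat for nfs-server-2 and the qdevice node
curl -s http://nfs-server-1.storage.nfs.oraclevcn.com:9664/metrics | grep '^ha_cluster_' | head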
nfs-server-1
[root@nfs-server-1 ha_cluster_exporter]# systemctl status ha_cluster_exporter
● ha_cluster_exporter.service - Prometheus exporter for Pacemaker HA clusters metrics
Loaded: loaded (/usr/lib/systemd/system/ha_cluster_exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2021-02-07 11:39:07 GMT; 1h 3min ago
Main PID: 18547 (ha_cluster_expo)
Memory: 6.7M
CGroup: /system.slice/ha_cluster_exporter.service
└─18547 /root/go/bin/ha_cluster_exporter
Feb 07 11:39:07 nfs-server-1 systemd[1]: Started Prometheus exporter for Pacemaker HA clusters metrics.
Feb 07 11:39:07 nfs-server-1 ha_cluster_exporter[18547]: time="2021-02-07T11:39:07Z" level=warning msg="Config File \"ha_cluster_exporter\" Not Found in \"[/ /.config /etc /usr/etc]\""
Feb 07 11:39:07 nfs-server-1 ha_cluster_exporter[18547]: time="2021-02-07T11:39:07Z" level=info msg="Default config values will be used"
Feb 07 11:39:07 nfs-server-1 ha_cluster_exporter[18547]: time="2021-02-07T11:39:07Z" level=warning msg="Registration failure: could not initialize 'drbd' collector: '/sbin/drbdsetup' does not exist"
Feb 07 11:39:07 nfs-server-1 ha_cluster_exporter[18547]: time="2021-02-07T11:39:07Z" level=info msg="'pacemaker' collector registered."
Feb 07 11:39:07 nfs-server-1 ha_cluster_exporter[18547]: time="2021-02-07T11:39:07Z" level=info msg="'corosync' collector registered."
Feb 07 11:39:07 nfs-server-1 ha_cluster_exporter[18547]: time="2021-02-07T11:39:07Z" level=info msg="'sbd' collector registered."
Feb 07 11:39:07 nfs-server-1 ha_cluster_exporter[18547]: time="2021-02-07T11:39:07Z" level=info msg="Serving metrics on 0.0.0.0:9664"
[root@nfs-server-1 ha_cluster_exporter]#
Similarly on nfs-server-2 and the qdevice node.
[root@nfs-server-2 ha_cluster_exporter]# systemctl status ha_cluster_exporter
…….
…..
Feb 07 12:15:10 nfs-server-2 ha_cluster_exporter[11895]: time="2021-02-07T12:15:10Z" level=warning msg="Registration failure: could not initialize 'drbd' collector: '/sbin/drbdsetup' does not exist"
Feb 07 12:15:10 nfs-server-2 ha_cluster_exporter[11895]: time="2021-02-07T12:15:10Z" level=info msg="'pacemaker' collector registered."
Feb 07 12:15:10 nfs-server-2 ha_cluster_exporter[11895]: time="2021-02-07T12:15:10Z" level=info msg="'corosync' collector registered."
Feb 07 12:15:10 nfs-server-2 ha_cluster_exporter[11895]: time="2021-02-07T12:15:10Z" level=info msg="'sbd' collector registered."
Feb 07 12:15:10 nfs-server-2 ha_cluster_exporter[11895]: time="2021-02-07T12:15:10Z" level=info msg="Serving metrics on 0.0.0.0:9664"
I copied https://github.com/ClusterLabs/ha_cluster_exporter/blob/master/dashboards/provider-sleha.yaml to /etc/grafana/provisioning/dashboards/ and copied the ha-cluster-details_rev2.json file to /etc/grafana/dashboards/sleha, as described in the manual steps section of this page: https://github.com/ClusterLabs/ha_cluster_exporter/tree/master/dashboards
I see these errors in the Grafana log, and the Grafana UI says “No data”.
t=2021-02-07T12:51:18+0000 lvl=eror msg="Data proxy error" logger=data-proxy-log userId=1 orgId=1 uname=admin path=/api/datasources/proxy/1/api/v1/query_range remote_addr=10.0.0.2 referer="http://localhost:3000/d/Q5YJpwtZk1/clusterlabs-ha-cluster-details?orgId=1" error="http: proxy error: context canceled"
t=2021-02-07T12:51:18+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/query_range status=502 remote_addr=10.0.0.2 time_ms=13 size=0 referer="http://localhost:3000/d/Q5YJpwtZk1/clusterlabs-ha-cluster-details?orgId=1"
t=2021-02-07T12:51:48+0000 lvl=eror msg="Data proxy error" logger=data-proxy-log userId=1 orgId=1 uname=admin path=/api/datasources/proxy/1/api/v1/query remote_addr=10.0.0.2 referer="http://localhost:3000/d/Q5YJpwtZk1/clusterlabs-ha-cluster-details?orgId=1&var-DS_PROMETHEUS=Prometheus&var-cluster=nfs-ha&var-dc_instance=" error="http: proxy error: context canceled"
t=2021-02-07T12:51:48+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/datasources/proxy/1/api/v1/query status=502 remote_addr=10.0.0.2 time_ms=4 size=0 referer=http://localhost:3000/d/Q5YJpwtZk1/clusterlabs-ha-cluster-details?orgId=1&var-DS_PROMETHEUS=Prometheus&var-cluster=nfs-ha&var-dc_instance=
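To see whether Prometheus has any data for the dashboard's queries at all, it can also be queried directly, bypassing Grafana (a minimal check, assuming the default Prometheus port 9090; ha_cluster_pacemaker_nodes is assumed to be one of the exporter's metric names):

# an empty "result": [] would mean Prometheus is not scraping any ha_cluster_exporter target
curl -s 'http://localhost:9090/api/v1/query?query=ha_cluster_pacemaker_nodes'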
[Screenshot of the Grafana dashboard omitted: image001.png]
This is my monitoring node running Grafana:
[root@client-1 log]# ls -l /etc/grafana/dashboards
total 92
-rw-r--r--. 1 root root 94021 Feb 7 10:15 node-exporter.json
drwxr-xr-x. 2 root root 42 Feb 7 12:58 sleha
[root@client-1 log]# ls -l /etc/grafana/dashboards/sleha/
total 44
-rw-r--r--. 1 root root 41665 Feb 7 12:27 ha-cluster-details_rev2.json
[root@client-1 log]#
cat /etc/grafana/provisioning/dashboards/provider-sleha.yaml
apiVersion: 1
providers:
  - name: SUSE Linux Enterprise High Availability Extension
    folder: SUSE Linux Enterprise
    folderUid: 3b1e0b26-fc28-4254-88a1-2d3516b5e404
    type: file
    allowUiUpdates: true
    editable: true
    options:
      path: /etc/grafana/dashboards/sleha
I copied the ha-cluster-details_rev2.json file from https://grafana.com/grafana/dashboards/12229/revisions to /etc/grafana/dashboards/sleha.
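After changing the provisioning files, Grafana has to be restarted (or the provisioning reloaded) before it picks them up. Roughly, assuming a systemd-managed grafana-server and the default log location:

systemctl restart grafana-server
grep -i provision /var/log/grafana/grafana.log | tail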
cat /etc/grafana/provisioning/dashboards/node_exporter.yaml
apiVersion: 1
providers:
  - name: 'NFS HA Dashboard'
    type: file
    updateIntervalSeconds: 10
    options:
      path: /etc/grafana/dashboards/node-exporter.json
[root@client-1 log]#
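To double-check that the provisioned dashboard was actually loaded, the Grafana search API can be queried (assuming default admin credentials):

curl -s -u admin:admin 'http://localhost:3000/api/search?query=ClusterLabs'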
I added the job_name: 'nfs-ha' section to /etc/prometheus/prometheus.yml, since the ClusterLabs HA Grafana dashboard page says: “It is built on top of ha_cluster_exporter, but it also requires Prometheus node_exporter to be configured on the target nodes, and it also assumes that the target nodes in each cluster are grouped via the job label.”
https://grafana.com/grafana/dashboards/12229
- job_name: 'quorum'
  # Override the global default and scrape targets from this job every 5 seconds.
  scrape_interval: 5s
  static_configs:
    - targets: ['qdevice.storage.nfs.oraclevcn.com:9100']
      labels:
        group: 'quorum'

- job_name: 'nfs_server'
  # Override the global default and scrape targets from this job every 5 seconds.
  scrape_interval: 5s
  static_configs:
    - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9100', 'nfs-server-2.storage.nfs.oraclevcn.com:9100']
      labels:
        group: 'nfs_server'

- job_name: 'nfs-ha'
  scrape_interval: 5s
  static_configs:
    - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9100', 'nfs-server-2.storage.nfs.oraclevcn.com:9100', 'qdevice.storage.nfs.oraclevcn.com:9100']
      labels:
        group: 'nfs-ha'
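For completeness: since the dashboard is built on top of ha_cluster_exporter, I assume the cluster nodes may also need to be scraped on the exporter's port 9664, not only on node_exporter's 9100. A sketch of such a job (the job name is my own; the hostnames and port are the ones mentioned above):

- job_name: 'ha_cluster_exporter'
  scrape_interval: 5s
  static_configs:
    # assumption: 9664 is the ha_cluster_exporter port noted earlier
    - targets: ['nfs-server-1.storage.nfs.oraclevcn.com:9664', 'nfs-server-2.storage.nfs.oraclevcn.com:9664', 'qdevice.storage.nfs.oraclevcn.com:9664']
      labels:
        group: 'nfs-ha'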
Thanks,
Pinkesh Valdria
Principal Solutions Architect – HPC