[ClusterLabs] cron-suitable cluster status check

Sat Feb 27 16:56:51 EST 2016

Right now in a test cluster on CentOS 7 I'm occasionally seeing
resource monitoring failures and, just today, a failure to start
a fencing agent.  While I need to track down those problems, the
issue I want to discuss here is how to be notified when there is a
problem with the cluster where no Nagios-type monitoring system
is in place.

On an older CentOS 5 cluster I have a cron job that periodically runs
'crm_verify -LV'.  If the return code is non-zero, the output of
that command (and some other info) is mailed to the operator.  That
mechanism has been working well for years.
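
For reference, the check is roughly along these lines.  This is a
simplified sketch rather than the exact script: the function name
'check_cluster', the 'OPERATOR' address, and the mail subject are
placeholders, and the "other info" is reduced to the command line
and its exit code.  It assumes a mailx-style 'mail' command.

```shell
#!/bin/sh
# Sketch only: run a check command and mail its output to the
# operator when the exit code is non-zero.
# OPERATOR is a placeholder address, not the real recipient.
OPERATOR="${OPERATOR:-root}"

check_cluster() {
    # Capture stdout and stderr of the command given as arguments.
    output=$("$@" 2>&1)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        # Mail the command, its exit code, and its output.
        printf '%s\n' "Command: $*" "Exit code: $rc" "" "$output" |
            mail -s "Cluster check failed on $(uname -n)" "$OPERATOR"
    fi
    return "$rc"
}

# When invoked with arguments, treat them as the command to check.
if [ "$#" -gt 0 ]; then
    check_cluster "$@"
fi
```

Saved under a hypothetical name such as 'cluster-check', the cron
entry would run it as 'cluster-check crm_verify -LV'.  Because it
only looks at the exit code, it is only as good as the return code
of whatever command it is given.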

On CentOS 7, however, when the cluster gets into this state
'crm_verify -LV' returns zero, and its output claims there is no
problem, even though 'crm_mon -f' shows resource failures and
nonzero failcounts.

I tried 'pcs cluster status', but even when the cluster is working
properly (no failures) that command has a return code of '1',
probably because I get the 'Error: no nodes found in corosync.conf'
message, which is an ignorable condition per
<https://access.redhat.com/solutions/663283>.

Is there a command in the current cluster tools that I can run from
cron to give me a simple answer as to whether *anything* has failed
in the cluster, preferably based on its return code?

The CentOS 7 cluster is running:
    corosync 2.3.4
    pacemaker 1.1.13

The CentOS 5 cluster is running:
    corosync 1.2.7
    pacemaker 1.0.12

The corosync.conf is included below:

--------- cut here and be careful of pointy scissors ---------
totem {
	version: 2
	#secauth: off
	cluster_name: somecluster
	#transport: udpu
	rrp_mode: passive
	crypto_hash: sha256
	clear_node_high_bit: yes

	interface {
		ringnumber: 0
		bindnetaddr: 192.168.1.0
		mcastaddr: 239.192.0.5
		mcastport: 5406
	}
	interface {
		ringnumber: 1
		bindnetaddr: 192.168.2.0
		mcastaddr: 239.192.0.6
		mcastport: 5408
	}
}

quorum {
	provider: corosync_votequorum
	two_node: 1
	expected_votes: 2
}

logging {
	to_syslog: yes
}

--------- cut here and be careful of pointy scissors ---------

Devin