[ClusterLabs] pcsd communications redundancy

Tue Nov 24 18:14:05 EST 2015

Explanation first; config details follow.

I've used pacemaker+(corosync or openais) for years, but I'm just
deploying pacemaker with corosync2+pcs for the first time.  I'm starting
with a two-node cluster.  Each node has one public network interface
(10.0.1.0/24) and one heartbeat network interface (192.168.1.0/24),
the latter being a crossover cable between nodes, with redundant
rings configured.

I've got things up and running to the point where (other than having
no stonith configured yet) 'pcs status' is happy: both nodes online,
quorum achieved, pcsd online.  I've verified that the cluster
remains up when either network connection is disconnected, and I can
see the totem packets on both interfaces.

It appears, though, that the 'PCSD Status' is only online when the
ring0 NIC is active.  That is, if I disconnect the crossover
cable, the 'PCSD Status' block shows the local node as online
and the remote node as offline.

Is this normal? Or should pcsd communication have a way to fall back
to the ring1 NIC?

Because I didn't use 'nodelist' in corosync.conf (see below), pcsd
seems to be using an IP only, which would explain why communications
are bound to that NIC.  However, is there a way to get it to use both
interfaces?  The IPs (192.168.1.0/24) used by the crossover NICs
aren't in the DNS, in either the forward or reverse zones, and I'd
prefer to keep it that way.  So I'm assuming that if nodelist were
used, pcsd would communicate only over the 10.0.1.0/24 network,
which is no better than what I have now.
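
For concreteness, the kind of nodelist I have in mind would look
roughly like the sketch below.  This is not part of my current
configuration and I have not tested it: the addresses are taken from
the /etc/hosts entries shown further down, matched to the ring0 and
ring1 networks, and the nodeid values are arbitrary.  Whether pcsd
would then use one or both addresses per node is exactly the part
I'm unsure of.

```
# Hypothetical nodelist section for corosync.conf (untested sketch).
# ring0_addr/ring1_addr follow the bindnetaddr networks of ring 0 and ring 1.
nodelist {
	node {
		ring0_addr: 192.168.1.68
		ring1_addr: 10.0.1.68
		nodeid: 1	# assumed value
	}
	node {
		ring0_addr: 192.168.1.69
		ring1_addr: 10.0.1.69
		nodeid: 2	# assumed value
	}
}
```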

Setup and configuration information follows.

===========================================

'pcs cluster setup' was originally invoked with node names (not FQDNs).
The corosync.conf file was later modified to use multicast and
redundant rings.  'pcs cluster auth' was initially invoked with
node names, but later without arguments, thus authorizing by IP.

# cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

# pcs status
Cluster name:
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Tue Nov 24 15:56:21 2015		Last change: Tue Nov 24 15:36:20 2015 by hacluster via crmd on node2
Stack: corosync
Current DC: node1 (version 1.1.13-a14efad) - partition with quorum
2 nodes and 0 resources configured

Online: [ node1 node2 ]

Full list of resources:

PCSD Status:
  192.168.1.68: Online
  192.168.1.69: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

# grep node /etc/hosts
10.0.1.68 node1.example.tld node1
10.0.1.69 node2.example.tld node2
192.168.1.68 node1hb.example.tld node1hb
192.168.1.69 node2hb.example.tld node2hb

# cat /etc/corosync/corosync.conf
totem {
	version: 2
	cluster_name: testcluster
	rrp_mode: passive
	crypto_hash: sha256
	interface {
		ringnumber: 0
		bindnetaddr: 192.168.1.0
		mcastaddr: 239.192.0.5
		mcastport: 5406
	}
	interface {
		ringnumber: 1
		bindnetaddr: 10.0.1.0
		mcastaddr: 239.192.0.6
		mcastport: 5408
	}
}
quorum {
	provider: corosync_votequorum
	two_node: 1
	expected_votes: 2
}
logging {
	to_syslog: yes
}