[ClusterLabs] Auto-failback prevention
Jaap Winius
jwinius at umrk.nl
Thu Feb 20 11:15:38 EST 2020
Hi folks,
My test system consists of two nodes running CentOS 7 and DRBD. The
Pacemaker configuration below results in a system with a preferred
master, bd3c7, and a slave, bd4c7, between which a set of resources
will fail over and later back, within about two seconds either way,
depending on the availability of the master. So far so good.
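For illustration, one common way to exercise such a failover and failback without powering a node off is to put the preferred node into standby and then bring it back. This is only a sketch, assuming the pcs 0.9.x syntax shipped with CentOS 7; it is not necessarily how the behaviour above was tested. The RUN dry-run wrapper and the two function names are hypothetical helpers, not part of pcs.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Run with RUN="" on a cluster node to execute them for real.
RUN="${RUN-echo}"

# Put a node into standby so its resources must move elsewhere (failover).
standby()   { $RUN pcs cluster standby "$1"; }

# Bring the node back; with a preferred master this triggers the failback.
unstandby() { $RUN pcs cluster unstandby "$1"; }

standby bd3c7
unstandby bd3c7
```

Watching "pcs status" on the other node between the two steps shows where the resources are running at each point.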
However, now I'd like to prevent those automatic failbacks from
happening. The two nodes are completely equal anyway and failbacks
just double the time that the service is unavailable. Besides, if
bd3c7 starts suffering from intermittent connectivity problems, any
failbacks could just make things worse.
I have tried setting the meta option "resource-stickiness" to 100, but
this seems to have no effect. I suspect it is due to the presence of
"cli-prefer-drbd-master", but without this location constraint (which
is created by the "pcs resource move" command) the resources just stop
and never fail over when the node goes down.
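As an illustration of the settings mentioned above, resource-stickiness can be applied either as a cluster-wide default or as a meta attribute on a single resource, and the constraint left behind by "pcs resource move" can be listed together with its ID. This is only a sketch, assuming the pcs 0.9.x syntax shipped with CentOS 7; the RUN dry-run wrapper and the function names are hypothetical helpers, not part of pcs. Note that a location score of INFINITY outweighs any finite stickiness value, which is consistent with the suspicion about "cli-prefer-drbd-master".

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Run with RUN="" on a cluster node to execute them for real.
RUN="${RUN-echo}"

# Cluster-wide default: every resource prefers to stay where it is running.
set_default_stickiness() { $RUN pcs resource defaults resource-stickiness="$1"; }

# Per-resource meta attribute, e.g. on the master/slave set only.
set_resource_stickiness() { $RUN pcs resource meta "$1" resource-stickiness="$2"; }

# List all constraints with their IDs (cli-prefer-* entries come from "move").
show_constraints() { $RUN pcs constraint show --full; }

set_default_stickiness 100
set_resource_stickiness drbd-master 100
show_constraints
```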
So, how should the configuration below be altered to result only in
failovers, while preventing failbacks from ever happening?
Thanks,
Jaap Winius
############# config ###################
[root at bd4c7 ~]# pcs resource create drbd ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s ; \
pcs resource master drbd master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true ; \
pcs resource create mount Filesystem device="/dev/drbd0" directory="/data" fstype="ext4" ; \
pcs constraint colocation add mount with drbd-master INFINITY with-rsc-role=Master ; \
pcs constraint order promote drbd-master then mount ; \
pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.2.73 cidr_netmask=24 op monitor interval=30s ; \
pcs constraint colocation add vip with drbd-master INFINITY with-rsc-role=Master ; \
pcs constraint order mount then vip ; \
pcs resource create nfsd nfsserver nfs_shared_infodir=/data ; \
pcs resource create nfscfg exportfs clientspec="192.168.2.55" options=rw,no_subtree_check,no_root_squash directory=/data fsid=0 ; \
pcs constraint colocation add nfsd with vip ; \
pcs constraint colocation add nfscfg with nfsd ; \
pcs constraint order vip then nfsd ; \
pcs constraint order nfsd then nfscfg ; \
pcs resource move drbd-master --master bd3c7
[root at bd4c7 ~]# pcs config
Cluster Name: fscluster
Corosync Nodes:
bd3c7 bd4c7
Pacemaker Nodes:
bd3c7 bd4c7
Resources:
Master: drbd-master
Meta Attrs: clone-max=2 clone-node-max=1 master-max=1 master-node-max=1 notify=true
Resource: drbd (class=ocf provider=linbit type=drbd)
Attributes: drbd_resource=r0
Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
monitor interval=60s (drbd-monitor-interval-60s)
notify interval=0s timeout=90 (drbd-notify-interval-0s)
promote interval=0s timeout=90 (drbd-promote-interval-0s)
reload interval=0s timeout=30 (drbd-reload-interval-0s)
start interval=0s timeout=240 (drbd-start-interval-0s)
stop interval=0s timeout=100 (drbd-stop-interval-0s)
Resource: mount (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd0 directory=/data fstype=ext4
Operations: monitor interval=20s timeout=40s (mount-monitor-interval-20s)
notify interval=0s timeout=60s (mount-notify-interval-0s)
start interval=0s timeout=60s (mount-start-interval-0s)
stop interval=0s timeout=60s (mount-stop-interval-0s)
Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: cidr_netmask=24 ip=192.168.2.73
Operations: monitor interval=30s (vip-monitor-interval-30s)
start interval=0s timeout=20s (vip-start-interval-0s)
stop interval=0s timeout=20s (vip-stop-interval-0s)
Resource: nfsd (class=ocf provider=heartbeat type=nfsserver)
Attributes: nfs_shared_infodir=/data
Operations: monitor interval=10s timeout=20s (nfsd-monitor-interval-10s)
start interval=0s timeout=40s (nfsd-start-interval-0s)
stop interval=0s timeout=20s (nfsd-stop-interval-0s)
Resource: nfscfg (class=ocf provider=heartbeat type=exportfs)
Attributes: clientspec=192.168.2.55 directory=/data fsid=0 options=rw,no_subtree_check,no_root_squash
Operations: monitor interval=10s timeout=20s (nfscfg-monitor-interval-10s)
start interval=0s timeout=40s (nfscfg-start-interval-0s)
stop interval=0s timeout=120s (nfscfg-stop-interval-0s)
Stonith Devices:
Fencing Levels:
Location Constraints:
Resource: drbd-master
Enabled on: bd3c7 (score:INFINITY) (role: Master) (id:cli-prefer-drbd-master)
Ordering Constraints:
promote drbd-master then start mount (kind:Mandatory) (id:order-drbd-master-mount-mandatory)
start mount then start vip (kind:Mandatory) (id:order-mount-vip-mandatory)
start vip then start nfsd (kind:Mandatory) (id:order-vip-nfsd-mandatory)
start nfsd then start nfscfg (kind:Mandatory) (id:order-nfsd-nfscfg-mandatory)
Colocation Constraints:
mount with drbd-master (score:INFINITY) (with-rsc-role:Master) (id:colocation-mount-drbd-master-INFINITY)
vip with drbd-master (score:INFINITY) (with-rsc-role:Master) (id:colocation-vip-drbd-master-INFINITY)
nfsd with vip (score:INFINITY) (id:colocation-nfsd-vip-INFINITY)
nfscfg with nfsd (score:INFINITY) (id:colocation-nfscfg-nfsd-INFINITY)
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: fscluster
dc-version: 1.1.20-5.el7_7.2-3c4c782f70
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
Quorum:
Options:
[root at bd4c7 ~]# pcs status
Cluster name: fscluster
Stack: corosync
Current DC: bd3c7 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Thu Feb 20 15:22:46 2020
Last change: Thu Feb 20 15:14:33 2020 by root via crm_resource on bd4c7
2 nodes configured
6 resources configured
Online: [ bd3c7 bd4c7 ]
Full list of resources:
Master/Slave Set: drbd-master [drbd]
Masters: [ bd3c7 ]
Slaves: [ bd4c7 ]
mount (ocf::heartbeat:Filesystem): Started bd3c7
vip (ocf::heartbeat:IPaddr2): Started bd3c7
nfsd (ocf::heartbeat:nfsserver): Started bd3c7
nfscfg (ocf::heartbeat:exportfs): Started bd3c7
Failed Resource Actions:
* mount_monitor_20000 on bd4c7 'unknown error' (1): call=27, status=Error, exitreason='', last-rc-change='Thu Feb 20 15:14:25 2020', queued=0ms, exec=0ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
[root at bd4c7 ~]#
########################################