[ClusterLabs] service flap as nodes join and leave
Christopher Harvey
cwh at eml.cc
Wed Apr 13 16:23:44 UTC 2016
I have a 3-node cluster (see the bottom of this email for the 'pcs config'
output). The MsgBB-Active and AD-Active services both flap
whenever a node joins or leaves the cluster. I trigger the leave and
join by stopping and starting the pacemaker service on any node.
Here is the happy steady state setup:
3 nodes and 4 resources configured
Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
Clone Set: Router-clone [Router]
Started: [ vmr-132-3 vmr-132-4 ]
MsgBB-Active (ocf::solace:MsgBB-Active): Started vmr-132-3
AD-Active (ocf::solace:AD-Active): Started vmr-132-3
[root@vmr-132-4 ~]# supervisorctl stop pacemaker
no change, except vmr-132-4 goes offline
[root@vmr-132-4 ~]# supervisorctl start pacemaker
vmr-132-4 comes back online
MsgBB-Active and AD-Active flap very quickly (<1s)
Steady state is resumed.
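A flap this short is easy to miss by eye. As a minimal sketch (not part of the original setup), the Python below compares successive status snapshots and reports every resource whose state or node changed. It assumes the snapshots are captured separately as text and that single-resource lines look like the status output shown above; `parse_status` and `find_changes` are hypothetical helper names. A poller slower than the flap itself can still miss it, so the cluster logs remain the more reliable record.

```python
import re

# Matches single-resource status lines in the format shown in this post, e.g.
#   MsgBB-Active (ocf::solace:MsgBB-Active): Started vmr-132-3
# The node name is optional because a stopped resource has none.
RESOURCE_LINE = re.compile(
    r"^\s*(\S+)\s+\(ocf::[^)]+\):\s+(\S+)(?:\s+(\S+))?\s*$"
)


def parse_status(text):
    """Return {resource: (state, node)} for each single-resource line."""
    result = {}
    for line in text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match:
            name, state, node = match.groups()
            result[name] = (state, node)
    return result


def find_changes(snapshots):
    """Compare consecutive snapshots.

    Returns a list of (snapshot_index, resource, before, after) tuples,
    one for every resource whose (state, node) pair changed.
    """
    changes = []
    previous = None
    for index, text in enumerate(snapshots):
        current = parse_status(text)
        if previous is not None:
            for name in sorted(set(previous) | set(current)):
                if previous.get(name) != current.get(name):
                    changes.append(
                        (index, name, previous.get(name), current.get(name))
                    )
        previous = current
    return changes
```

Feeding it a steady snapshot, a snapshot taken mid-flap, and another steady snapshot would report each resource twice: once going down and once coming back.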
Why should vmr-132-4 coming and going affect the services
on any other node?
Thanks,
Chris
Cluster Name:
Corosync Nodes:
 192.168.132.5 192.168.132.4 192.168.132.3
Pacemaker Nodes:
 vmr-132-3 vmr-132-4 vmr-132-5
Resources:
 Clone: Router-clone
  Meta Attrs: clone-max=2 clone-node-max=1
  Resource: Router (class=ocf provider=solace type=Router)
   Meta Attrs: migration-threshold=1 failure-timeout=1s
   Operations: start interval=0s timeout=2 (Router-start-timeout-2)
               stop interval=0s timeout=2 (Router-stop-timeout-2)
               monitor interval=1s (Router-monitor-interval-1s)
 Resource: MsgBB-Active (class=ocf provider=solace type=MsgBB-Active)
  Meta Attrs: migration-threshold=2 failure-timeout=1s
  Operations: start interval=0s timeout=2 (MsgBB-Active-start-timeout-2)
              stop interval=0s timeout=2 (MsgBB-Active-stop-timeout-2)
              monitor interval=1s (MsgBB-Active-monitor-interval-1s)
 Resource: AD-Active (class=ocf provider=solace type=AD-Active)
  Meta Attrs: migration-threshold=2 failure-timeout=1s
  Operations: start interval=0s timeout=2 (AD-Active-start-timeout-2)
              stop interval=0s timeout=2 (AD-Active-stop-timeout-2)
              monitor interval=1s (AD-Active-monitor-interval-1s)
Stonith Devices:
Fencing Levels:
Location Constraints:
  Resource: AD-Active
    Disabled on: vmr-132-5 (score:-INFINITY) (id:ADNotOnMonitor)
  Resource: MsgBB-Active
    Enabled on: vmr-132-4 (score:100) (id:vmr-132-4Priority)
    Enabled on: vmr-132-3 (score:250) (id:vmr-132-3Priority)
    Disabled on: vmr-132-5 (score:-INFINITY) (id:MsgBBNotOnMonitor)
  Resource: Router-clone
    Disabled on: vmr-132-5 (score:-INFINITY) (id:RouterNotOnMonitor)
Ordering Constraints:
  Resource Sets:
    set Router-clone MsgBB-Active sequential=true
      (id:pcs_rsc_set_Router-clone_MsgBB-Active) setoptions kind=Mandatory
      (id:pcs_rsc_order_Router-clone_MsgBB-Active)
    set MsgBB-Active AD-Active sequential=true
      (id:pcs_rsc_set_MsgBB-Active_AD-Active) setoptions kind=Mandatory
      (id:pcs_rsc_order_MsgBB-Active_AD-Active)
Colocation Constraints:
  MsgBB-Active with Router-clone (score:INFINITY)
    (id:colocation-MsgBB-Active-Router-clone-INFINITY)
  AD-Active with MsgBB-Active (score:1000)
    (id:colocation-AD-Active-MsgBB-Active-1000)
Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-recheck-interval: 1s
 dc-version: 1.1.13-10.el7_2.2-44eb2dd
 have-watchdog: false
 maintenance-mode: false
 start-failure-is-fatal: false
 stonith-enabled: false
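To make the intent of the MsgBB-Active constraints above easier to read, here is a deliberately simplified toy model in Python. It is not Pacemaker's actual allocation algorithm: it ignores stickiness, ordering, failure counts, and how the clone instances themselves are placed. The scores are copied from the location constraints, and `preferred_node` is a hypothetical helper that picks the online node with the highest score among nodes that are not banned and already run a Router instance (the INFINITY colocation).

```python
NEG_INF = float("-inf")

# Location scores for MsgBB-Active, copied from the 'pcs config' output.
location_scores = {
    "vmr-132-3": 250,
    "vmr-132-4": 100,
    "vmr-132-5": NEG_INF,  # score:-INFINITY bans the resource from this node
}


def preferred_node(scores, online, router_nodes):
    """Toy model: the highest-scoring online node that also runs Router.

    Returns None if no node is eligible. This only illustrates the
    constraints; it does not reproduce Pacemaker's real scoring.
    """
    candidates = [
        node
        for node in online
        if node in router_nodes and scores.get(node, 0) > NEG_INF
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda node: scores.get(node, 0))


# All three nodes online, Router running on vmr-132-3 and vmr-132-4.
with_4 = preferred_node(
    location_scores,
    ["vmr-132-3", "vmr-132-4", "vmr-132-5"],
    {"vmr-132-3", "vmr-132-4"},
)

# vmr-132-4 offline, Router running on vmr-132-3 only.
without_4 = preferred_node(
    location_scores,
    ["vmr-132-3", "vmr-132-5"],
    {"vmr-132-3"},
)

print(with_4, without_4)
```

Under this simplified reading, the preferred node for MsgBB-Active is the same whether or not vmr-132-4 is online, which is why the flap looks unexpected from the constraints alone.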