[ClusterLabs] service flap as nodes join and leave
Ken Gaillot
kgaillot at redhat.com
Wed Apr 13 16:36:14 UTC 2016
On 04/13/2016 11:23 AM, Christopher Harvey wrote:
> I have a 3-node cluster (see the bottom of this email for 'pcs config'
> output). The MsgBB-Active and AD-Active services both flap
> whenever a node joins or leaves the cluster. I trigger the leave and
> join with a pacemaker service stop and start on any node.
That's the default behavior of clones used in ordering constraints. If
you set interleave=true on your clones, each dependent clone instance
will only care about the depended-on instances on its own node, rather
than all nodes.
See
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_options
While the interleave=true behavior is much more commonly used,
interleave=false is the default because it's safer -- the cluster
doesn't know anything about the cloned service, so it can't assume the
service can tolerate interleaving. Since you know what your service
does, you can set interleave=true for services that can handle it.
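For example, something along these lines should set it with pcs (a rough
sketch, not tested against your configuration -- adjust resource names
and check your pcs version's syntax):

  # add interleave=true to the clone's meta attributes
  pcs resource meta Router-clone interleave=true

  # verify the clone now carries the attribute
  pcs resource show Router-clone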
> Here is the happy steady state setup:
>
> 3 nodes and 4 resources configured
>
> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>
> Clone Set: Router-clone [Router]
>     Started: [ vmr-132-3 vmr-132-4 ]
> MsgBB-Active (ocf::solace:MsgBB-Active): Started vmr-132-3
> AD-Active (ocf::solace:AD-Active): Started vmr-132-3
>
> [root at vmr-132-4 ~]# supervisorctl stop pacemaker
> no change, except vmr-132-4 goes offline
> [root at vmr-132-4 ~]# supervisorctl start pacemaker
> vmr-132-4 comes back online
> MsgBB-Active and AD-Active flap very quickly (<1s)
> Steady state is resumed.
>
> Why should the fact that vmr-132-4 coming and going affect the service
> on any other node?
>
> Thanks,
> Chris
>
> Cluster Name:
> Corosync Nodes:
> 192.168.132.5 192.168.132.4 192.168.132.3
> Pacemaker Nodes:
> vmr-132-3 vmr-132-4 vmr-132-5
>
> Resources:
> Clone: Router-clone
> Meta Attrs: clone-max=2 clone-node-max=1
> Resource: Router (class=ocf provider=solace type=Router)
> Meta Attrs: migration-threshold=1 failure-timeout=1s
> Operations: start interval=0s timeout=2 (Router-start-timeout-2)
> stop interval=0s timeout=2 (Router-stop-timeout-2)
> monitor interval=1s (Router-monitor-interval-1s)
> Resource: MsgBB-Active (class=ocf provider=solace type=MsgBB-Active)
> Meta Attrs: migration-threshold=2 failure-timeout=1s
> Operations: start interval=0s timeout=2 (MsgBB-Active-start-timeout-2)
> stop interval=0s timeout=2 (MsgBB-Active-stop-timeout-2)
> monitor interval=1s (MsgBB-Active-monitor-interval-1s)
> Resource: AD-Active (class=ocf provider=solace type=AD-Active)
> Meta Attrs: migration-threshold=2 failure-timeout=1s
> Operations: start interval=0s timeout=2 (AD-Active-start-timeout-2)
> stop interval=0s timeout=2 (AD-Active-stop-timeout-2)
> monitor interval=1s (AD-Active-monitor-interval-1s)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Resource: AD-Active
> Disabled on: vmr-132-5 (score:-INFINITY) (id:ADNotOnMonitor)
> Resource: MsgBB-Active
> Enabled on: vmr-132-4 (score:100) (id:vmr-132-4Priority)
> Enabled on: vmr-132-3 (score:250) (id:vmr-132-3Priority)
> Disabled on: vmr-132-5 (score:-INFINITY) (id:MsgBBNotOnMonitor)
> Resource: Router-clone
> Disabled on: vmr-132-5 (score:-INFINITY) (id:RouterNotOnMonitor)
> Ordering Constraints:
> Resource Sets:
> set Router-clone MsgBB-Active sequential=true (id:pcs_rsc_set_Router-clone_MsgBB-Active) setoptions kind=Mandatory (id:pcs_rsc_order_Router-clone_MsgBB-Active)
> set MsgBB-Active AD-Active sequential=true (id:pcs_rsc_set_MsgBB-Active_AD-Active) setoptions kind=Mandatory (id:pcs_rsc_order_MsgBB-Active_AD-Active)
> Colocation Constraints:
> MsgBB-Active with Router-clone (score:INFINITY) (id:colocation-MsgBB-Active-Router-clone-INFINITY)
> AD-Active with MsgBB-Active (score:1000) (id:colocation-AD-Active-MsgBB-Active-1000)
>
> Resources Defaults:
> No defaults set
> Operations Defaults:
> No defaults set
>
> Cluster Properties:
> cluster-infrastructure: corosync
> cluster-recheck-interval: 1s
> dc-version: 1.1.13-10.el7_2.2-44eb2dd
> have-watchdog: false
> maintenance-mode: false
> start-failure-is-fatal: false
> stonith-enabled: false