[ClusterLabs] Sticky resource not sticky after unplugging network cable
Digimer
lists at alteeve.ca
Fri Jul 1 07:17:41 UTC 2016
On 01/07/16 03:13 AM, Auer, Jens wrote:
> Hi,
>
> I have an active/passive cluster configuration and I am trying to make a
> virtual IP resource sticky such that it does not move back to a node
> after a fail-over. In my setup, I have a location preference for the
> virtual IP to the "primary" node:
> pcs resource show --full
> Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
> Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
> Meta Attrs: stickiniess=201
> Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
> stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
> monitor interval=30s (mda-ip-monitor-interval-30s)
> Master: drbd1_sync
> Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
> Resource: drbd1 (class=ocf provider=linbit type=drbd)
> Attributes: drbd_resource=shared_fs
> Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
> promote interval=0s timeout=90 (drbd1-promote-interval-0s)
> demote interval=0s timeout=90 (drbd1-demote-interval-0s)
> stop interval=0s timeout=100 (drbd1-stop-interval-0s)
> monitor interval=60s (drbd1-monitor-interval-60s)
> Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
> Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
> Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
> stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
> monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)
> Resource: PF-PEP (class=ocf provider=pfpep type=pfpep_clusterSwitch)
> Operations: start interval=0s timeout=20 (PF-PEP-start-interval-0s)
> stop interval=0s timeout=20 (PF-PEP-stop-interval-0s)
> monitor interval=10 timeout=20 (PF-PEP-monitor-interval-10)
> Clone: supervisor-clone
> Resource: supervisor (class=ocf provider=pfpep type=pfpep_supervisor)
> Operations: start interval=0s timeout=20 (supervisor-start-interval-0s)
> stop interval=0s timeout=20 (supervisor-stop-interval-0s)
> monitor interval=10 timeout=20 (supervisor-monitor-interval-10)
> Clone: snmpAgent-clone
> Resource: snmpAgent (class=ocf provider=pfpep type=pfpep_snmpAgent)
> Operations: start interval=0s timeout=20 (snmpAgent-start-interval-0s)
> stop interval=0s timeout=20 (snmpAgent-stop-interval-0s)
> monitor interval=10 timeout=20 (snmpAgent-monitor-interval-10)
>
> Location Constraints:
> Resource: mda-ip
> Enabled on: MDA1PFP (score:50) (id:location-mda-ip-MDA1PFP-50)
> Ordering Constraints:
> promote drbd1_sync then start shared_fs (kind:Mandatory) (id:order-drbd1_sync-shared_fs-mandatory)
> start shared_fs then start PF-PEP (kind:Mandatory) (id:order-shared_fs-PF-PEP-mandatory)
> start snmpAgent-clone then start supervisor-clone (kind:Optional) (id:order-snmpAgent-clone-supervisor-clone-Optional)
> start shared_fs then start snmpAgent-clone (kind:Optional) (id:order-shared_fs-snmpAgent-clone-Optional)
> Colocation Constraints:
> mda-ip with drbd1_sync (score:INFINITY) (with-rsc-role:Master) (id:colocation-mda-ip-drbd1_sync-INFINITY)
> shared_fs with drbd1_sync (score:INFINITY) (with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)
> PF-PEP with mda-ip (score:INFINITY) (id:colocation-PF-PEP-mda-ip-INFINITY)
>
> pcs resource defaults
> resource-stickiness: 100
>
> I use the virtual IP as the master resource and colocate everything
> else with it. The resource prefers one node with a score of 50, and
> the stickiness is 100, so I expect that after switching to the passive
> node and reactivating the primary node, the resource stays on the
> passive node. This works fine if I manually stop the primary node with
> pcs cluster stop. However, when I force a fail-over by unplugging the
> network cables of the primary node and then, after waiting, plug the
> cables back in, the resource moves back to the primary node.
>
> I tried larger stickiness values, and also setting a meta
> resource-stickiness property on the resource itself, but the behaviour
> did not change. How do I configure this?
>
> Best wishes,
> Jens
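
Regarding the stickiness question above, a minimal sketch of how it is
usually set with pcs (assuming the resource name mda-ip from the output
above and the 0.9-series pcs shipped with RHEL/CentOS 7; the
per-resource meta attribute is named resource-stickiness):

  # per-resource stickiness, overriding the cluster-wide default
  pcs resource meta mda-ip resource-stickiness=200

  # or raise the cluster-wide default instead
  pcs resource defaults resource-stickiness=200

  # show the allocation scores the policy engine computes, to see
  # which score wins once the preferred node comes back
  crm_simulate -sL

For the resource to stay put, its stickiness on the node it is
currently running on has to outweigh the location preference (score 50
here) pulling it back to MDA1PFP.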
Is stonith configured and working in pacemaker, and did you configure
DRBD to use 'fencing resource-and-stonith;' and set up the
crm-{un,}fence-peer.sh fence handlers?
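
A minimal sketch of the DRBD side of that, assuming DRBD 8.4 (where
fencing is a disk option) and the handler scripts in their default
location under /usr/lib/drbd/; the resource name shared_fs is taken
from the pcs output above and should match your drbd.conf:

  resource shared_fs {
    disk {
      fencing resource-and-stonith;
    }
    handlers {
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    # existing net/on sections stay unchanged
  }

With 'fencing resource-and-stonith;', DRBD suspends I/O when it loses
the replication link and calls the fence-peer handler, which places a
constraint in the CIB so Pacemaker will not promote the outdated peer;
crm-unfence-peer.sh removes that constraint again after resync. None
of this helps unless stonith is actually enabled and a working fence
device exists in Pacemaker (check with 'pcs property list --all | grep
stonith-enabled' and 'pcs stonith show').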
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?