[ClusterLabs] Cloned IP not moving back after node restart or standby

Wed Jun 7 16:52:11 UTC 2017

On 05/30/2017 11:47 AM, Przemyslaw Kulczycki wrote:
> Hi.
> I'm trying to set up a 2-node corosync+pacemaker cluster to function as
> an active-active setup for nginx with a shared IP.
> 
> I've discovered (much to my disappointment) that every time I restart
> one node or put it in standby, the second instance of the cloned IP gets
> moved to the first node and doesn't go back once the second node is
> available, even though I have set stickiness to 0.
> 
> [upr@webdemo3 ~]$ sudo pcs status
> Cluster name: webdemo_cluster2
> Stack: corosync
> Current DC: webdemo3 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
> Last updated: Tue May 30 18:40:18 2017          Last change: Tue May 30 17:56:24 2017 by hacluster via crmd on webdemo4
> 
> 2 nodes and 4 resources configured
> 
> Online: [ webdemo3 webdemo4 ]
> 
> Full list of resources:
> 
>  Clone Set: ha-ip-clone [ha-ip] (unique)
>      ha-ip:0    (ocf::heartbeat:IPaddr2):       Started webdemo3
>      ha-ip:1    (ocf::heartbeat:IPaddr2):       Started webdemo3
>  Clone Set: ha-nginx-clone [ha-nginx] (unique)
>      ha-nginx:0 (ocf::heartbeat:nginx): Started webdemo3
>      ha-nginx:1 (ocf::heartbeat:nginx): Started webdemo4
> 
> Failed Actions:
> * ha-nginx:0_monitor_20000 on webdemo3 'not running' (7): call=108, status=complete, exitreason='none',
>     last-rc-change='Tue May 30 17:56:46 2017', queued=0ms, exec=0ms
> 
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> [upr@webdemo3 ~]$ sudo pcs config --full
> Cluster Name: webdemo_cluster2
> Corosync Nodes:
>  webdemo3 webdemo4
> Pacemaker Nodes:
>  webdemo3 webdemo4
> 
> Resources:
>  Clone: ha-ip-clone
>   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true *stickiness=0*

The meta-attribute is named resource-stickiness, not stickiness. The
cluster therefore ignores stickiness=0, and the resource-stickiness of
100 set in the resource defaults is what applies.
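
One way to spot this kind of mix-up is to look for a bare stickiness=
token on a "Meta Attrs:" line of the pcs config output. The sketch below
is only an illustration: find_bare_stickiness is a hypothetical helper
(not part of pcs), and the sample line is the clone's Meta Attrs line
from the quoted output, joined onto one line with the emphasis asterisks
dropped.

```shell
#!/bin/sh
# Hypothetical helper: print any bare "stickiness=..." token found on a
# "Meta Attrs:" line. Tokens such as "resource-stickiness=100" do not
# match, because they do not start with "stickiness=".
find_bare_stickiness() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -E '^stickiness='
}

# Sample line taken from the quoted "pcs config --full" output.
find_bare_stickiness 'Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true stickiness=0'
```

For the sample line this prints the stickiness=0 token. A line that only
carries resource-stickiness=... prints nothing.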

>   Resource: ha-ip (class=ocf provider=heartbeat type=IPaddr2)
>    Attributes: ip=10.75.39.235 cidr_netmask=24 clusterip_hash=sourceip
>    Operations: start interval=0s timeout=20s (ha-ip-start-interval-0s)
>                stop interval=0s timeout=20s (ha-ip-stop-interval-0s)
>                monitor interval=10s timeout=20s (ha-ip-monitor-interval-10s)
>  Clone: ha-nginx-clone
>   Meta Attrs: globally-unique=true clone-node-max=1
>   Resource: ha-nginx (class=ocf provider=heartbeat type=nginx)
>    Operations: start interval=0s timeout=60s (ha-nginx-start-interval-0s)
>                stop interval=0s timeout=60s (ha-nginx-stop-interval-0s)
>                monitor interval=20s timeout=30s (ha-nginx-monitor-interval-20s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
> Colocation Constraints:
>   ha-ip-clone with ha-nginx-clone (score:INFINITY) (id:colocation-ha-ip-ha-nginx-INFINITY)
> Ticket Constraints:
> 
> Alerts:
>  No alerts defined
> 
> Resources Defaults:
>  resource-stickiness: 100
> Operations Defaults:
>  No defaults set
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: webdemo_cluster2
>  dc-version: 1.1.15-11.el7_3.4-e174ec8
>  have-watchdog: false
>  last-lrm-refresh: 1496159785
>  no-quorum-policy: ignore
>  stonith-enabled: false
> 
> Quorum:
>   Options:
> 
> Am I doing something incorrectly?
> 
> Additionally, I'd like to know what's the difference between these commands:
> 
> sudo pcs resource update ha-ip-clone stickiness=0
> 
> sudo pcs resource meta ha-ip-clone resource-stickiness=0
> 
> 
> They seem to set the same thing, but there might be a subtle difference.

The second one is the correct one -- resource-stickiness is a
meta-attribute (i.e. used directly by the cluster, not the resource
agent). The first one would set a resource agent parameter "stickiness".
I'd expect pcs to give an error, since the agent doesn't support such a
parameter. Meta-attributes are free-form, so you can tag a resource with
your own site-specific information if desired. For that reason, an
unrecognized meta-attribute name draws no complaint.
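
As a rough sketch of the cleanup, assuming the clone is still named
ha-ip-clone as in the quoted config: set the real meta-attribute, then
drop the ignored one (pcs removes a meta option when it is given without
a value). The fix_clone_stickiness function and the RUN variable are
hypothetical conveniences for a dry run, not part of pcs. By default the
script only prints the commands; run it with RUN=sudo on a cluster node
to apply them.

```shell
#!/bin/sh
# Dry run by default: "echo" prints each command instead of running it.
# Set RUN=sudo in the environment to execute against the cluster.
RUN="${RUN:-echo}"

fix_clone_stickiness() {
    clone="$1"
    # Set the meta-attribute the cluster actually reads.
    $RUN pcs resource meta "$clone" resource-stickiness=0
    # Remove the ignored "stickiness" meta-attribute (empty value = remove).
    $RUN pcs resource meta "$clone" stickiness=
}

fix_clone_stickiness ha-ip-clone
```

Afterwards, the Meta Attrs line in "pcs config --full" should list
resource-stickiness=0 and no longer list stickiness=0. Whether the second
IP instance then moves back also depends on the rest of the
configuration, so check "pcs status" once the other node is online again.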

> 
> -- 
> Best Regards
>  
> Przemysław Kulczycki
> System administrator
> Avaleo
> 
> Email: upr at avaleo.net <mailto:upr at avaleo.net>