[ClusterLabs] Cloned IP not moving back after node restart or standby

Tue May 30 12:47:54 EDT 2017

Hi.
I'm trying to set up a 2-node corosync+pacemaker cluster to function as an
active-active setup for nginx with a shared IP.

I've discovered (much to my disappointment) that every time I restart one
node or put it in standby, the second instance of the cloned IP gets moved
to the first node and doesn't go back once the second node is available,
even though I have set stickiness to 0.

[upr@webdemo3 ~]$ sudo pcs status
Cluster name: webdemo_cluster2
Stack: corosync
Current DC: webdemo3 (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Tue May 30 18:40:18 2017          Last change: Tue May 30 17:56:24 2017 by hacluster via crmd on webdemo4

2 nodes and 4 resources configured

Online: [ webdemo3 webdemo4 ]

Full list of resources:

 Clone Set: ha-ip-clone [ha-ip] (unique)
     ha-ip:0    (ocf::heartbeat:IPaddr2):       Started webdemo3
     ha-ip:1    (ocf::heartbeat:IPaddr2):       Started webdemo3
 Clone Set: ha-nginx-clone [ha-nginx] (unique)
     ha-nginx:0 (ocf::heartbeat:nginx): Started webdemo3
     ha-nginx:1 (ocf::heartbeat:nginx): Started webdemo4

Failed Actions:
* ha-nginx:0_monitor_20000 on webdemo3 'not running' (7): call=108, status=complete, exitreason='none',
    last-rc-change='Tue May 30 17:56:46 2017', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

[upr@webdemo3 ~]$ sudo pcs config --full
Cluster Name: webdemo_cluster2
Corosync Nodes:
 webdemo3 webdemo4
Pacemaker Nodes:
 webdemo3 webdemo4

Resources:
 Clone: ha-ip-clone
  Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true stickiness=0
  Resource: ha-ip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.75.39.235 cidr_netmask=24 clusterip_hash=sourceip
   Operations: start interval=0s timeout=20s (ha-ip-start-interval-0s)
               stop interval=0s timeout=20s (ha-ip-stop-interval-0s)
               monitor interval=10s timeout=20s (ha-ip-monitor-interval-10s)
 Clone: ha-nginx-clone
  Meta Attrs: globally-unique=true clone-node-max=1
  Resource: ha-nginx (class=ocf provider=heartbeat type=nginx)
   Operations: start interval=0s timeout=60s (ha-nginx-start-interval-0s)
               stop interval=0s timeout=60s (ha-nginx-stop-interval-0s)
               monitor interval=20s timeout=30s (ha-nginx-monitor-interval-20s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
  ha-ip-clone with ha-nginx-clone (score:INFINITY) (id:colocation-ha-ip-ha-nginx-INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: 100
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: webdemo_cluster2
 dc-version: 1.1.15-11.el7_3.4-e174ec8
 have-watchdog: false
 last-lrm-refresh: 1496159785
 no-quorum-policy: ignore
 stonith-enabled: false

Quorum:
  Options:

Am I doing something incorrectly?

Additionally, I'd like to know what the difference is between these commands:

sudo pcs resource update ha-ip-clone stickiness=0

sudo pcs resource meta ha-ip-clone resource-stickiness=0

They seem to set the same thing, but there might be a subtle difference.
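
To make the question concrete, here is a rough Python sketch that compares the
two attribute names against the configuration dump above. It rests on
assumptions this post does not confirm: that the first command stored a meta
attribute literally named "stickiness" (which is what the Meta Attrs line
suggests), and that Pacemaker only honors the name "resource-stickiness". The
helper names (parse_meta_attrs, parse_defaults, stickiness_report) are made up
for illustration.

```python
# Excerpt of the `pcs config --full` output from this post
# (wrapped lines rejoined, emphasis asterisks dropped).
CONFIG_EXCERPT = """\
 Clone: ha-ip-clone
  Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true stickiness=0
 Clone: ha-nginx-clone
  Meta Attrs: globally-unique=true clone-node-max=1
Resources Defaults:
 resource-stickiness: 100
"""


def parse_meta_attrs(text):
    """Map each clone id to a dict of its 'Meta Attrs' name=value pairs."""
    clones = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Clone:"):
            current = stripped.split(":", 1)[1].strip()
            clones[current] = {}
        elif stripped.startswith("Meta Attrs:") and current is not None:
            pairs = stripped.split(":", 1)[1].split()
            clones[current].update(p.split("=", 1) for p in pairs)
    return clones


def parse_defaults(text):
    """Collect the 'name: value' pairs listed under 'Resources Defaults:'."""
    defaults = {}
    in_defaults = False
    for line in text.splitlines():
        if line.startswith("Resources Defaults:"):
            in_defaults = True
        elif in_defaults and line.startswith(" ") and ":" in line:
            name, value = line.strip().split(":", 1)
            defaults[name] = value.strip()
        elif in_defaults:
            break
    return defaults


def stickiness_report(text, clone_id):
    """Show which stickiness-related names are set on a clone, plus the default."""
    meta = parse_meta_attrs(text)[clone_id]
    defaults = parse_defaults(text)
    return {
        "stickiness": meta.get("stickiness"),
        "resource-stickiness": meta.get("resource-stickiness"),
        "default resource-stickiness": defaults.get("resource-stickiness"),
    }


if __name__ == "__main__":
    for name, value in stickiness_report(CONFIG_EXCERPT, "ha-ip-clone").items():
        print(f"{name}: {value}")
```

If the assumption holds, ha-ip-clone has no resource-stickiness of its own, so
the resource default of 100 would still apply to it, which might explain why
the second IP instance stays put.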

-- 
Best Regards

Przemysław Kulczycki
System administrator
Avaleo

Email: upr at avaleo.net