[ClusterLabs] ClusterIP won't return to recovered node
Dan Ragle
daniel at Biblestuph.com
Wed May 24 19:27:44 CEST 2017
I suspect this has been asked before, and apologize if so; a Google
search didn't turn up anything that was helpful to me.
I'm setting up an active/active two-node cluster and am having an issue
where one of my two defined ClusterIPs will not return to the other node
after that node has been recovered.
I'm running on CentOS 7.3. My resource definitions look like this:
# cibadmin -Q|grep dc-version
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.15-11.el7_3.4-e174ec8"/>
# pcs resource show PublicIP-clone
Clone: PublicIP-clone
Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true
Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0
Meta Attrs: resource-stickiness=0
Operations: start interval=0s timeout=20s (PublicIP-start-interval-0s)
stop interval=0s timeout=20s (PublicIP-stop-interval-0s)
monitor interval=30s (PublicIP-monitor-interval-30s)
# pcs resource show PrivateIP-clone
Clone: PrivateIP-clone
Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true
Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24
Meta Attrs: resource-stickiness=0
Operations: start interval=0s timeout=20s (PrivateIP-start-interval-0s)
stop interval=0s timeout=20s (PrivateIP-stop-interval-0s)
monitor interval=10s timeout=20s (PrivateIP-monitor-interval-10s)
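(For reference, commands roughly equivalent to the above would be the
following; this is a sketch rather than my exact invocation:)

# pcs resource create PrivateIP ocf:heartbeat:IPaddr2 \
      ip=192.168.1.3 nic=bond1 cidr_netmask=24 \
      op monitor interval=10s timeout=20s \
      meta resource-stickiness=0
# pcs resource clone PrivateIP clone-max=2 clone-node-max=2 \
      globally-unique=true interleave=true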
# pcs constraint --full | grep -i publicip
start WEB-clone then start PublicIP-clone (kind:Mandatory) (id:order-WEB-clone-PublicIP-clone-mandatory)
# pcs constraint --full | grep -i privateip
start WEB-clone then start PrivateIP-clone (kind:Mandatory) (id:order-WEB-clone-PrivateIP-clone-mandatory)
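(Those ordering constraints were added with something like:)

# pcs constraint order start WEB-clone then start PublicIP-clone
# pcs constraint order start WEB-clone then start PrivateIP-clone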
When I first create the resources, they split across the two nodes as
expected/desired:
Clone Set: PublicIP-clone [PublicIP] (unique)
PublicIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PublicIP:1 (ocf::heartbeat:IPaddr2): Started node2-pcs
Clone Set: PrivateIP-clone [PrivateIP] (unique)
PrivateIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PrivateIP:1 (ocf::heartbeat:IPaddr2): Started node2-pcs
Clone Set: WEB-clone [WEB]
Started: [ node1-pcs node2-pcs ]
I then put the second node in standby:
# pcs node standby node2-pcs
And the IPs both jump to node1 as expected:
Clone Set: PublicIP-clone [PublicIP] (unique)
PublicIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PublicIP:1 (ocf::heartbeat:IPaddr2): Started node1-pcs
Clone Set: WEB-clone [WEB]
Started: [ node1-pcs ]
Stopped: [ node2-pcs ]
Clone Set: PrivateIP-clone [PrivateIP] (unique)
PrivateIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PrivateIP:1 (ocf::heartbeat:IPaddr2): Started node1-pcs
Then I take the second node out of standby:
# pcs node unstandby node2-pcs
The PublicIP goes back, but the PrivateIP does not:
Clone Set: PublicIP-clone [PublicIP] (unique)
PublicIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PublicIP:1 (ocf::heartbeat:IPaddr2): Started node2-pcs
Clone Set: WEB-clone [WEB]
Started: [ node1-pcs node2-pcs ]
Clone Set: PrivateIP-clone [PrivateIP] (unique)
PrivateIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PrivateIP:1 (ocf::heartbeat:IPaddr2): Started node1-pcs
Can anybody see what I'm doing wrong? I don't see anything in the logs
to indicate that it tries node2 and then fails, but I'm fairly new to
the software, so it's possible I'm not looking in the right place.
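From what I can tell, the placement scores can be dumped with something
like the following (crm_simulate is new to me, so I may well be
misreading its output):

# crm_simulate -sL | grep -i privateip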
Also, I noticed that when putting a node in standby, the main NIC
appears to be interrupted momentarily (long enough for my SSH session,
which is connected via the permanent IP on the NIC and not the
ClusterIP, to be dropped). Is there any way to avoid this? I thought the
cluster operations would only affect the ClusterIP and not the other IPs
being served on that NIC.
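My (possibly wrong) understanding was that IPaddr2 just adds and removes
a secondary address when a clone instance starts or stops, roughly
equivalent to:

# ip addr add 192.168.1.3/24 dev bond1    (on start)
# ip addr del 192.168.1.3/24 dev bond1    (on stop)

so I wouldn't have expected the permanent address on the NIC to be
disturbed at all.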
Thanks!
Dan