[ClusterLabs] ClusterIP won't return to recovered node
Dan Ragle
daniel at Biblestuph.com
Wed May 24 19:27:44 CEST 2017
I suspect this has been asked before, and apologize if so; a Google
search didn't turn up anything that was helpful to me.
I'm setting up an active/active two-node cluster and am having an issue
where one of my two defined ClusterIPs will not return to the other node
after that node has been recovered.
I'm running on CentOS 7.3. My resource definitions look like this:
# cibadmin -Q|grep dc-version
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.15-11.el7_3.4-e174ec8"/>
# pcs resource show PublicIP-clone
Clone: PublicIP-clone
Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true
Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0
Meta Attrs: resource-stickiness=0
Operations: start interval=0s timeout=20s (PublicIP-start-interval-0s)
stop interval=0s timeout=20s (PublicIP-stop-interval-0s)
monitor interval=30s (PublicIP-monitor-interval-30s)
# pcs resource show PrivateIP-clone
Clone: PrivateIP-clone
Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true interleave=true
Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24
Meta Attrs: resource-stickiness=0
Operations: start interval=0s timeout=20s (PrivateIP-start-interval-0s)
stop interval=0s timeout=20s (PrivateIP-stop-interval-0s)
monitor interval=10s timeout=20s (PrivateIP-monitor-interval-10s)
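(For reference, commands roughly equivalent to the above would be the
following; this is a sketch rather than my exact invocation:)

# pcs resource create PrivateIP ocf:heartbeat:IPaddr2 \
      ip=192.168.1.3 nic=bond1 cidr_netmask=24 \
      op monitor interval=10s timeout=20s \
      meta resource-stickiness=0
# pcs resource clone PrivateIP clone-max=2 clone-node-max=2 \
      globally-unique=true interleave=true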
# pcs constraint --full | grep -i publicip
start WEB-clone then start PublicIP-clone (kind:Mandatory) (id:order-WEB-clone-PublicIP-clone-mandatory)
# pcs constraint --full | grep -i privateip
start WEB-clone then start PrivateIP-clone (kind:Mandatory) (id:order-WEB-clone-PrivateIP-clone-mandatory)
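(Those ordering constraints were added with something like:)

# pcs constraint order start WEB-clone then start PublicIP-clone
# pcs constraint order start WEB-clone then start PrivateIP-clone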
When I first create the resources, they split across the two nodes as
expected/desired:
Clone Set: PublicIP-clone [PublicIP] (unique)
PublicIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PublicIP:1 (ocf::heartbeat:IPaddr2): Started node2-pcs
Clone Set: PrivateIP-clone [PrivateIP] (unique)
PrivateIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PrivateIP:1 (ocf::heartbeat:IPaddr2): Started node2-pcs
Clone Set: WEB-clone [WEB]
Started: [ node1-pcs node2-pcs ]
I then put the second node in standby:
# pcs node standby node2-pcs
And the IPs both jump to node1 as expected:
Clone Set: PublicIP-clone [PublicIP] (unique)
PublicIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PublicIP:1 (ocf::heartbeat:IPaddr2): Started node1-pcs
Clone Set: WEB-clone [WEB]
Started: [ node1-pcs ]
Stopped: [ node2-pcs ]
Clone Set: PrivateIP-clone [PrivateIP] (unique)
PrivateIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PrivateIP:1 (ocf::heartbeat:IPaddr2): Started node1-pcs
Then I take the second node out of standby:
# pcs node unstandby node2-pcs
The PublicIP goes back, but the PrivateIP does not:
Clone Set: PublicIP-clone [PublicIP] (unique)
PublicIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PublicIP:1 (ocf::heartbeat:IPaddr2): Started node2-pcs
Clone Set: WEB-clone [WEB]
Started: [ node1-pcs node2-pcs ]
Clone Set: PrivateIP-clone [PrivateIP] (unique)
PrivateIP:0 (ocf::heartbeat:IPaddr2): Started node1-pcs
PrivateIP:1 (ocf::heartbeat:IPaddr2): Started node1-pcs
Can anybody see what I'm doing wrong? I don't see anything in the logs
to indicate that it tries node2 and then fails, but I'm fairly new to
the software, so it's possible I'm not looking in the right place.
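From what I can tell, the placement scores can be dumped with something
like the following (crm_simulate is new to me, so I may well be
misreading its output):

# crm_simulate -sL | grep -i privateip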
Also, I noticed that when putting a node in standby, the main NIC
appears to be interrupted momentarily (long enough for my SSH session,
which is connected via the permanent IP on the NIC and not the
ClusterIP, to be dropped). Is there any way to avoid this? I thought the
cluster operations would only affect the ClusterIP and not the other IPs
being served on that NIC.
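My (possibly wrong) understanding was that IPaddr2 just adds and removes
a secondary address when a clone instance starts or stops, roughly
equivalent to:

# ip addr add 192.168.1.3/24 dev bond1    (on start)
# ip addr del 192.168.1.3/24 dev bond1    (on stop)

so I wouldn't have expected the permanent address on the NIC to be
disturbed at all.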
Thanks!
Dan