[ClusterLabs] FW: FW: ocf_heartbeat_IPaddr2 - Real MAC of interface is revealed

Fri Mar 16 16:32:33 EDT 2018

I have tried this in a new, clean environment, and the result is the same. Has no one else run into this issue?
Maybe I'm searching for the wrong terms, but I keep coming up empty.

With best regards

Andreas Iwanowski - IT Administrator / Software Developer
www.awato.de | namezero at afim.info
T:+49 2133 26031 55 | F: +49 (0)2133 26031 01
awato Software GmbH | Salm Reifferscheidt Allee 37 | D-41540 Dormagen

avisor-Support | T: +49 (0)621 6094 043 | F: +49 (0)621 6071 447

Managing Director: Ursula Iwanowski | HRB: Neuss 7208 | VAT no.: DE 122796158

-----Original Message-----
From: Andreas M. Iwanowski
Sent: Wednesday, 14 March, 2018 12:45
To: 'Cluster Labs - All topics related to open-source clustering welcomed'
Subject: RE: [ClusterLabs] FW: ocf_heartbeat_IPaddr2 - Real MAC of interface is revealed

Thank you Andrei, and apologies for being unclear: "offline" in this example meant stopped for maintenance, i.e. with pcs cluster stop.

So, basically, here is what's going on:
VIP      172.16.16.9,  MAC = 11:54:33:a8:b2:6b (multicast)
redmine1 172.16.16.10, interface MAC = 00:0c:29:8e:0c:a4
redmine2 172.16.16.11, interface MAC = 00:0c:29:96:9c:c6
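
As a quick sanity check on the addresses above, the following minimal sketch tests the group (I/G) bit of each MAC, i.e. the least-significant bit of the first octet, which is what distinguishes a multicast MAC from a unicast one. The function name `is_multicast_mac` is only illustrative and not part of any cluster tooling.

```python
def is_multicast_mac(mac: str) -> bool:
    """Return True if the I/G bit (lowest bit of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


# Addresses taken from the listing above.
addresses = {
    "VIP": "11:54:33:a8:b2:6b",
    "redmine1": "00:0c:29:8e:0c:a4",
    "redmine2": "00:0c:29:96:9c:c6",
}

for name, mac in addresses.items():
    kind = "multicast" if is_multicast_mac(mac) else "unicast"
    print(f"{name:9} {mac}  {kind}")
```

The VIP's MAC has the group bit set, while both interface MACs do not, which is why a router that learns an interface MAC will deliver frames to one node only.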

1. Both nodes online, as pcs status shows
[root@redmine2 ~]# pcs status
[...]
2 nodes configured
2 resources configured

Online: [ redmine1 redmine2 ]

Full list of resources:

 Clone Set: RedmineIP-clone [RedmineIP] (unique)
     RedmineIP:0        (ocf::heartbeat:IPaddr2):       Started redmine1
     RedmineIP:1        (ocf::heartbeat:IPaddr2):       Started redmine2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ARP entry:
? (172.16.16.9) at 11:54:33:a8:b2:6b on re1_vlan6 expires in 1197 seconds

Everything is correct here.

2. redmine1 is stopped with pcs cluster stop 172.16.16.10; pcs status shows

[root@redmine2 ~]# pcs status
[...]
2 nodes configured
2 resources configured

Online: [ redmine2 ]
OFFLINE: [ redmine1 ]

Full list of resources:

 Clone Set: RedmineIP-clone [RedmineIP] (unique)
     RedmineIP:0        (ocf::heartbeat:IPaddr2):       Started redmine2
     RedmineIP:1        (ocf::heartbeat:IPaddr2):       Started redmine2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Failover worked; both resources are serviced by the second host.
However, the target has now learned redmine2's MAC for the VIP:
ARP entry:
? (172.16.16.9) at 00:0c:29:96:9c:c6 on re1_vlan6 expires in 1155 seconds

So far not "dangerous", as all IPs are serviced by redmine2 anyway.
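
To make this check repeatable, here is a rough sketch that parses ARP entries in the format quoted above and reports whether the VIP still maps to the cluster's multicast MAC. The regular expression only assumes the line layout shown in this thread; the helper names `parse_arp_entry` and `vip_uses_cluster_mac` are hypothetical.

```python
import re

# Matches ARP lines in the layout quoted in this thread:
#   ? (IP) at MAC on INTERFACE expires in N seconds
ARP_LINE = re.compile(
    r"\((?P<ip>[0-9.]+)\) at (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) on (?P<iface>\S+)",
    re.IGNORECASE,
)

CLUSTER_MAC = "11:54:33:a8:b2:6b"  # multicast MAC of the VIP, from the thread


def parse_arp_entry(line):
    """Return (ip, mac, interface) or None if the line does not match."""
    match = ARP_LINE.search(line)
    if match is None:
        return None
    return match.group("ip"), match.group("mac").lower(), match.group("iface")


def vip_uses_cluster_mac(line, cluster_mac=CLUSTER_MAC):
    """True if the ARP entry maps the IP to the cluster's multicast MAC."""
    entry = parse_arp_entry(line)
    if entry is None:
        raise ValueError(f"not an ARP entry: {line!r}")
    return entry[1] == cluster_mac.lower()


# The two entries observed so far (steps 1 and 2).
samples = [
    "? (172.16.16.9) at 11:54:33:a8:b2:6b on re1_vlan6 expires in 1197 seconds",
    "? (172.16.16.9) at 00:0c:29:96:9c:c6 on re1_vlan6 expires in 1155 seconds",
]

for line in samples:
    status = "cluster MAC" if vip_uses_cluster_mac(line) else "interface MAC learned"
    print(f"{status}: {line}")
```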

3. But now, after failback via pcs cluster start 172.16.16.10:
[root@redmine2 ~]# pcs status
[...]
2 nodes configured
2 resources configured

Online: [ redmine1 redmine2 ]

Full list of resources:

 Clone Set: RedmineIP-clone [RedmineIP] (unique)
     RedmineIP:0        (ocf::heartbeat:IPaddr2):       Started redmine2
     RedmineIP:1        (ocf::heartbeat:IPaddr2):       Started redmine1

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

ARP entry:
? (172.16.16.9) at 00:0c:29:8e:0c:a4 on re1_vlan6 expires in 1184 seconds
For some reason, the VIP now resolves only to redmine1's interface MAC instead of the multicast MAC.
If a requesting host should be serviced by redmine2 (through clusterip_hash=sourceip), then the VIP becomes unreachable for that host!
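
The failure mode can be illustrated with a small toy model. This is a sketch under loud assumptions: the hash is a stand-in (`zlib.crc32`), not whatever the real source-IP hashing uses; the mapping of clone instance N to bucket N is a simplification; and the client addresses are made up. It only shows the principle that a frame sent to a single interface MAC never reaches the node that owns the other bucket.

```python
import zlib

CLUSTER_MAC = "11:54:33:a8:b2:6b"
TOTAL_BUCKETS = 2  # two clone instances: RedmineIP:0 and RedmineIP:1

# State after failback (step 3): RedmineIP:0 on redmine2, RedmineIP:1 on redmine1.
# Each node accepts frames for its own interface MAC and for the cluster MAC.
nodes = [
    {"name": "redmine1", "accepted_macs": {"00:0c:29:8e:0c:a4", CLUSTER_MAC}, "buckets": {1}},
    {"name": "redmine2", "accepted_macs": {"00:0c:29:96:9c:c6", CLUSTER_MAC}, "buckets": {0}},
]


def bucket_for(source_ip):
    """Stand-in source-IP hash (an assumption, not the real algorithm)."""
    return zlib.crc32(source_ip.encode()) % TOTAL_BUCKETS


def is_reachable(source_ip, dest_mac):
    """A client is served only if a node that receives the frame owns its bucket."""
    bucket = bucket_for(source_ip)
    receivers = [n for n in nodes if dest_mac in n["accepted_macs"]]
    return any(bucket in n["buckets"] for n in receivers)


# Hypothetical client addresses, for illustration only.
clients = [f"172.16.16.{i}" for i in range(20, 30)]

for ip in clients:
    via_multicast = is_reachable(ip, CLUSTER_MAC)
    via_redmine1 = is_reachable(ip, "00:0c:29:8e:0c:a4")
    print(f"{ip}: bucket={bucket_for(ip)} multicast={via_multicast} redmine1-MAC={via_redmine1}")
```

In this model every client is reachable while the router uses the multicast MAC, but once it has learned redmine1's interface MAC, only clients whose bucket is held by redmine1 still get an answer.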

So, wouldn't the correct behavior be to always maintain the Multicast MAC?

-----Original Message-----
From: Users [mailto:users-bounces at clusterlabs.org] On Behalf Of Andrei Borzenkov
Sent: Wednesday, 14 March, 2018 8:01
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] FW: ocf_heartbeat_IPaddr2 - Real MAC of interface is revealed

On Wed, Mar 14, 2018 at 12:40 AM, Andreas M. Iwanowski <namezero at afim.info> wrote:
> Dear folks,
>
> We are currently trying to set up a multimaster cluster and use a cloned ocf_heartbeat_IPaddr2 resource to share the IP address.
>
> We have, however, run into a problem: when a cluster member is taken offline, the MAC for the IP address changes from the multicast MAC to the interface MAC of the remaining host.
> When the other host is put back online, pings to the cluster IP time out when it changes back to multicast (until the ARP cache on the router expires).
>

What exactly does "offline" mean? Host failure? Did you put the node in standby in pacemaker? When does the MAC change: immediately, or after a host/cluster restart?

> Is there any way to prevent network devices from learning the interface MACs? I.e. even if one host is servicing both resources, use the multicast MAC?
> Any help would be appreciated!
>
> Here is the pcs status:
> ===========================
> Cluster name: test_svc
> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
> Stack: corosync
> Current DC: host1 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
> Last updated: Tue Mar 13 07:12:07 2018
> Last change: Sun Mar 11 17:17:04 2018 by hacluster via crmd on host1
>
> 2 nodes configured
> 2 resources configured
>
> Online: [ host1 host2 ]
>

I guess the output when one host is "offline" would be needed here.

> Full list of resources:
>
>  Clone Set: RedmineIP-clone [RedmineIP] (unique)
>      RedmineIP:0        (ocf::heartbeat:IPaddr2):       Started host1
>      RedmineIP:1        (ocf::heartbeat:IPaddr2):       Started host2
>
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> ===========================
>