[ClusterLabs] Making xt_cluster IP load-sharing work with IPv6 (Was: Concept of a Shared ipaddress/resource for generic applicatons)

Fri Jan 3 17:42:59 EST 2020

On Thu, Jan 02, 2020 at 09:52:09PM +0100, Jan Pokorný wrote:
> What you've used appears to be akin to what this chunk of manpage
> suggests (amongst others):
> https://git.netfilter.org/iptables/tree/extensions/libxt_cluster.man
> 
> which is (yet another) indicator to me that xt_cluster extension
> doesn't carry that functionality on its own (like CLUSTERIP target
> did, as mentioned).

Right, the old ipt_CLUSTERIP.c (908 lines) was a complete solution,
with ARP mangling and a /proc file to control the node mapping. The new
xt_cluster.c (175 lines) is much more limited and only handles the
hashing part. The rest has to be done externally, using iptables
and arptables commands (or some nft equivalent).
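
To illustrate what "the hashing part" amounts to, here is a minimal
Python sketch of the idea, not the kernel code. As far as I can tell,
the module hashes the connection's source address with the kernel's
jhash and scales the result into the node range; zlib.crc32 below is
only a stand-in for that hash, and the function names and the seed
value are made up for the example.

```python
import ipaddress
import zlib


def node_for_source(src_ip: str, total_nodes: int, seed: int = 0) -> int:
    """Map a source address (IPv4 or IPv6) to a 1-based node number."""
    packed = ipaddress.ip_address(src_ip).packed
    # Stand-in hash (assumption): the kernel uses jhash, not CRC32.
    h = zlib.crc32(packed, seed) & 0xFFFFFFFF
    # Scale the 32-bit hash into [0, total_nodes) and make it 1-based.
    return ((h * total_nodes) >> 32) + 1


def accepts(src_ip: str, total_nodes: int, local_node: int, seed: int = 0) -> bool:
    """True if the node numbered local_node should handle this source."""
    return node_for_source(src_ip, total_nodes, seed) == local_node


if __name__ == "__main__":
    # Documentation-range addresses, chosen only for the example.
    for ip in ("192.0.2.1", "198.51.100.7", "2001:db8::1"):
        owners = [n for n in (1, 2) if accepts(ip, 2, n, seed=0xDEADBEEF)]
        print(ip, "-> handled by node(s)", owners)
```

Every node sees the same packet and computes the same hash, so with the
same seed and node count on all nodes exactly one of them accepts a
given source address. Everything else (getting the packet delivered to
all nodes in the first place) is outside the module.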

> 2. Is the following, for me viable explanation correct?
> 
> That arrangement is to prevent here unexpectedly leaky specific
> associations (I'd call "fixations") of the interface's true (hence
> non-multicast) MAC address with meant-to-be-shared IP address at hand,
> and hence cancelling the effect of link-multicasted frames (to which
> at most a single recipient would respond per the firewall matching
> rules), and therefore botching the "shared IP" concept altogether from
> the perspective of network members that would undesirably learn
> non-multicast address association for the particular
> meant-to-be-shared IP leaked like this.

Yes. I added the VIP to both nodes as a normal address, so left alone
they would both reply to ARP requests with their normal (unicast) MAC
addresses. The arptables command rewrites the ARP reply to carry the
multicast MAC for the VIP instead.
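
For reference, what makes a MAC address "multicast" is the group bit,
the least significant bit of the first octet. A small sketch of that
check follows; the helper names and both example addresses are
illustrative only and are not taken from my setup.

```python
def parse_mac(mac: str) -> bytes:
    """Parse a colon-separated MAC address into its 6 raw octets."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}: {mac!r}")
    # bytes() rejects values outside 0..255, int() rejects non-hex text.
    return bytes(int(p, 16) for p in parts)


def is_multicast_mac(mac: str) -> bool:
    """True if the group (I/G) bit of the first octet is set."""
    return bool(parse_mac(mac)[0] & 0x01)


if __name__ == "__main__":
    for mac in ("01:00:5e:00:01:01", "52:54:00:12:34:56"):
        kind = "multicast" if is_multicast_mac(mac) else "unicast"
        print(mac, "is", kind)
```

An ARP reply that advertises a unicast MAC for the VIP pins the VIP to
one node in the requester's neighbour cache, which is exactly the leak
discussed above.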

> * But it doesn't explain the suggested destination MAC renormalization
> * on INPUT, which is currently yet to be heard of for our purpose...

I did not use the INPUT rules from the xt_cluster documentation and,
to be honest, I don't understand the setup described there.

> 4. Shall not even existing IPaddr2 (whether in CLUSTERIP-based mode
>    or not) actually verify that
>    /proc/sys/net/netfilter/nf_conntrack_tcp_loose
>    gets cleared, at least until told not to through configuration?
> 
> - looks like a good idea not to allow any after-cut packets
>   interaction (would only apply to anything outside of the
>   critical cluster infrastructure since it uses UDP), as
>   a matter of safety precautions (there are no liveness
>   aspects to wish for in such scenarios, which could
>   otherwise interfere, I think)

I did not use conntrack, so I am not sure how important this setting is.

> 5. Here, I had a closer look at the code as well and have an option
>    to try -- does this help?
> 
> It appears as if that response in the (solicited)  Neighbour
> Advertisement is -- in Linux kernel -- unconditionally always
> picked from the very first address configured on the device (not to
> be confused with "permanent address").  Hence it looks to me that
> the way to go would be, so as to achieve feature parity IPv4 vs. IPv6,
> to either:
> 
> - give up on the sole identity of the interface, so that it either
>   operates under selected multicast link layer address or doesn't
>   operate at all (rationale: better not to confuse the network with
>   occasional MAC flips?)
> 
> - stick with a new macvlan pseudointerface, surprise-surprise, yet
>   another virtualization/mimicking/independence-increasing layer :-)
> 
> No experience with macvlan on my side, but bridge mode looks appealing,
> and would retain the interface addressable through its standard MAC
> address as well.  And importantly, the newly created interface would
> have the correct (multicast) MAC address to respond with to the
> respective Neighbour Solicitations (which is exactly what's asked,
> IIUIC), and I expect it would be the one selected to respond to
> the very matching IP in question?
> 
> Still, this doesn't resolve any concern around point 3. above
> (assuming it's not bogus, to begin with).

Yes, macvlan or a similar trick that binds the VIP to the multicast
MAC could be used here to avoid the whole packet rewrite mess. The only
iptables rules required in that setup would be the ones that use
xt_cluster to decide whether an incoming packet is handled by the
current node or simply ignored (DROP).

-- 
Valentin