[ClusterLabs] EL6, cman, rrp, unicast and iptables

Digimer lists at alteeve.ca
Sat Sep 12 01:01:58 UTC 2015


On 11/09/15 02:40 PM, Noel Kuntze wrote:
> 
> Hello,
> 
> Personally, I would set filters in tc (the packet prioritization/shaper part of the network stack)
> to prioritize cluster packets. That way, delivery of the packets is basically guaranteed. I'd do something similar
> on the switch to make sure it prioritizes the packets, too.
> 
> The migration traffic must always have a lower priority than the traffic of cman and the other components,
> so the totems get delivered in any case.
> The default queuing behaviour is FIFO. This is obviously not desirable here.
> If the queuing discipline in use supports traffic prioritization, you can possibly get away with
> a single set of iptables DSCP/TOS rules in *mangle OUTPUT to set the correct value
> so the queue prioritizes it.
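
(For reference, a minimal sketch of the sort of setup described above; the
interface eth0, the totem ports 5404/5405 and the CS6 DSCP class are just
assumptions here, not a tested recipe:)

====
# Mark outgoing totem traffic (UDP 5404/5405) with a high-priority DSCP class.
iptables -t mangle -A OUTPUT -p udp -m multiport --dports 5404,5405 \
         -j DSCP --set-dscp-class CS6

# Attach a simple prio qdisc to the cluster interface and steer packets
# carrying that DSCP value (0xc0 in the TOS byte) into the highest band.
tc qdisc add dev eth0 root handle 1: prio
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
   match ip dsfield 0xc0 0xfc flowid 1:1
====

With a prio qdisc, anything the filter sends to band 1:1 is dequeued before
the lower bands, so marked totem packets would go out ahead of bulk traffic
such as migrations.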

So back when I was designing the initial Anvil!, I had a conversation
about this with one of the core devs. His take on it was that QoS causes
more problems than it solves. That was what was in my mind when I
decided on RRP instead of QoS.

I am a strong believer in "keep it as simple as possible". In a case
like this, it's hard to argue that any option is simple, but given that
RRP is baked into the HA stack, I decided to trust it over QoS. I am
perfectly open to contrary ideas though. Can you help me understand why
you think tc's additional complexity is worth it? I'm willing to fully
believe it is, but I want to understand the pros and cons first.

As an aside;

I've now got the cluster running in either UDP-unicast or UDP-multicast
(wanted to leave my options open) with the following iptables rules;

(note: 10.20/16 = BCN, 10.10/16 = SN, 192.168.199/24 = IFN)

====
# Generated by iptables-save v1.4.7 on Sat Sep 12 01:00:09 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [300170:21524114]
-A INPUT -s 10.10.0.0/16 -p udp -m addrtype --dst-type MULTICAST -m conntrack --ctstate NEW -m multiport --dports 5404,5405 -j ACCEPT
-A INPUT -s 10.20.0.0/16 -p udp -m addrtype --dst-type MULTICAST -m conntrack --ctstate NEW -m multiport --dports 5404,5405 -j ACCEPT
-A INPUT -p tcp -m conntrack --ctstate NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -s 10.20.0.0/16 -p sctp -j ACCEPT
-A INPUT -s 10.10.0.0/16 -p sctp -j ACCEPT
-A INPUT -s 10.20.0.0/16 -d 10.20.0.0/16 -p udp -m conntrack --ctstate NEW -m multiport --dports 5404,5405 -j ACCEPT
-A INPUT -s 10.10.0.0/16 -d 10.10.0.0/16 -p udp -m conntrack --ctstate NEW -m multiport --dports 5404,5405 -j ACCEPT
-A INPUT -s 10.20.0.0/16 -d 10.20.0.0/16 -p tcp -m conntrack --ctstate NEW -m multiport --dports 123,5800,5900:5999,11111,16851,21064,49152:49216 -j ACCEPT
-A INPUT -s 10.10.0.0/16 -d 10.10.0.0/16 -p tcp -m conntrack --ctstate NEW -m multiport --dports 7788:7799,11111,16851,21064 -j ACCEPT
-A INPUT -s 192.168.122.0/24 -d 192.168.122.0/24 -p tcp -m conntrack --ctstate NEW -m multiport --dports 123,5800,5900:5999 -j ACCEPT
-A INPUT -p igmp -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Sat Sep 12 01:00:09 2015
====
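
(For context, the unicast/multicast choice itself lives in cluster.conf;
roughly like this, where the cluster name, hostnames, config_version and the
altname entries are placeholders rather than my actual config:)

====
<?xml version="1.0"?>
<cluster name="an-cluster" config_version="1">
  <!-- transport="udpu" selects UDP unicast; dropping the attribute (or
       using transport="udp") gives the default multicast transport. -->
  <cman transport="udpu" expected_votes="1" two_node="1"/>
  <clusternodes>
    <clusternode name="node1.bcn" nodeid="1">
      <!-- altname adds the second (SN) interface as ring 1 for RRP. -->
      <altname name="node1.sn"/>
    </clusternode>
    <clusternode name="node2.bcn" nodeid="2">
      <altname name="node2.sn"/>
    </clusternode>
  </clusternodes>
</cluster>
====

The clusternode names resolve to the BCN (ring 0) addresses and the altname
entries to the SN (ring 1) addresses, which is what gives corosync its
redundant ring.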

In both unicast and multicast mode, when I fail the BCN (ring 0), I still see:

====
Sep 12 00:01:24 node2 corosync[26991]:   [TOTEM ] ring 0 active with no faults
Sep 12 00:01:36 node2 corosync[26991]:   [TOTEM ] Incrementing problem counter for seqid 546 iface 10.20.10.2 to [1 of 3]
Sep 12 00:01:38 node2 corosync[26991]:   [TOTEM ] ring 0 active with no faults
Sep 12 00:01:42 node2 corosync[26991]:   [TOTEM ] Incrementing problem counter for seqid 548 iface 10.20.10.2 to [1 of 3]
Sep 12 00:01:44 node2 corosync[26991]:   [TOTEM ] ring 0 active with no faults
Sep 12 00:01:49 node2 corosync[26991]:   [TOTEM ] Incrementing problem counter for seqid 550 iface 10.20.10.2 to [1 of 3]
Sep 12 00:01:51 node2 corosync[26991]:   [TOTEM ] ring 0 active with no faults
Sep 12 00:01:56 node2 corosync[26991]:   [TOTEM ] Incrementing problem counter for seqid 552 iface 10.20.10.2 to [1 of 3]
Sep 12 00:01:58 node2 corosync[26991]:   [TOTEM ] ring 0 active with no faults
Sep 12 00:02:02 node2 corosync[26991]:   [TOTEM ] Incrementing problem counter for seqid 554 iface 10.20.10.2 to [1 of 3]
Sep 12 00:02:04 node2 corosync[26991]:   [TOTEM ] ring 0 active with no faults
Sep 12 00:02:09 node2 corosync[26991]:   [TOTEM ] Incrementing problem counter for seqid 556 iface 10.20.10.2 to [1 of 3]
Sep 12 00:02:11 node2 corosync[26991]:   [TOTEM ] ring 0 active with no faults
====

This worries me, though the cluster never breaks.
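
(For what it's worth, the ring state can be checked from either node with
corosync-cfgtool, and a ring that does get marked FAULTY can be re-enabled
once the network is back:)

====
# Show the status of both rings on this node.
corosync-cfgtool -s

# Reset the redundant ring state cluster-wide after a fault.
corosync-cfgtool -r
====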

Thanks again!

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



