[ClusterLabs] Problems with corosync and pacemaker with error scenarios

Mon Jan 16 15:43:15 UTC 2017

On 01/16/2017 08:56 AM, Gerhard Wiesinger wrote:
> Hello,
> 
> I'm new to corosync and pacemaker and I want to set up an nginx cluster
> with quorum.
> 
> Requirements:
> - 3 Linux machines
> - On 2 machines floating IP should be handled and nginx as a load
> balancing proxy
> - 3rd machine is for quorum only, no services must run there
> 
> Installed on all 3 nodes corosync/pacemaker, firewall ports opened are:
> 5404, 5405, 5406 for udp in both directions

If you're using firewalld, the easiest configuration is:

  firewall-cmd --permanent --add-service=high-availability

If not, depending on what you're running, you may also want to open TCP
ports 2224 (pcsd), 3121 (Pacemaker Remote), and 21064 (DLM).
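
As a rough sketch of the non-firewalld case, assuming plain iptables with
a default INPUT chain (the function name ha_iptables_rules is made up for
illustration), the following only prints the rules so they can be
reviewed before anything is changed. The port list is the one from this
thread: the three TCP ports above plus the UDP ports you already opened
for corosync.

```shell
# Print iptables rules that accept the cluster ports mentioned in this thread.
# Nothing is applied here; review the output first.
ha_iptables_rules() {
    for spec in tcp:2224 tcp:3121 tcp:21064 udp:5404 udp:5405 udp:5406; do
        # ${spec%%:*} is the protocol, ${spec##*:} is the port
        echo "iptables -I INPUT -p ${spec%%:*} --dport ${spec##*:} -j ACCEPT"
    done
}

ha_iptables_rules
```

Once the list looks right, the same output can be piped to a root shell
(ha_iptables_rules | sh). Drop any port whose service you don't run.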

> OS: Fedora 25
> 
> Configuration of corosync (only the bindnetaddr is different on every
> machine) and pacemaker below.

FYI you don't need a different bindnetaddr. You can (and generally
should) use the *network* address, which is the same on all hosts.
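
To make "network address" concrete, here is a small shell sketch that
derives it from a host address and a prefix length. The helper name
network_addr and the /24 prefix are assumptions for illustration only;
use whatever prefix your corosync interface actually has.

```shell
# Derive the IPv4 network address (the value to use for bindnetaddr)
# from a host address and a prefix length.
# The body runs in a subshell so the IFS change stays local.
network_addr() (
    ip=$1
    prefix=$2
    IFS=.
    set -- $ip            # split the dotted quad into $1..$4
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 ))
)

# Hypothetical /24 prefix for the address from your config
network_addr 1.2.3.35 24
```

Every host on the same subnet gets the same result, which is why a
single corosync.conf can be copied to all nodes unchanged.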

> Configuration works so far but error test scenarios don't work like
> expected:
> 1.) I had cases in testing without quorum and quorum again where the
> cluster kept in Stopped state
>   I had to restart the whole stack to get it online again (killall -9
> corosync;systemctl restart corosync;systemctl restart pacemaker)
>   Any ideas?

It will be next to impossible to say without logs. It's definitely not
expected behavior. Stopping is the correct response to losing quorum;
perhaps quorum is not being properly restored for some reason. What is
your test methodology?
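
While reproducing this, it helps to record the quorum state on each node
before and after every step. A small sketch, assuming the "Quorate:" line
that corosync-quorumtool -s prints in its status summary (the helper name
is_quorate is hypothetical):

```shell
# Succeed if the quorum status read from stdin reports the partition as quorate.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Intended use on a cluster node:
#   corosync-quorumtool -s | is_quorate && echo "quorate" || echo "NOT quorate"
```

If a node still reports no quorum after connectivity is back, the
corosync log from that moment is the part worth posting.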

> 2.) Restarting pacemaker on inactive node also restarts resources on the
> other active node:
> a.) Everything up & ok
> b.) lb01 handles all resources
> c.) on lb02 which handles no resources: systemctl restart pacemaker:
>   All resources will also be restarted with a short outage on lb01 (state
> is Stopped, Started[ lb01 lb02 ] and then Started lb02)
>   How can this be avoided?

This is not expected behavior, except with clones, which I don't see you
using.

> 3.) Stopping and starting corosync doesn't awake the node up again:
>   systemctl stop corosync;sleep 10;systemctl restart corosync
>   Online: [ kvm01 lb01 ]
>   OFFLINE: [ lb02 ]
>   Stays in that state until pacemaker is restarted: systemctl restart
> pacemaker
>   Bug?

That is not expected. Pacemaker should always restart if corosync
restarts; that is specified in the systemd units, so I'm not sure why
pacemaker didn't automatically restart in your case.
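
One thing worth checking is whether the pacemaker unit on your nodes
actually declares that dependency. A sketch, assuming the "Requires="
line format printed by systemctl show -p Requires (the helper name
requires_corosync is made up):

```shell
# Succeed if a "Requires=" property line on stdin lists corosync.service.
requires_corosync() {
    grep -Eq '^Requires=(.*[[:space:]])?corosync\.service([[:space:]]|$)'
}

# Intended use:
#   systemctl show -p Requires pacemaker.service | requires_corosync \
#       && echo "dependency present" || echo "dependency missing"
```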

> 4.) "systemctl restart corosync" hangs sometimes (waiting 2 min)
>   needs a
>   killall -9 corosync;systemctl restart corosync;systemctl restart
> pacemaker
>   sequence to get it up again
> 
> 5.) Simulation of split brain: Disabling/reenabling local firewall
> (ports 5404, 5405, 5406) on node lb01 and lb02 for the following ports

FYI for an accurate simulation, be sure to block both incoming and
outgoing traffic on the corosync ports.
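
A minimal sketch of what that could look like, assuming plain iptables
and the UDP port range 5404-5406 from your mail (the function name
corosync_block_rules is hypothetical). It only prints the commands, so
you can check them before running them as root:

```shell
# Print iptables commands that drop (action "block") or restore ("unblock")
# corosync UDP traffic in both directions. Nothing is executed here.
corosync_block_rules() {
    case "$1" in
        block)   op=-I ;;
        unblock) op=-D ;;
        *) echo "usage: corosync_block_rules block|unblock" >&2; return 1 ;;
    esac
    for chain in INPUT OUTPUT; do
        echo "iptables $op $chain -p udp --dport 5404:5406 -j DROP"
    done
}

corosync_block_rules block
```

The "unblock" variant deletes exactly the rules that "block" inserted,
so the test leaves the rest of the firewall configuration alone.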

> doesn't bring corosync up again after reenabling lb02 firewall
> partition WITHOUT quorum
> Online: [ kvm01 ]
> OFFLINE: [ lb01 lb02 ]
>   NOK: restart on lb02: systemctl restart corosync;systemctl restart
> pacemaker
>   OK:  restart on lb02 and kvm01 (quorum host): systemctl restart
> corosync;systemctl restart pacemaker
>   I also see that non enabled hosts (quorum hosts) are also tried to be
> started on kvm01
>   Started[ kvm01 lb02 ]
>   Started lb02
>   Any ideas?
> 
> I've also written a new ocf:heartbeat:Iprule to modify "ip rule"
> accordingly.
> 
> Versions are:
> corosync: 2.4.2
> pacemaker: 1.1.16
> Kernel: 4.9.3-200.fc25.x86_64
> 
> Thnx.
> 
> Ciao,
> Gerhard
> 
> Corosync config:
> ================================================================================================================================================================
> 
> totem {
>         version: 2
>         cluster_name: lbcluster
>         crypto_cipher: aes256
>         crypto_hash: sha512
>         interface {
>                 ringnumber: 0
>                 bindnetaddr: 1.2.3.35
>                 mcastport: 5405
>         }
>         transport: udpu
> }
> logging {
>         fileline: off
>         to_logfile: yes
>         to_syslog: yes
>         logfile: /var/log/cluster/corosync.log
>         debug: off
>         timestamp: on
>         logger_subsys {
>                 subsys: QUORUM
>                 debug: off
>         }
> }
> nodelist {
>         node {
>                 ring0_addr: lb01
>                 nodeid: 1
>         }
>         node {
>                 ring0_addr: lb02
>                 nodeid: 2
>         }
>         node {
>                 ring0_addr: kvm01
>                 nodeid: 3
>         }
> }
> quorum {
>         # Enable and configure quorum subsystem (default: off)
>         # see also corosync.conf.5 and votequorum.5
>         #provider: corosync_votequorum
>         provider: corosync_votequorum
>         # Only for 2 node setup!
>         #  two_node: 1
> }
> ================================================================================================================================================================
> 
> ################################################################################################################################################################
> 
> # Default properties
> ################################################################################################################################################################
> 
> pcs property set stonith-enabled=false

FYI fencing is the only way to recover from certain failure scenarios,
so be aware you'll have problems if those happen.

E.g. if one of the lb's experiences crippling CPU or I/O load, it will
be unable to function as a member of the cluster (including stopping
resources), but the cluster will be unable to recover resources
elsewhere because it can't be sure they are not still active.

> pcs property set no-quorum-policy=stop
> pcs property set default-resource-stickiness=100
> pcs property set symmetric-cluster=false
> ################################################################################################################################################################
> 
> # Delete & cleanup resources
> ################################################################################################################################################################
> 
> pcs resource delete webserver
> pcs resource cleanup webserver
> pcs resource delete ClusterIP_01
> pcs resource cleanup ClusterIP_01
> pcs resource delete ClusterIPRoute_01
> pcs resource cleanup ClusterIPRoute_01
> pcs resource delete ClusterIPRule_01
> pcs resource cleanup ClusterIPRule_01
> pcs resource delete ClusterIP_02
> pcs resource cleanup ClusterIP_02
> pcs resource delete ClusterIPRoute_02
> pcs resource cleanup ClusterIPRoute_02
> pcs resource delete ClusterIPRule_02
> pcs resource cleanup ClusterIPRule_02
> ################################################################################################################################################################
> 
> # Create resources
> ################################################################################################################################################################
> 
> pcs resource create ClusterIP_01 ocf:heartbeat:IPaddr2 ip=1.2.3.81
> nic=eth1 cidr_netmask=28 broadcast=1.2.3.95 iflabel=1 meta
> migration-threshold=2 op monitor timeout=20s interval=10s
> on-fail=restart --group ClusterNetworking
> pcs resource create ClusterIPRoute_01 ocf:heartbeat:Route params
> device=eth1 source=1.2.3.81 destination=default gateway=1.2.3.94
> table=125 meta migration-threshold=2 op monitor timeout=20s interval=10s
> on-fail=restart --group ClusterNetworking --after ClusterIP_01
> pcs resource create ClusterIPRule_01 ocf:heartbeat:Iprule params
> from=1.2.3.81 table=125 meta migration-threshold=2 op monitor
> timeout=20s interval=10s on-fail=restart --group ClusterNetworking
> --after ClusterIPRoute_01
> pcs constraint location ClusterIP_01 prefers lb01=INFINITY
> pcs constraint location ClusterIP_01 prefers lb02=INFINITY
> pcs constraint location ClusterIPRoute_01 prefers lb01=INFINITY
> pcs constraint location ClusterIPRoute_01 prefers lb02=INFINITY
> pcs constraint location ClusterIPRule_01 prefers lb01=INFINITY
> pcs constraint location ClusterIPRule_01 prefers lb02=INFINITY
> pcs resource create ClusterIP_02 ocf:heartbeat:IPaddr2 ip=1.2.3.82
> nic=eth1 cidr_netmask=28 broadcast=1.2.3.95 iflabel=2 meta
> migration-threshold=2 op monitor timeout=20s interval=10s
> on-fail=restart --group ClusterNetworking
> pcs resource create ClusterIPRoute_02 ocf:heartbeat:Route params
> device=eth1 source=1.2.3.82 destination=default gateway=1.2.3.94
> table=126 meta migration-threshold=2 op monitor timeout=20s interval=10s
> on-fail=restart --group ClusterNetworking --after ClusterIP_02
> pcs resource create ClusterIPRule_02 ocf:heartbeat:Iprule params
> from=1.2.3.82 table=126 meta migration-threshold=2 op monitor
> timeout=20s interval=10s on-fail=restart --group ClusterNetworking
> --after ClusterIPRoute_02
> pcs constraint location ClusterIP_02 prefers lb01=INFINITY
> pcs constraint location ClusterIP_02 prefers lb02=INFINITY
> pcs constraint location ClusterIPRoute_02 prefers lb01=INFINITY
> pcs constraint location ClusterIPRoute_02 prefers lb02=INFINITY
> pcs constraint location ClusterIPRule_02 prefers lb01=INFINITY
> pcs constraint location ClusterIPRule_02 prefers lb02=INFINITY
> ################################################################################################################################################################
> 
> # NGINX
> ################################################################################################################################################################
> 
> pcs resource create webserver ocf:heartbeat:nginx httpd=/usr/sbin/nginx
> configfile=/etc/nginx/nginx.conf meta migration-threshold=2 op monitor
> timeout=5s interval=5s on-fail=restart
> pcs constraint colocation add webserver with ClusterNetworking INFINITY
> pcs constraint order ClusterNetworking then webserver
> pcs constraint location webserver prefers lb01=INFINITY
> pcs constraint location webserver prefers lb02=INFINITY
> ================================================================================================================================================================