[ClusterLabs] Service pacemaker start kills my cluster and other NFS HA issues

Ken Gaillot kgaillot at redhat.com
Wed Aug 31 15:31:16 UTC 2016


On 08/30/2016 10:49 AM, Pablo Pines Leon wrote:
> Hello,
> 
> I have set up a DRBD-Corosync-Pacemaker cluster following the
> instructions from https://wiki.ubuntu.com/ClusterStack/Natty adapting
> them to CentOS 7 (e.g. using systemd). After testing it in Virtual

There is a similar how-to specifically for CentOS 7:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html

I think if you compare your configs to that, you'll probably find the
cause. I'm guessing the most important missing pieces are "two_node: 1"
in corosync.conf, and fencing.
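
For reference, a minimal sketch of the quorum section for a two-node
cluster (adapt it to your corosync.conf) would be:

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }

Setting two_node: 1 makes votequorum handle the two-node case itself (and
enables wait_for_all by default), so expected_votes: 2 becomes redundant
when a nodelist is present, and no-quorum-policy=ignore should no longer
be needed on the Pacemaker side.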


> Machines it seemed to be working fine, so it is now deployed on
> physical machines, and I have noticed that failover works fine as
> long as I kill the master by pulling the AC cable, but not if I issue
> the halt, reboot or shutdown commands; those leave the cluster in a
> situation like this:
> 
> Last updated: Tue Aug 30 16:55:58 2016
> Last change: Tue Aug 23 11:49:43 2016 by hacluster via crmd on nfsha2
> Stack: corosync
> Current DC: nfsha2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
> 2 nodes and 9 resources configured
> 
> Online: [ nfsha1 nfsha2 ]
> 
>  Master/Slave Set: ms_drbd_export [res_drbd_export]
>      Masters: [ nfsha2 ]
>      Slaves: [ nfsha1 ]
>  Resource Group: rg_export
>      res_fs     (ocf::heartbeat:Filesystem):    Started nfsha2
>      res_exportfs_export1    (ocf::heartbeat:exportfs):    FAILED nfsha2
> (unmanaged)
>      res_ip     (ocf::heartbeat:IPaddr2):    Stopped
>  Clone Set: cl_nfsserver [res_nfsserver]
>      Started: [ nfsha1 ]
>  Clone Set: cl_exportfs_root [res_exportfs_root]
>      res_exportfs_root  (ocf::heartbeat:exportfs):    FAILED nfsha2
>      Started: [ nfsha1 ]
> 
> Migration Summary:
> * Node 2:
>    res_exportfs_export1: migration-threshold=1000000
> fail-count=1000000    last-failure='Tue Aug 30 16:55:50 2016'
>    res_exportfs_root: migration-threshold=1000000 fail-count=1
> last-failure='Tue Aug 30 16:55:48 2016'
> * Node 1:
> 
> Failed Actions:
> * res_exportfs_export1_stop_0 on nfsha2 'unknown error' (1): call=134,
> status=Timed Out, exitreason='none',
>     last-rc-change='Tue Aug 30 16:55:30 2016', queued=0ms, exec=20001ms
> * res_exportfs_root_monitor_30000 on nfsha2 'not running' (7): call=126,
> status=complete, exitreason='none',
>     last-rc-change='Tue Aug 30 16:55:48 2016', queued=0ms, exec=0ms
> 
> This of course blocks the service, because the IP and the NFS exports
> are down. It doesn't even recognize that the other node is down. I am
> then forced to run "crm_resource -P" to get it back to a working state.
> 
> Even after unplugging the master and booting it up again, trying to get
> it back into the cluster by executing "service pacemaker start" on the
> node that was unplugged will sometimes just cause the exportfs_root
> resource on the slave to fail (but the service stays up):
> 
>  Master/Slave Set: ms_drbd_export [res_drbd_export]
>      Masters: [ nfsha1 ]
>      Slaves: [ nfsha2 ]
>  Resource Group: rg_export
>      res_fs     (ocf::heartbeat:Filesystem):    Started nfsha1
>      res_exportfs_export1    (ocf::heartbeat:exportfs):    Started nfsha1
>      res_ip     (ocf::heartbeat:IPaddr2):    Started nfsha1
>  Clone Set: cl_nfsserver [res_nfsserver]
>      Started: [ nfsha1 nfsha2 ]
>  Clone Set: cl_exportfs_root [res_exportfs_root]
>      Started: [ nfsha1 nfsha2 ]
> 
> Migration Summary:
> * Node nfsha2:
>    res_exportfs_root: migration-threshold=1000000 fail-count=1
> last-failure='Tue Aug 30 17:18:17 2016'
> * Node nfsha1:
> 
> Failed Actions:
> * res_exportfs_root_monitor_30000 on nfsha2 'not running' (7): call=34,
> status=complete, exitreason='none',
>     last-rc-change='Tue Aug 30 17:18:17 2016', queued=0ms, exec=33ms
> 
> BTW I notice that the node attributes have changed:
> 
> Node Attributes:
> * Node nfsha1:
>     + master-res_drbd_export            : 10000
> * Node nfsha2:
>     + master-res_drbd_export            : 1000
> 
> Usually both would have the same weight (10000), so running
> "crm_resource -P" restores that.
> 
> At other times it will instead cause a service disruption:
> 
> Online: [ nfsha1 nfsha2 ]
> 
>  Master/Slave Set: ms_drbd_export [res_drbd_export]
>      Masters: [ nfsha2 ]
>      Slaves: [ nfsha1 ]
>  Resource Group: rg_export
>      res_fs     (ocf::heartbeat:Filesystem):    Started nfsha2
>      res_exportfs_export1    (ocf::heartbeat:exportfs):    FAILED
> (unmanaged)[ nfsha2 nfsha1 ]
>      res_ip     (ocf::heartbeat:IPaddr2):    Stopped
>  Clone Set: cl_nfsserver [res_nfsserver]
>      Started: [ nfsha1 nfsha2 ]
>  Clone Set: cl_exportfs_root [res_exportfs_root]
>      Started: [ nfsha1 nfsha2 ]
> 
> Migration Summary:
> * Node nfsha2:
>    res_exportfs_export1: migration-threshold=1000000
> fail-count=1000000    last-failure='Tue Aug 30 17:31:01 2016'
> * Node nfsha1:
>    res_exportfs_export1: migration-threshold=1000000
> fail-count=1000000    last-failure='Tue Aug 30 17:31:01 2016'
>    res_exportfs_root: migration-threshold=1000000 fail-count=1
> last-failure='Tue Aug 30 17:31:11 2016'
> 
> Failed Actions:
> * res_exportfs_export1_stop_0 on nfsha2 'unknown error' (1): call=86,
> status=Timed Out, exitreason='none',
>     last-rc-change='Tue Aug 30 17:30:41 2016', queued=0ms, exec=20002ms
> * res_exportfs_export1_stop_0 on nfsha1 'unknown error' (1): call=32,
> status=Timed Out, exitreason='none',
>     last-rc-change='Tue Aug 30 17:30:41 2016', queued=0ms, exec=20002ms
> * res_exportfs_root_monitor_30000 on nfsha1 'not running' (7): call=29,
> status=complete, exitreason='none',
>     last-rc-change='Tue Aug 30 17:31:11 2016', queued=0ms, exec=0ms
> 
> Then executing "crm_resource -P" brings it back to life, but if that
> command is not executed the cluster remains blocked for around 10
> minutes, after which it sometimes recovers on its own (as if
> crm_resource -P had been run automatically).
> 
> In case it helps, this is the CRM configuration:
> 
> node 1: nfsha1
> node 2: nfsha2 \
>         attributes standby=off
> primitive res_drbd_export ocf:linbit:drbd \
>         params drbd_resource=export
> primitive res_exportfs_export1 exportfs \
>         params fsid=1 directory="/mnt/export/export1"
> options="rw,root_squash,mountpoint" clientspec="*.0/255.255.255.0"
> wait_for_leasetime_on_stop=true \
>         op monitor interval=30s \
>         meta target-role=Started
> primitive res_exportfs_root exportfs \
>         params fsid=0 directory="/mnt/export" options="rw,crossmnt"
> clientspec="*.0/255.255.255.0" \
>         op monitor interval=30s \
>         meta target-role=Started
> primitive res_fs Filesystem \
>         params device="/dev/drbd0" directory="/mnt/export" fstype=ext3 \
>         meta target-role=Started
> primitive res_ip IPaddr2 \
>         params ip=*.46 cidr_netmask=24 nic=eno1
> primitive res_nfsserver systemd:nfs-server \
>         op monitor interval=30s
> group rg_export res_fs res_exportfs_export1 res_ip
> ms ms_drbd_export res_drbd_export \
>         meta notify=true master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1
> clone cl_exportfs_root res_exportfs_root
> clone cl_nfsserver res_nfsserver
> colocation c_export_on_drbd inf: rg_export ms_drbd_export:Master
> colocation c_nfs_on_root inf: rg_export cl_exportfs_root
> order o_drbd_before_nfs inf: ms_drbd_export:promote rg_export:start
> order o_root_before_nfs inf: cl_exportfs_root rg_export:start
> property cib-bootstrap-options: \
>         maintenance-mode=false \
>         stonith-enabled=false \
>         no-quorum-policy=ignore \
>         have-watchdog=false \
>         dc-version=1.1.13-10.el7_2.4-44eb2dd \
>         cluster-infrastructure=corosync \
>         cluster-name=nfsha
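
About stonith-enabled=false here: that is the fencing gap mentioned above.
As a rough sketch only (fence_ipmilan is just an example agent for servers
with IPMI, and the addresses and credentials are placeholders; use whatever
fence agent matches your hardware), the crm shell configuration could look
something like:

    primitive fence_nfsha1 stonith:fence_ipmilan \
            params pcmk_host_list=nfsha1 ipaddr=<nfsha1-ipmi> login=<user> passwd=<password> \
            op monitor interval=60s
    primitive fence_nfsha2 stonith:fence_ipmilan \
            params pcmk_host_list=nfsha2 ipaddr=<nfsha2-ipmi> login=<user> passwd=<password> \
            op monitor interval=60s
    location l_fence_nfsha1 fence_nfsha1 -inf: nfsha1
    location l_fence_nfsha2 fence_nfsha2 -inf: nfsha2
    property stonith-enabled=true

With working fencing, a node where a stop fails (like the exportfs stop
timeouts above) gets fenced instead of leaving the resource blocked and
unmanaged.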
> 
> And the corosync.conf:
> 
> totem {
> version: 2
> # Corosync itself works without a cluster name, but DLM needs one.
> # The cluster name is also written into the VG metadata of newly
> # created shared LVM volume groups, if lvmlockd uses DLM locking.
> # It is also used for computing mcastaddr, unless overridden below.
> cluster_name: nfsha
> # How long before declaring a token lost (ms)
> token: 3000
> # How many token retransmits before forming a new configuration
> token_retransmits_before_loss_const: 10
> # Limit generated nodeids to 31-bits (positive signed integers)
> clear_node_high_bit: yes
> # crypto_cipher and crypto_hash: Used for mutual node authentication.
> # If you choose to enable this, then do remember to create a shared
> # secret with "corosync-keygen".
> # enabling crypto_cipher, requires also enabling of crypto_hash.
> # crypto_cipher and crypto_hash should be used instead of deprecated
> # secauth parameter.
> # Valid values for crypto_cipher are none (no encryption), aes256, aes192,
> # aes128 and 3des. Enabling crypto_cipher, requires also enabling of
> # crypto_hash.
> crypto_cipher: none
> # Valid values for crypto_hash are none (no authentication), md5, sha1,
> # sha256, sha384 and sha512.
> crypto_hash: none
> # Optionally assign a fixed node id (integer)
> # nodeid: 1234
> transport: udpu
> }
> nodelist {
> node {
> ring0_addr: *.50
> nodeid: 1
> }
> node {
> ring0_addr: *.51
> nodeid: 2
> }
> }
> logging {
> to_syslog: yes
> }
> 
> quorum {
> # Enable and configure quorum subsystem (default: off)
> # see also corosync.conf.5 and votequorum.5
> provider: corosync_votequorum
> expected_votes: 2
> }
> 
> So as you can imagine I am really puzzled by all this and would
> certainly welcome any help with what might be wrong with the current
> configuration.
> 
> Thank you very much, kind regards
> 
> Pablo



