[ClusterLabs] Service pacemaker start kills my cluster and other NFS HA issues

Tue Sep 6 15:23:38 EDT 2016

On 09/05/2016 05:16 AM, Pablo Pines Leon wrote:
> Hello,
> 
> I implemented the suggested change in corosync and found that "service pacemaker stop" on the master node works, provided that I run crm_resource -P from another terminal right after it. The same goes for the "failback" case, i.e. bringing the failed node back into the cluster: doing so causes the IP resource and then the NFS exports to fail, but if I run crm_resource -P twice after running "service pacemaker start", it works.
> 
> However, I see no reason why this is happening. If the failover works fine, why would there be any problem getting a node back into the cluster?

Looking at your config again, I see only some of your resources have
monitor operations. All primitives should have monitors, except for
master/slave resources which should have two monitors on the m/s
resource, one for master and one for slave (with different intervals).
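
As a rough, untested sketch in crm shell syntax, the three primitives in
your config that have no monitor (res_drbd_export, res_fs, res_ip) might
end up looking something like this. The interval values are placeholders
picked for illustration, not tuned recommendations; the only hard
requirement is that the master and slave monitors use different
intervals. Everything else is copied from your config as posted.

```
# DRBD sits under the m/s resource, so it gets one monitor per role.
# The 29s/31s intervals are assumed values; they only need to differ.
primitive res_drbd_export ocf:linbit:drbd \
        params drbd_resource=export \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
# Ordinary primitives get a single recurring monitor (intervals assumed).
primitive res_fs Filesystem \
        params device="/dev/drbd0" directory="/mnt/export" fstype=ext3 \
        op monitor interval=20s \
        meta target-role=Started
primitive res_ip IPaddr2 \
        params ip=*.46 cidr_netmask=24 nic=eno1 \
        op monitor interval=10s
```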

BTW, crm_resource -P is deprecated in favor of -C. Same thing, just renamed.

> Thanks and kind regards
> 
> Pablo
> ________________________________________
> From: Pablo Pines Leon [pablo.pines.leon at cern.ch]
> Sent: 01 September 2016 09:49
> To: kgaillot at redhat.com; Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Service pacemaker start kills my cluster and other NFS HA issues
> 
> Dear Ken,
> 
> Thanks for your reply. That configuration works perfectly fine on Ubuntu. The problem is that on CentOS 7, for some reason, I cannot even do a "service pacemaker stop" on the node that is running as master (with the slave off too), because it ends up with failed actions that don't make any sense:
> 
> Migration Summary:
> * Node nfsha1:
>    res_exportfs_root: migration-threshold=1000000 fail-count=1 last-failure='Thu Sep  1 09:42:43 2016'
>    res_exportfs_export1: migration-threshold=1000000 fail-count=1000000 last-failure='Thu Sep  1 09:42:38 2016'
> 
> Failed Actions:
> * res_exportfs_root_monitor_30000 on nfsha1 'not running' (7): call=79, status=complete, exitreason='none',
>     last-rc-change='Thu Sep  1 09:42:43 2016', queued=0ms, exec=0ms
> * res_exportfs_export1_stop_0 on nfsha1 'unknown error' (1): call=88, status=Timed Out, exitreason='none',
>     last-rc-change='Thu Sep  1 09:42:18 2016', queued=0ms, exec=20001ms
> 
> So I am wondering what differs between the two OSes that would cause this different outcome.
> 
> Kind regards
> 
> ________________________________________
> From: Ken Gaillot [kgaillot at redhat.com]
> Sent: 31 August 2016 17:31
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Service pacemaker start kills my cluster and other NFS HA issues
> 
> On 08/30/2016 10:49 AM, Pablo Pines Leon wrote:
>> Hello,
>>
>> I have set up a DRBD-Corosync-Pacemaker cluster following the
>> instructions from https://wiki.ubuntu.com/ClusterStack/Natty adapting
>> them to CentOS 7 (e.g: using systemd). After testing it in Virtual
> 
> There is a similar how-to specifically for CentOS 7:
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html
> 
> I think if you compare your configs to that, you'll probably find the
> cause. I'm guessing the most important missing pieces are "two_node: 1"
> in corosync.conf, and fencing.
> 
> 
>> Machines it seemed to be working fine, so it is now implemented on
>> physical machines. I have noticed that the failover works fine as
>> long as I kill the master by pulling the AC cable, but not if I issue
>> the halt, reboot or shutdown commands, which leave the cluster in a
>> situation like this:
>>
>> Last updated: Tue Aug 30 16:55:58 2016          Last change: Tue Aug 23
>> 11:49:43 2016 by hacluster via crmd on nfsha2
>> Stack: corosync
>> Current DC: nfsha2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with
>> quorum
>> 2 nodes and 9 resources configured
>>
>> Online: [ nfsha1 nfsha2 ]
>>
>>  Master/Slave Set: ms_drbd_export [res_drbd_export]
>>      Masters: [ nfsha2 ]
>>      Slaves: [ nfsha1 ]
>>  Resource Group: rg_export
>>      res_fs     (ocf::heartbeat:Filesystem):    Started nfsha2
>>      res_exportfs_export1    (ocf::heartbeat:exportfs):    FAILED nfsha2
>> (unmanaged)
>>      res_ip     (ocf::heartbeat:IPaddr2):    Stopped
>>  Clone Set: cl_nfsserver [res_nfsserver]
>>      Started: [ nfsha1 ]
>>  Clone Set: cl_exportfs_root [res_exportfs_root]
>>      res_exportfs_root  (ocf::heartbeat:exportfs):    FAILED nfsha2
>>      Started: [ nfsha1 ]
>>
>> Migration Summary:
>> * Node 2:
>>    res_exportfs_export1: migration-threshold=1000000
>> fail-count=1000000    last-failure='Tue Aug 30 16:55:50 2016'
>>    res_exportfs_root: migration-threshold=1000000 fail-count=1
>> last-failure='Tue Aug 30 16:55:48 2016'
>> * Node 1:
>>
>> Failed Actions:
>> * res_exportfs_export1_stop_0 on nfsha2 'unknown error' (1): call=134, status=Timed Out, exitreason='none',
>>     last-rc-change='Tue Aug 30 16:55:30 2016', queued=0ms, exec=20001ms
>> * res_exportfs_root_monitor_30000 on nfsha2 'not running' (7): call=126, status=complete, exitreason='none',
>>     last-rc-change='Tue Aug 30 16:55:48 2016', queued=0ms, exec=0ms
>>
>> This of course blocks it, because the IP and the NFS exports are down.
>> It doesn't even recognize that the other node is down. I am then forced
>> to do "crm_resource -P" to get it back to a working state.
>>
>> Even after unplugging the master and booting it up again, trying to
>> bring it back into the cluster by executing "service pacemaker start"
>> on the node that was unplugged will sometimes just cause the
>> exportfs_root resource on the slave to fail (but the service is still up):
>>
>>  Master/Slave Set: ms_drbd_export [res_drbd_export]
>>      Masters: [ nfsha1 ]
>>      Slaves: [ nfsha2 ]
>>  Resource Group: rg_export
>>      res_fs     (ocf::heartbeat:Filesystem):    Started nfsha1
>>      res_exportfs_export1    (ocf::heartbeat:exportfs):    Started nfsha1
>>      res_ip     (ocf::heartbeat:IPaddr2):    Started nfsha1
>>  Clone Set: cl_nfsserver [res_nfsserver]
>>      Started: [ nfsha1 nfsha2 ]
>>  Clone Set: cl_exportfs_root [res_exportfs_root]
>>      Started: [ nfsha1 nfsha2 ]
>>
>> Migration Summary:
>> * Node nfsha2:
>>    res_exportfs_root: migration-threshold=1000000 fail-count=1
>> last-failure='Tue Aug 30 17:18:17 2016'
>> * Node nfsha1:
>>
>> Failed Actions:
>> * res_exportfs_root_monitor_30000 on nfsha2 'not running' (7): call=34, status=complete, exitreason='none',
>>     last-rc-change='Tue Aug 30 17:18:17 2016', queued=0ms, exec=33ms
>>
>> BTW I notice that the node attributes are changed:
>>
>> Node Attributes:
>> * Node nfsha1:
>>     + master-res_drbd_export            : 10000
>> * Node nfsha2:
>>     + master-res_drbd_export            : 1000
>>
>> Usually both would have the same weight (10000), so running
>> "crm_resource -P" restores that.
>>
>> Some other times it will instead cause a service disruption:
>>
>> Online: [ nfsha1 nfsha2 ]
>>
>>  Master/Slave Set: ms_drbd_export [res_drbd_export]
>>      Masters: [ nfsha2 ]
>>      Slaves: [ nfsha1 ]
>>  Resource Group: rg_export
>>      res_fs     (ocf::heartbeat:Filesystem):    Started nfsha2
>>      res_exportfs_export1    (ocf::heartbeat:exportfs):    FAILED
>> (unmanaged)[ nfsha2 nfsha1 ]
>>      res_ip     (ocf::heartbeat:IPaddr2):    Stopped
>>  Clone Set: cl_nfsserver [res_nfsserver]
>>      Started: [ nfsha1 nfsha2 ]
>>  Clone Set: cl_exportfs_root [res_exportfs_root]
>>      Started: [ nfsha1 nfsha2]
>>
>> Migration Summary:
>> * Node nfsha2:
>>    res_exportfs_export1: migration-threshold=1000000
>> fail-count=1000000    last-failure='Tue Aug 30 17:31:01 2016'
>> * Node nfsha1:
>>    res_exportfs_export1: migration-threshold=1000000
>> fail-count=1000000    last-failure='Tue Aug 30 17:31:01 2016'
>>    res_exportfs_root: migration-threshold=1000000 fail-count=1
>> last-failure='Tue Aug 30 17:31:11 2016'
>>
>> Failed Actions:
>> * res_exportfs_export1_stop_0 on nfsha2 'unknown error' (1): call=86, status=Timed Out, exitreason='none',
>>     last-rc-change='Tue Aug 30 17:30:41 2016', queued=0ms, exec=20002ms
>> * res_exportfs_export1_stop_0 on nfsha1 'unknown error' (1): call=32, status=Timed Out, exitreason='none',
>>     last-rc-change='Tue Aug 30 17:30:41 2016', queued=0ms, exec=20002ms
>> * res_exportfs_root_monitor_30000 on nfsha1 'not running' (7): call=29, status=complete, exitreason='none',
>>     last-rc-change='Tue Aug 30 17:31:11 2016', queued=0ms, exec=0ms
>>
>> Executing "crm_resource -P" then brings it back to life, but if that
>> command is not executed the cluster remains blocked for around 10
>> minutes, after which it sometimes recovers on its own (as if
>> crm_resource -P had been run automatically).
>>
>> In case it helps, the CRM configuration is this one:
>>
>> node 1: nfsha1
>> node 2: nfsha2 \
>>         attributes standby=off
>> primitive res_drbd_export ocf:linbit:drbd \
>>         params drbd_resource=export
>> primitive res_exportfs_export1 exportfs \
>>         params fsid=1 directory="/mnt/export/export1" options="rw,root_squash,mountpoint" clientspec="*.0/255.255.255.0" wait_for_leasetime_on_stop=true \
>>         op monitor interval=30s \
>>         meta target-role=Started
>> primitive res_exportfs_root exportfs \
>>         params fsid=0 directory="/mnt/export" options="rw,crossmnt" clientspec="*.0/255.255.255.0" \
>>         op monitor interval=30s \
>>         meta target-role=Started
>> primitive res_fs Filesystem \
>>         params device="/dev/drbd0" directory="/mnt/export" fstype=ext3 \
>>         meta target-role=Started
>> primitive res_ip IPaddr2 \
>>         params ip=*.46 cidr_netmask=24 nic=eno1
>> primitive res_nfsserver systemd:nfs-server \
>>         op monitor interval=30s
>> group rg_export res_fs res_exportfs_export1 res_ip
>> ms ms_drbd_export res_drbd_export \
>>         meta notify=true master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
>> clone cl_exportfs_root res_exportfs_root
>> clone cl_nfsserver res_nfsserver
>> colocation c_export_on_drbd inf: rg_export ms_drbd_export:Master
>> colocation c_nfs_on_root inf: rg_export cl_exportfs_root
>> order o_drbd_before_nfs inf: ms_drbd_export:promote rg_export:start
>> order o_root_before_nfs inf: cl_exportfs_root rg_export:start
>> property cib-bootstrap-options: \
>>         maintenance-mode=false \
>>         stonith-enabled=false \
>>         no-quorum-policy=ignore \
>>         have-watchdog=false \
>>         dc-version=1.1.13-10.el7_2.4-44eb2dd \
>>         cluster-infrastructure=corosync \
>>         cluster-name=nfsha
>>
>> And the corosync.conf:
>>
>> totem {
>> version: 2
>> # Corosync itself works without a cluster name, but DLM needs one.
>> # The cluster name is also written into the VG metadata of newly
>> # created shared LVM volume groups, if lvmlockd uses DLM locking.
>> # It is also used for computing mcastaddr, unless overridden below.
>> cluster_name: nfsha
>> # How long before declaring a token lost (ms)
>> token: 3000
>> # How many token retransmits before forming a new configuration
>> token_retransmits_before_loss_const: 10
>> # Limit generated nodeids to 31-bits (positive signed integers)
>> clear_node_high_bit: yes
>> # crypto_cipher and crypto_hash: Used for mutual node authentication.
>> # If you choose to enable this, then do remember to create a shared
>> # secret with "corosync-keygen".
>> # enabling crypto_cipher, requires also enabling of crypto_hash.
>> # crypto_cipher and crypto_hash should be used instead of deprecated
>> # secauth parameter.
>> # Valid values for crypto_cipher are none (no encryption), aes256, aes192,
>> # aes128 and 3des. Enabling crypto_cipher, requires also enabling of
>> # crypto_hash.
>> crypto_cipher: none
>> # Valid values for crypto_hash are none (no authentication), md5, sha1,
>> # sha256, sha384 and sha512.
>> crypto_hash: none
>> # Optionally assign a fixed node id (integer)
>> # nodeid: 1234
>> transport: udpu
>> }
>> nodelist {
>> node {
>> ring0_addr: *.50
>> nodeid: 1
>> }
>> node {
>> ring0_addr:*.51
>> nodeid: 2
>> }
>> }
>> logging {
>> to_syslog: yes
>> }
>>
>> quorum {
>> # Enable and configure quorum subsystem (default: off)
>> # see also corosync.conf.5 and votequorum.5
>> provider: corosync_votequorum
>> expected_votes: 2
>> }
>>
>> So as you can imagine I am really puzzled about all this and would
>> certainly welcome any help about what might be wrong with the current
>> configuration.
>>
>> Thank you very much, kind regards
>>
>> Pablo