[ClusterLabs] [EXTERNAL] Re: "node is unclean" leads to gratuitous reboot
Michael Powell
Michael.Powell at harmonicinc.com
Wed Jul 10 14:15:56 EDT 2019
Thanks to you and Andrei for your responses. In our particular situation, we want to be able to operate with either node in stand-alone mode, or with both nodes protected by HA. I did not mention this, but I am upgrading our product from a version that used Pacemaker 1.0.13 with Heartbeat to one that runs under CentOS 7.6 (and later 8.0). The older version did not exhibit this behavior, hence my concern.
I do understand the "wait_for_all" option better, and now that I know why the "gratuitous" reboot is happening, I'm more comfortable with that behavior. I think the biggest operational risk would occur following a power-up of the chassis. If one node were significantly delayed during bootup, e.g. because of networking issues, the other node would issue the STONITH and reboot the delayed node. That would be an annoyance, but it would be relatively infrequent. Our customers almost always keep at least one node (and usually both nodes) operational 24/7.
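For the archives: my understanding of the interplay (from the votequorum man page, so consider this a sketch rather than gospel) is roughly this, with our current quorum section quoted further down in the thread:

    quorum {
        provider: corosync_votequorum
        two_node: 1        # on its own, this would imply wait_for_all: 1
        wait_for_all: 0    # our explicit override; the first node up gains
                           # quorum alone, and start-up fencing then reboots
                           # the peer it cannot see
    }

Dropping the wait_for_all line, or setting it to 1, would give the "wait for both nodes on first start" behavior Ken described.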
Regards,
Michael
-----Original Message-----
From: Ken Gaillot <kgaillot at redhat.com>
Sent: Tuesday, July 09, 2019 12:42 PM
To: Cluster Labs - All topics related to open-source clustering welcomed <users at clusterlabs.org>
Cc: Michael Powell <Michael.Powell at harmonicinc.com>; Venkata Reddy Chappavarapu <Venkata.Chappavarapu at harmonicinc.com>
Subject: [EXTERNAL] Re: [ClusterLabs] "node is unclean" leads to gratuitous reboot
On Tue, 2019-07-09 at 12:54 +0000, Michael Powell wrote:
> I have a two-node cluster with a problem. If I start
Not so much a problem as a configuration choice :)
There are trade-offs in any case.
- wait_for_all in corosync.conf: If set, this will make each starting node wait until it sees the other before gaining quorum for the first time. The downside is that both nodes must be up for the cluster to start; the upside is a clean starting point and no fencing.
- startup-fencing in pacemaker properties: If disabled, either node can start without fencing the other. This is unsafe; if the other node is actually active and running resources, but unreachable from the newly up node, the newly up node may start the same resources, causing split-brain. (Easier than you might think: consider taking a node down for hardware maintenance, bringing it back up without a network, then plugging it back into the network -- by that point it may already have brought up the same resources and will start causing havoc.)
- Start corosync on both nodes, then start pacemaker. This avoids start-up fencing since when pacemaker starts on either node, it already sees the other node present, even if that node's pacemaker isn't up yet.
Personally I'd go for wait_for_all in normal operation. You can always disable it if there are special circumstances where a node is expected to be out of the cluster for a long time.
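To make those concrete (rough, untested commands -- adapt names and paths to your environment):

    # 1. wait_for_all: set it in the quorum section of corosync.conf on both
    #    nodes (or just remove an explicit "wait_for_all: 0"), then restart
    #    corosync:
    #        wait_for_all: 1

    # 2. start-up fencing (unsafe, for the reasons above):
    pcs property set startup-fencing=false

    # 3. membership layer first, cluster layer second, on both nodes:
    systemctl start corosync
    systemctl start pacemaker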
> Corosync/Pacemaker on one node, and then delay startup on the 2nd node
> (which is otherwise up and running), the 2nd node will be rebooted
> very soon after STONITH is enabled on the first node. This reboot
> seems to be gratuitous and could under some circumstances be
> problematic. While, at present, I “manually” start
> Corosync/Pacemaker by invoking a script from an ssh session, in a
> production environment, this script would be started by a systemd
> service. It’s not hard to imagine that if both nodes were started at
> approximately the same time (each node runs on a separate motherboard
> in the same chassis), this behavior could cause one of the nodes to be
> rebooted while it’s in the process of booting up.
>
> The two nodes’ host names are mgraid-16201289RN00023-0 and mgraid-
> 16201289RN00023-1. Both hosts are running, but Pacemaker has been
> started on neither. If Pacemaker is started on mgraid-
> 16201289RN00023-0, within a few seconds after STONITH is enabled, the
> following messages will appear in the system log file, and soon
> thereafter STONITH will be invoked to reboot the other node, on which
> Pacemaker has not yet been started. (NB: The fence agent is a process
> named mgpstonith, which uses the IPMI interface to reboot the other
> node. For debugging, it prints the data it receives from stdin.)
>
> 2019-07-08T13:11:14.907668-07:00 mgraid-16201289RN00023-0
> HA_STARTSTOP: Configure mgraid-stonith # This message indicates
> that STONITH is about to be configured and enabled …
> 2019-07-08T13:11:15.018131-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16299]: info:... action=metadata#012
> 2019-07-08T13:11:15.050817-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16301]: info:... action=metadata#012 …
> 2019-07-08T13:11:21.085092-07:00 mgraid-16201289RN00023-0
> pengine[16216]: warning: Scheduling Node mgraid-16201289RN00023-1 for
> STONITH
> 2019-07-08T13:11:21.085615-07:00 mgraid-16201289RN00023-0
> pengine[16216]: notice: * Fence (reboot) mgraid-16201289RN00023-1
> 'node is unclean'
> 2019-07-08T13:11:21.085663-07:00 mgraid-16201289RN00023-0
> pengine[16216]: notice: * Promote SS16201289RN00023:0 (
> Stopped -> Master mgraid-16201289RN00023-0 )
> 2019-07-08T13:11:21.085704-07:00 mgraid-16201289RN00023-0
> pengine[16216]: notice: * Start mgraid-stonith:0
> ( mgraid-16201289RN00023-0 )
> 2019-07-08T13:11:21.091673-07:00 mgraid-16201289RN00023-0
> pengine[16216]: warning: Calculated transition 0 (with warnings),
> saving inputs in /var/lib/pacemaker/pengine/pe-warn-3.bz2
> 2019-07-08T13:11:21.093155-07:00 mgraid-16201289RN00023-0
> crmd[16218]: notice: Initiating monitor operation
> SS16201289RN00023:0_monitor_0 locally on mgraid-16201289RN00023-0
> 2019-07-08T13:11:21.124403-07:00 mgraid-16201289RN00023-0
> crmd[16218]: notice: Initiating monitor operation mgraid-
> stonith:0_monitor_0 locally on mgraid-16201289RN00023-0 …
> 2019-07-08T13:11:21.132994-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16361]: info:... action=metadata#012 …
> 2019-07-08T13:11:22.128139-07:00 mgraid-16201289RN00023-0
> crmd[16218]: notice: Requesting fencing (reboot) of node mgraid-
> 16201289RN00023-1
> 2019-07-08T13:11:22.129150-07:00 mgraid-16201289RN00023-0
> crmd[16218]: notice: Result of probe operation for
> SS16201289RN00023 on mgraid-16201289RN00023-0: 7 (not running)
> 2019-07-08T13:11:22.129191-07:00 mgraid-16201289RN00023-0
> crmd[16218]: notice: mgraid-16201289RN00023-0-
> SS16201289RN00023_monitor_0:6 [ \n\n ]
> 2019-07-08T13:11:22.133846-07:00 mgraid-16201289RN00023-0 stonith-
> ng[16213]: notice: Client crmd.16218.a7e3cbae wants to fence
> (reboot) 'mgraid-16201289RN00023-1' with device '(any)'
> 2019-07-08T13:11:22.133997-07:00 mgraid-16201289RN00023-0 stonith-
> ng[16213]: notice: Requesting peer fencing (reboot) of mgraid-
> 16201289RN00023-1
> 2019-07-08T13:11:22.136287-07:00 mgraid-16201289RN00023-0
> crmd[16218]: notice: Result of probe operation for mgraid-stonith
> on mgraid-16201289RN00023-0: 7 (not running)
> 2019-07-08T13:11:22.141393-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16444]: info:... action=status#012 # Status requests
> always return 0.
> 2019-07-08T13:11:22.141418-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16444]: info:... nodename=mgraid-16201289RN00023-1#012
> 2019-07-08T13:11:22.141432-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16444]: info:... port=mgraid-16201289RN00023-1#012
> 2019-07-08T13:11:22.141444-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16444]: info:Ignoring: port …
> 2019-07-08T13:11:22.148973-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16445]: info:... action=status#012
> 2019-07-08T13:11:22.148997-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16445]: info:... nodename=mgraid-16201289RN00023-1#012
> 2019-07-08T13:11:22.149009-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16445]: info:... port=mgraid-16201289RN00023-1#012
> 2019-07-08T13:11:22.149019-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16445]: info:Ignoring: port …
> 2019-07-08T13:11:22.155226-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16446]: info:... action=reboot#012
> 2019-07-08T13:11:22.155250-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16446]: info:... nodename=mgraid-16201289RN00023-1#012
> 2019-07-08T13:11:22.155263-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16446]: info:... port=mgraid-16201289RN00023-1#012
> 2019-07-08T13:11:22.155273-07:00 mgraid-16201289RN00023-0
> MGPSTONITH[16446]: info:Ignoring: port
>
> Following is a relevant excerpt of the corosync.log file –
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: unpack_config: STONITH timeout: 60000
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: unpack_config: STONITH of failed nodes is enabled
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: unpack_config: Concurrent fencing is disabled
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: unpack_config: Stop all active resources: false
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: unpack_config: Cluster is symmetric - resources can run
> anywhere by default
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: unpack_config: Default stickiness: 0
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: unpack_config: On loss of CCM Quorum: Stop ALL resources
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0,
> 'green' = 0
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: determine_online_status_fencing: Node mgraid-16201289RN00023-0
> is active
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: determine_online_status: Node mgraid-16201289RN00023-0 is
> online
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: unpack_node_loop: Node 1 is already processed
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: unpack_node_loop: Node 1 is already processed
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: clone_print: Master/Slave Set: ms-SS16201289RN00023
> [SS16201289RN00023]
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: short_print: Stopped: [ mgraid-16201289RN00023-0
> mgraid-16201289RN00023-1 ]
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: clone_print: Clone Set: mgraid-stonith-clone [mgraid-
> stonith]
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: short_print: Stopped: [ mgraid-16201289RN00023-0
> mgraid-16201289RN00023-1 ]
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_rsc_location: Constraint (ms-SS16201289RN00023-
> master-w1-rule) is not active (role : Master vs. Unknown)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_rsc_location: Constraint (ms-SS16201289RN00023-
> master-w1-rule) is not active (role : Master vs. Unknown)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_rsc_location: Constraint (ms-SS16201289RN00023-
> master-w1-rule) is not active (role : Master vs. Unknown)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: distribute_children: Allocating up to 2 ms-
> SS16201289RN00023 instances to a possible 1 nodes (at most 1 per host,
> 2 optimal)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_assign_node: Assigning mgraid-16201289RN00023-0
> to SS16201289RN00023:0
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_assign_node: All nodes for resource
> SS16201289RN00023:1 are unavailable, unclean or shutting down
> (mgraid-16201289RN00023-1: 0, -1000000)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_assign_node: Could not allocate a node for
> SS16201289RN00023:1
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: native_color: Resource SS16201289RN00023:1 cannot run
> anywhere
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: distribute_children: Allocated 1 ms-SS16201289RN00023
> instances of a possible 2
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: master_color: SS16201289RN00023:0 master score: 99
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: master_color: Promoting SS16201289RN00023:0 (Stopped mgraid-
> 16201289RN00023-0)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: master_color: SS16201289RN00023:1 master score: 0
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: master_color: ms-SS16201289RN00023: Promoted 1 instances of a
> possible 1 to master
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: distribute_children: Allocating up to 2 mgraid-stonith-
> clone instances to a possible 1 nodes (at most 1 per host, 2 optimal)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_assign_node: Assigning mgraid-16201289RN00023-0
> to mgraid-stonith:0
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_assign_node: All nodes for resource mgraid-stonith:1
> are unavailable, unclean or shutting down (mgraid-16201289RN00023-1:
> 0, -1000000)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_assign_node: Could not allocate a node for mgraid-
> stonith:1
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: native_color: Resource mgraid-stonith:1 cannot run anywhere
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: distribute_children: Allocated 1 mgraid-stonith-clone
> instances of a possible 2
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_create_probe: Probing SS16201289RN00023:0 on
> mgraid-16201289RN00023-0 (Stopped) 1 (nil)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_create_probe: Probing mgraid-stonith:0 on mgraid-
> 16201289RN00023-0 (Stopped) 1 (nil)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: master_create_actions: Creating actions for ms-
> SS16201289RN00023
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: RecurringOp: Start recurring monitor (3s) for
> SS16201289RN00023:0 on mgraid-16201289RN00023-0
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: RecurringOp: Start recurring monitor (3s) for
> SS16201289RN00023:0 on mgraid-16201289RN00023-0
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> warning: stage6: Scheduling Node mgraid-16201289RN00023-1 for STONITH
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: native_start_constraints: Ordering mgraid-
> stonith:0_start_0 after mgraid-16201289RN00023-1 recovery
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> notice: LogNodeActions: * Fence (reboot) mgraid-16201289RN00023-1
> 'node is unclean'
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> notice: LogAction: * Promote SS16201289RN00023:0 (
> Stopped -> Master mgraid-16201289RN00023-0 )
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: LogActions: Leave SS16201289RN00023:1 (Stopped)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> notice: LogAction: * Start mgraid-stonith:0
> ( mgraid-16201289RN00023-0 )
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> info: LogActions: Leave mgraid-stonith:1 (Stopped)
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: action2xml: Using anonymous clone name SS16201289RN00023
> for SS16201289RN00023:0 (aka. (null))
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: action2xml: Using anonymous clone name SS16201289RN00023
> for SS16201289RN00023:0 (aka. (null))
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: action2xml: Using anonymous clone name SS16201289RN00023
> for SS16201289RN00023:0 (aka. (null))
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: action2xml: Using anonymous clone name SS16201289RN00023
> for SS16201289RN00023:0 (aka. (null))
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: action2xml: Using anonymous clone name SS16201289RN00023
> for SS16201289RN00023:0 (aka. (null))
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: action2xml: Using anonymous clone name SS16201289RN00023
> for SS16201289RN00023:0 (aka. (null))
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: action2xml: Using anonymous clone name SS16201289RN00023
> for SS16201289RN00023:0 (aka. (null))
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: action2xml: Using anonymous clone name mgraid-stonith for
> mgraid-stonith:0 (aka. (null))
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> debug: action2xml: Using anonymous clone name mgraid-stonith for
> mgraid-stonith:0 (aka. (null))
> Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0 pengine:
> warning: process_pe_message: Calculated transition 0 (with
> warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-3.bz2
>
> Here is the status of the first node, once Pacemaker is started –
>
> [root at mgraid-16201289RN00023-0 bin]# pcs status
> Cluster name:
> Stack: corosync
> Current DC: mgraid-16201289RN00023-0 (version 1.1.19-8.el7-
> c3c624ea3d) - partition with quorum
> Last updated: Mon Jul 8 17:51:22 2019 Last change: Mon Jul 8
> 16:11:23 2019 by root via cibadmin on mgraid-
> 16201289RN00023-0
>
> 2 nodes configured
> 4 resources configured
>
> Online: [ mgraid-16201289RN00023-0 ]
> OFFLINE: [ mgraid-16201289RN00023-1 ]
>
> Full list of resources:
>
> Master/Slave Set: ms-SS16201289RN00023 [SS16201289RN00023]
> SS16201289RN00023 (ocf::omneon:ss): Starting mgraid-
> 16201289RN00023-0
> Stopped: [ mgraid-16201289RN00023-1 ] Clone Set:
> mgraid-stonith-clone [mgraid-stonith]
> Started: [ mgraid-16201289RN00023-0 ]
> Stopped: [ mgraid-16201289RN00023-1 ]
>
> Daemon Status:
> corosync: active/disabled
> pacemaker: active/disabled
> pcsd: inactive/disabled
> Here’s the configuration, from the first node –
>
> [root at mgraid-16201289RN00023-0 bin]# pcs config
> Cluster Name:
> Corosync Nodes:
> mgraid-16201289RN00023-0 mgraid-16201289RN00023-1 Pacemaker Nodes:
> mgraid-16201289RN00023-0 mgraid-16201289RN00023-1
>
> Resources:
> Master: ms-SS16201289RN00023
> Meta Attrs: clone-max=2 notify=true globally-unique=false target-
> role=Started
> Resource: SS16201289RN00023 (class=ocf provider=omneon type=ss)
> Attributes: ss_resource=SS16201289RN00023
> ssconf=/var/omneon/config/config.16201289RN00023
> Operations: monitor interval=3s role=Master timeout=7s
> (SS16201289RN00023-monitor-3s)
> monitor interval=10s role=Slave timeout=7
> (SS16201289RN00023-monitor-10s)
> stop interval=0 timeout=20 (SS16201289RN00023-stop-0)
> start interval=0 timeout=300 (SS16201289RN00023-start-
> 0)
> Clone: mgraid-stonith-clone
> Resource: mgraid-stonith (class=stonith type=mgpstonith)
> Operations: monitor interval=0 timeout=20s (mgraid-stonith-
> monitor-interval-0)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
> Resource: ms-SS16201289RN00023
> Constraint: ms-SS16201289RN00023-master-w1
> Rule: role=master score=100 (id:ms-SS16201289RN00023-master-
> w1-rule)
> Expression: #uname eq mgraid-16201289rn00023-0 (id:ms-
> SS16201289RN00023-master-w1-rule-expression)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:
>
> Alerts:
> No alerts defined
>
> Resources Defaults:
> failure-timeout: 1min
> Operations Defaults:
> No defaults set
>
> Cluster Properties:
> cluster-infrastructure: corosync
> cluster-recheck-interval: 1min
> dc-deadtime: 5s
> dc-version: 1.1.19-8.el7-c3c624ea3d
> have-watchdog: false
> last-lrm-refresh: 1562513532
> stonith-enabled: true
>
> Quorum:
> Options:
> wait_for_all: 0
> Interestingly, as you’ll note below, the “two_node” option is also set
> to 1, but is not reported as such above.
>
> Finally, here’s /etc/corosync/corosync.conf –
>
> totem {
> version: 2
>
> crypto_cipher: none
> crypto_hash: none
>
> interface {
> ringnumber: 0
> bindnetaddr: 169.254.1.1
> mcastaddr: 239.255.1.1
> mcastport: 5405
> ttl: 1
> }
> }
>
> logging {
> fileline: off
> to_stderr: no
> to_logfile: yes
> logfile: /var/log/cluster/corosync.log
> to_syslog: yes
> debug: on
> timestamp: on
> logger_subsys {
> subsys: QUORUM
> debug: on
> }
> }
>
> nodelist {
> node {
> ring0_addr: mgraid-16201289RN00023-0
> nodeid: 1
> }
>
> node {
> ring0_addr: mgraid-16201289RN00023-1
> nodeid: 2
> }
> }
>
> quorum {
> provider: corosync_votequorum
>
> two_node: 1
>
> wait_for_all: 0
> }
>
> I’d appreciate any insight you can offer into this behavior, and any
> suggestions you may have.
>
> Regards,
> Michael
>
>
> Michael Powell
> Sr. Staff Engineer
>
> 15220 NW Greenbrier Pkwy
> Suite 290
> Beaverton, OR 97006
> T 503-372-7327 M 503-789-3019 H 503-625-5332
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot <kgaillot at redhat.com>