[ClusterLabs] "node is unclean" leads to gratuitous reboot

Michael Powell Michael.Powell at harmonicinc.com
Tue Jul 9 08:54:31 EDT 2019


I have a two-node cluster with a problem. If I start Corosync/Pacemaker on one node and delay startup on the 2nd node (which is otherwise up and running), the 2nd node is rebooted very soon after STONITH is enabled on the first node. This reboot appears to be gratuitous and could be problematic under some circumstances. At present I start Corosync/Pacemaker "manually" by invoking a script from an ssh session, but in a production environment this script would be started by a systemd service. It's not hard to imagine that if both nodes were started at approximately the same time (each node runs on a separate motherboard in the same chassis), this behavior could cause one of the nodes to be rebooted while it is still in the process of booting up.
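
For context, the production startup I have in mind would be something like the unit below (the unit name and script path are placeholders, not our actual service):

[Unit]
Description=Start Corosync/Pacemaker for the mgraid cluster
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Placeholder for the script that is currently run by hand over ssh
ExecStart=/usr/local/bin/start-mgraid-cluster.sh

[Install]
WantedBy=multi-user.target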

The two nodes' host names are mgraid-16201289RN00023-0 and mgraid-16201289RN00023-1. Both hosts are running, but Pacemaker has not yet been started on either. If Pacemaker is then started on mgraid-16201289RN00023-0, the following messages appear in the system log within a few seconds of STONITH being enabled, and soon thereafter STONITH is invoked to reboot the other node, on which Pacemaker has still not been started. (NB: The fence agent is a process named mgpstonith that uses the IPMI interface to reboot the other node. For debugging, it prints the data it receives on stdin.)
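
For reference, mgpstonith consumes the key=value pairs that Pacemaker's fencer writes to its stdin, one per line -- the same pairs you'll see echoed in the log lines below. A minimal sketch of that loop looks roughly like this (illustrative only; the real agent does considerably more, and the metadata path, BMC address and credentials here are placeholders):

#!/bin/sh
# Read key=value pairs from stdin (action, nodename, port, ...)
action=""; nodename=""
while IFS='=' read -r key val; do
        case "$key" in
                action)   action="$val" ;;
                nodename) nodename="$val" ;;
                *)        logger -t MGPSTONITH "info:Ignoring: $key" ;;   # e.g. 'port', as in the log below
        esac
done
case "$action" in
        metadata) cat /path/to/mgpstonith-metadata.xml ;;   # describe the agent
        status)   exit 0 ;;                                 # status requests always return 0
        reboot)   ipmitool -H "$PEER_BMC" -U admin -P secret chassis power cycle ;;
esac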

2019-07-08T13:11:14.907668-07:00 mgraid-16201289RN00023-0 HA_STARTSTOP: Configure mgraid-stonith    # This message indicates that STONITH is about to be configured and enabled
...
2019-07-08T13:11:15.018131-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16299]: info:... action=metadata#012
2019-07-08T13:11:15.050817-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16301]: info:... action=metadata#012
...
2019-07-08T13:11:21.085092-07:00 mgraid-16201289RN00023-0 pengine[16216]:  warning: Scheduling Node mgraid-16201289RN00023-1 for STONITH
2019-07-08T13:11:21.085615-07:00 mgraid-16201289RN00023-0 pengine[16216]:   notice:  * Fence (reboot) mgraid-16201289RN00023-1 'node is unclean'
2019-07-08T13:11:21.085663-07:00 mgraid-16201289RN00023-0 pengine[16216]:   notice:  * Promote    SS16201289RN00023:0     ( Stopped -> Master mgraid-16201289RN00023-0 )
2019-07-08T13:11:21.085704-07:00 mgraid-16201289RN00023-0 pengine[16216]:   notice:  * Start      mgraid-stonith:0        (                   mgraid-16201289RN00023-0 )
2019-07-08T13:11:21.091673-07:00 mgraid-16201289RN00023-0 pengine[16216]:  warning: Calculated transition 0 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-3.bz2
2019-07-08T13:11:21.093155-07:00 mgraid-16201289RN00023-0 crmd[16218]:   notice: Initiating monitor operation SS16201289RN00023:0_monitor_0 locally on mgraid-16201289RN00023-0
2019-07-08T13:11:21.124403-07:00 mgraid-16201289RN00023-0 crmd[16218]:   notice: Initiating monitor operation mgraid-stonith:0_monitor_0 locally on mgraid-16201289RN00023-0
...
2019-07-08T13:11:21.132994-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16361]: info:... action=metadata#012
...
2019-07-08T13:11:22.128139-07:00 mgraid-16201289RN00023-0 crmd[16218]:   notice: Requesting fencing (reboot) of node mgraid-16201289RN00023-1
2019-07-08T13:11:22.129150-07:00 mgraid-16201289RN00023-0 crmd[16218]:   notice: Result of probe operation for SS16201289RN00023 on mgraid-16201289RN00023-0: 7 (not running)
2019-07-08T13:11:22.129191-07:00 mgraid-16201289RN00023-0 crmd[16218]:   notice: mgraid-16201289RN00023-0-SS16201289RN00023_monitor_0:6 [ \n\n ]
2019-07-08T13:11:22.133846-07:00 mgraid-16201289RN00023-0 stonith-ng[16213]:   notice: Client crmd.16218.a7e3cbae wants to fence (reboot) 'mgraid-16201289RN00023-1' with device '(any)'
2019-07-08T13:11:22.133997-07:00 mgraid-16201289RN00023-0 stonith-ng[16213]:   notice: Requesting peer fencing (reboot) of mgraid-16201289RN00023-1
2019-07-08T13:11:22.136287-07:00 mgraid-16201289RN00023-0 crmd[16218]:   notice: Result of probe operation for mgraid-stonith on mgraid-16201289RN00023-0: 7 (not running)
2019-07-08T13:11:22.141393-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16444]: info:... action=status#012   # Status requests always return 0.
2019-07-08T13:11:22.141418-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16444]: info:... nodename=mgraid-16201289RN00023-1#012
2019-07-08T13:11:22.141432-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16444]: info:... port=mgraid-16201289RN00023-1#012
2019-07-08T13:11:22.141444-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16444]: info:Ignoring: port
...
2019-07-08T13:11:22.148973-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16445]: info:... action=status#012
2019-07-08T13:11:22.148997-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16445]: info:... nodename=mgraid-16201289RN00023-1#012
2019-07-08T13:11:22.149009-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16445]: info:... port=mgraid-16201289RN00023-1#012
2019-07-08T13:11:22.149019-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16445]: info:Ignoring: port
...
2019-07-08T13:11:22.155226-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16446]: info:... action=reboot#012
2019-07-08T13:11:22.155250-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16446]: info:... nodename=mgraid-16201289RN00023-1#012
2019-07-08T13:11:22.155263-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16446]: info:... port=mgraid-16201289RN00023-1#012
2019-07-08T13:11:22.155273-07:00 mgraid-16201289RN00023-0 MGPSTONITH[16446]: info:Ignoring: port
________________________________
Following is a relevant excerpt of the corosync.log file -
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: unpack_config:  STONITH timeout: 60000
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: unpack_config:  STONITH of failed nodes is enabled
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: unpack_config:  Concurrent fencing is disabled
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: unpack_config:  Stop all active resources: false
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: unpack_config:  Cluster is symmetric - resources can run anywhere by default
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: unpack_config:  Default stickiness: 0
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: unpack_config:  On loss of CCM Quorum: Stop ALL resources
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: unpack_config:  Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: determine_online_status_fencing: Node mgraid-16201289RN00023-0 is active
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: determine_online_status:       Node mgraid-16201289RN00023-0 is online
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: unpack_node_loop:     Node 1 is already processed
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: unpack_node_loop:     Node 1 is already processed
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: clone_print:    Master/Slave Set: ms-SS16201289RN00023 [SS16201289RN00023]
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: short_print:         Stopped: [ mgraid-16201289RN00023-0 mgraid-16201289RN00023-1 ]
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: clone_print:    Clone Set: mgraid-stonith-clone [mgraid-stonith]
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: short_print:         Stopped: [ mgraid-16201289RN00023-0 mgraid-16201289RN00023-1 ]
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_rsc_location:       Constraint (ms-SS16201289RN00023-master-w1-rule) is not active (role : Master vs. Unknown)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_rsc_location:       Constraint (ms-SS16201289RN00023-master-w1-rule) is not active (role : Master vs. Unknown)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_rsc_location:       Constraint (ms-SS16201289RN00023-master-w1-rule) is not active (role : Master vs. Unknown)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: distribute_children:       Allocating up to 2 ms-SS16201289RN00023 instances to a possible 1 nodes (at most 1 per host, 2 optimal)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_assign_node:       Assigning mgraid-16201289RN00023-0 to SS16201289RN00023:0
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_assign_node:   All nodes for resource SS16201289RN00023:1 are unavailable, unclean or shutting down (mgraid-16201289RN00023-1: 0, -1000000)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_assign_node:   Could not allocate a node for SS16201289RN00023:1
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: native_color:   Resource SS16201289RN00023:1 cannot run anywhere
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: distribute_children:       Allocated 1 ms-SS16201289RN00023 instances of a possible 2
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: master_color:       SS16201289RN00023:0 master score: 99
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: master_color:   Promoting SS16201289RN00023:0 (Stopped mgraid-16201289RN00023-0)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: master_color:       SS16201289RN00023:1 master score: 0
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: master_color:   ms-SS16201289RN00023: Promoted 1 instances of a possible 1 to master
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: distribute_children:       Allocating up to 2 mgraid-stonith-clone instances to a possible 1 nodes (at most 1 per host, 2 optimal)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_assign_node:       Assigning mgraid-16201289RN00023-0 to mgraid-stonith:0
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_assign_node:   All nodes for resource mgraid-stonith:1 are unavailable, unclean or shutting down (mgraid-16201289RN00023-1: 0, -1000000)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_assign_node:   Could not allocate a node for mgraid-stonith:1
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: native_color:   Resource mgraid-stonith:1 cannot run anywhere
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: distribute_children:       Allocated 1 mgraid-stonith-clone instances of a possible 2
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_create_probe:       Probing SS16201289RN00023:0 on mgraid-16201289RN00023-0 (Stopped) 1 (nil)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_create_probe:       Probing mgraid-stonith:0 on mgraid-16201289RN00023-0 (Stopped) 1 (nil)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: master_create_actions:       Creating actions for ms-SS16201289RN00023
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: RecurringOp:    Start recurring monitor (3s) for SS16201289RN00023:0 on mgraid-16201289RN00023-0
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: RecurringOp:    Start recurring monitor (3s) for SS16201289RN00023:0 on mgraid-16201289RN00023-0
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:  warning: stage6:  Scheduling Node mgraid-16201289RN00023-1 for STONITH
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: native_start_constraints:       Ordering mgraid-stonith:0_start_0 after mgraid-16201289RN00023-1 recovery
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:   notice: LogNodeActions: * Fence (reboot) mgraid-16201289RN00023-1 'node is unclean'
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:   notice: LogAction:      * Promote    SS16201289RN00023:0     ( Stopped -> Master mgraid-16201289RN00023-0 )
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: LogActions:     Leave   SS16201289RN00023:1 (Stopped)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:   notice: LogAction:      * Start      mgraid-stonith:0        (                   mgraid-16201289RN00023-0 )
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:     info: LogActions:     Leave   mgraid-stonith:1    (Stopped)
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: action2xml:     Using anonymous clone name SS16201289RN00023 for SS16201289RN00023:0 (aka. (null))
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: action2xml:     Using anonymous clone name SS16201289RN00023 for SS16201289RN00023:0 (aka. (null))
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: action2xml:     Using anonymous clone name SS16201289RN00023 for SS16201289RN00023:0 (aka. (null))
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: action2xml:     Using anonymous clone name SS16201289RN00023 for SS16201289RN00023:0 (aka. (null))
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: action2xml:     Using anonymous clone name SS16201289RN00023 for SS16201289RN00023:0 (aka. (null))
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: action2xml:     Using anonymous clone name SS16201289RN00023 for SS16201289RN00023:0 (aka. (null))
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: action2xml:     Using anonymous clone name SS16201289RN00023 for SS16201289RN00023:0 (aka. (null))
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: action2xml:     Using anonymous clone name mgraid-stonith for mgraid-stonith:0 (aka. (null))
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:    debug: action2xml:     Using anonymous clone name mgraid-stonith for mgraid-stonith:0 (aka. (null))
Jul 08 13:11:21 [16216] mgraid-16201289RN00023-0    pengine:  warning: process_pe_message:       Calculated transition 0 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-3.bz2

________________________________
Here is the status of the first node, once Pacemaker is started -

[root@mgraid-16201289RN00023-0 bin]# pcs status
Cluster name:
Stack: corosync
Current DC: mgraid-16201289RN00023-0 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Mon Jul  8 17:51:22 2019
Last change: Mon Jul  8 16:11:23 2019 by root via cibadmin on mgraid-16201289RN00023-0

2 nodes configured
4 resources configured

Online: [ mgraid-16201289RN00023-0 ]
OFFLINE: [ mgraid-16201289RN00023-1 ]

Full list of resources:

Master/Slave Set: ms-SS16201289RN00023 [SS16201289RN00023]
     SS16201289RN00023  (ocf::omneon:ss):       Starting mgraid-16201289RN00023-0
     Stopped: [ mgraid-16201289RN00023-1 ]
Clone Set: mgraid-stonith-clone [mgraid-stonith]
     Started: [ mgraid-16201289RN00023-0 ]
     Stopped: [ mgraid-16201289RN00023-1 ]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: inactive/disabled
________________________________
Here's the configuration from the first node -

[root@mgraid-16201289RN00023-0 bin]# pcs config
Cluster Name:
Corosync Nodes:
mgraid-16201289RN00023-0 mgraid-16201289RN00023-1
Pacemaker Nodes:
mgraid-16201289RN00023-0 mgraid-16201289RN00023-1

Resources:
Master: ms-SS16201289RN00023
  Meta Attrs: clone-max=2 notify=true globally-unique=false target-role=Started
  Resource: SS16201289RN00023 (class=ocf provider=omneon type=ss)
   Attributes: ss_resource=SS16201289RN00023 ssconf=/var/omneon/config/config.16201289RN00023
   Operations: monitor interval=3s role=Master timeout=7s (SS16201289RN00023-monitor-3s)
               monitor interval=10s role=Slave timeout=7 (SS16201289RN00023-monitor-10s)
               stop interval=0 timeout=20 (SS16201289RN00023-stop-0)
               start interval=0 timeout=300 (SS16201289RN00023-start-0)
Clone: mgraid-stonith-clone
  Resource: mgraid-stonith (class=stonith type=mgpstonith)
   Operations: monitor interval=0 timeout=20s (mgraid-stonith-monitor-interval-0)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: ms-SS16201289RN00023
    Constraint: ms-SS16201289RN00023-master-w1
      Rule: role=master score=100  (id:ms-SS16201289RN00023-master-w1-rule)
        Expression: #uname eq mgraid-16201289rn00023-0  (id:ms-SS16201289RN00023-master-w1-rule-expression)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
No alerts defined

Resources Defaults:
failure-timeout: 1min
Operations Defaults:
No defaults set

Cluster Properties:
cluster-infrastructure: corosync
cluster-recheck-interval: 1min
dc-deadtime: 5s
dc-version: 1.1.19-8.el7-c3c624ea3d
have-watchdog: false
last-lrm-refresh: 1562513532
stonith-enabled: true

Quorum:
  Options:
    wait_for_all: 0
________________________________
Interestingly, as you'll see in corosync.conf below, the "two_node" option is also set to 1, but it is not reported as such in the pcs config output above.
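
To double-check what the runtime quorum settings actually are, I believe something like the following should show them (output omitted):

# votequorum settings as corosync actually loaded them
corosync-cmapctl | grep quorum
# quorum status summary; the Flags line should include "2Node" (and "WaitForAll" when it is in effect)
corosync-quorumtool -s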

Finally, here's /etc/corosync/corosync.conf -
totem {
        version: 2

        crypto_cipher: none
        crypto_hash: none

        interface {
                ringnumber: 0
                bindnetaddr: 169.254.1.1
                mcastaddr: 239.255.1.1
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
        debug: on
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: on
        }
}

nodelist {
        node {
                ring0_addr: mgraid-16201289RN00023-0
                nodeid: 1
        }

        node {
                ring0_addr: mgraid-16201289RN00023-1
                nodeid: 2
        }
}

quorum {
        provider: corosync_votequorum

        two_node: 1

        wait_for_all: 0
}
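
For what it's worth, my reading of votequorum(5) is that two_node: 1 normally implies wait_for_all: 1, and that the explicit wait_for_all: 0 above overrides that default. Just as a sketch of the alternative (not something I've verified changes the behavior), leaving wait_for_all at its default would look like:

quorum {
        provider: corosync_votequorum

        two_node: 1

        # With two_node set, wait_for_all defaults to 1; as I understand it, this
        # keeps a freshly started node from gaining quorum until it has seen its
        # peer at least once, rather than immediately acting alone.
        wait_for_all: 1
}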

I'd appreciate any insight you can offer into this behavior, and any suggestions you may have.

Regards,
  Michael

    Michael Powell
    Sr. Staff Engineer
    15220 NW Greenbrier Pkwy, Suite 290
    Beaverton, OR 97006
    T 503-372-7327   M 503-789-3019   H 503-625-5332
