[ClusterLabs] [EXTERNAL] Re: "node is unclean" leads to gratuitous reboot

Thu Jul 11 05:58:03 EDT 2019

On Wed, Jul 10, 2019 at 06:15:56PM +0000, Michael Powell wrote:
> Thanks to you and Andrei for your responses.  In our particular
> situation, we want to be able to operate with either node in
> stand-alone mode, or with both nodes protected by HA.  I did not
> mention this, but I am working on upgrading our product
> from a version which used Pacemaker version 1.0.13 and Heartbeat
> to run under CentOS 7.6 (later 8.0).
> The older version did not exhibit this behavior, hence my concern.

Heartbeat by default has much less aggressive timeout settings,
and clearly distinguishes between "deadtime" and "initdead".
The latter is basically a "wait_for_all" with a timeout: how long to
wait for other nodes during startup before declaring them dead and
proceeding in the startup sequence, ultimately fencing unseen nodes
anyway.

Pacemaker itself has "dc-deadtime", documented as
"How long to wait for a response from other nodes during startup.",
but the 20s default of that in current Pacemaker is most likely
much shorter than what you had as initdead in your "old" setup.

So maybe if you set dc-deadtime to two minutes or something,
that would give you the "expected" behavior?
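
As a rough sketch only, and assuming the cluster is managed with pcs
(the usual tool on CentOS 7, but an assumption on my part), setting
and then checking the property could look like this. The two-minute
value is just the example figure from above, not a tuned
recommendation:

```shell
# Assumption: pcs is installed and this runs as root on a cluster node.
# Raise the startup wait from the 20s default to two minutes.
pcs property set dc-deadtime=2min

# Read the value back from the CIB with Pacemaker's own tool.
crm_attribute --type crm_config --name dc-deadtime --query
```

If you use crmsh or edit the CIB directly instead, the property name
is the same; only the tooling differs.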

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT