[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Coming in Pacemaker 2.1.2: new fencing configuration options

Ken Gaillot kgaillot at redhat.com
Tue Oct 12 14:26:58 EDT 2021


On Tue, 2021-10-12 at 20:48 +0300, Andrei Borzenkov wrote:
> On 12.10.2021 09:27, Ulrich Windl wrote:
> > > > > Andrei Borzenkov <arvidjaar at gmail.com> wrote on 11.10.2021 at 11:43 in message
> > <CAA91j0Ur9wxxzOpVL7MHmnFMp60EbhxhVDCk=zf2YQGjV-SWtg at mail.gmail.com>:
> > > On Mon, Oct 11, 2021 at 9:29 AM Ulrich Windl
> > > <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> > > ....
> > > > > > Also how long would such a delay be: Long enough until the other node
> > > > > > is fenced, or long enough until the other node was fenced, booted
> > > > > > (assuming it does) and is running pacemaker?
> > > > > 
> > > > > The delay should be on the less-preferred node, long enough for that
> > > > > node to get fenced. The other node, with no delay, will fence it if it
> > > > > can. If the other node is for whatever reason unable to fence, the node
> > > > > with the delay will fence it after the delay.
> > > > 
> > > > So the "fence intention" will be lost when the node is being
> > > > fenced?
> > > > Otherwise the surviving node would have to clean up the "fence
> > > > intention".
> > > > Or does it mean the "fence intention" does not make it to the
> > > > CIB and
> > stays
> > > > local on the node?
> > > > 
> > > 
> > > Two nodes cannot communicate with each other so the surviving node is
> > > not aware of anything the fenced node did or intended to do. When the
> > 
> > I thought (local) CIB writes do not need a quorum.
> > 
> > > fenced node reboots and pacemaker starts, it should pull the CIB from the
> > > surviving node, so whatever intentions the fenced node had before reboot
> > > should be lost at this point.
> > 
> > If the surviving node has a newer CIB (as per modification/configuration
> > count) than the fenced node, that is true, but if the fenced node has a
> > newer CIB, the surviving node would pull the "other" CIB, right?
> 
> Indeed. I honestly did not expect it.
> 
> I am not sure what consequences it has in practice though. It is certainly
> one more argument against running without mandatory stonith, because in this
> case both nodes happily continue and it is unpredictable which one will win
> after they rejoin.
> 
> Assuming we do run with mandatory stonith, then we have a relatively small
> window before the DC is killed (because only the DC can update the CIB). But
> I am not sure whether CIB changes will be committed locally until all nodes
> are either confirmed to be offline or have acknowledged the CIB changes. I
> guess only Ken can answer it :)

In general each node maintains its own copy of the CIB (writing
locally), and only changes (diffs) are passed between nodes. Checksums
are used to make sure the content remains functionally the same on all
nodes.
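
As a rough illustration of what such a diff looks like, you can generate one
by hand with crm_diff (the file names below are just placeholders):

    # Save the current CIB, edit a copy, and generate a patch between the two.
    cibadmin --query > /tmp/cib-before.xml
    cp /tmp/cib-before.xml /tmp/cib-after.xml
    # ... edit /tmp/cib-after.xml ...
    # crm_diff produces an XML patch of the same general kind the nodes exchange
    crm_diff --original /tmp/cib-before.xml --new /tmp/cib-after.xml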

However full CIB replacements can be done, whether by user request (pcs
generally uses this for config changes, btw) or when the CIB gets out
of sync on the nodes.
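
For example, a manual full replacement looks something like this (the file
path is just a placeholder):

    # Dump the current CIB, modify the copy, then push the whole thing back,
    # replacing whatever the cluster currently has.
    cibadmin --query > /tmp/cib-copy.xml
    # ... edit /tmp/cib-copy.xml ...
    cibadmin --replace --xml-file /tmp/cib-copy.xml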

When a node joins an existing cluster (like a fenced node rejoining),
the CIB versions will be compared, and the newest one wins (actually
more like the one with the most changes).
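
The counters being compared are the admin_epoch, epoch, and num_updates
attributes on the <cib> element (compared roughly in that order); you can see
them at the top of the CIB:

    # The version counters live on the <cib> element itself
    cibadmin --query | head -n 1
    # e.g.: <cib admin_epoch="0" epoch="123" num_updates="4" ...>
    # (the example values above are made up)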

Generally, the existing cluster has had more activity since the node was
fenced, while the fenced node has little to no activity before it rejoins
the cluster, so it works out well. However, I have seen scripts that start
the cluster on a node and immediately set some node attributes or whatnot,
causing the fenced node to look "newer" when it rejoins.
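
A rough sketch of that kind of script (the attribute name and value here are
made up):

    # Hypothetical startup script: starting the cluster and immediately
    # writing a node attribute can bump the local CIB before the node has
    # finished syncing with its peers.
    systemctl start pacemaker
    crm_attribute --node "$(hostname)" --name example-attr --update some-value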

> 
> > I think I had a few cases in the past when the "last dying node"
> > did not have
> > the "latest" CIB, causing some "extra noise" when the cluster was
> > formed
> > again.
> 
> Details of what happened are certainly interesting.
> 
> > Probably some period to wait for all nodes to join (and thus sync the CIBs)
> > before performing any actions would help there.

-- 
Ken Gaillot <kgaillot at redhat.com>


