[ClusterLabs] Trying to Understand crm-fence-peer.sh

Lars Ellenberg lars.ellenberg at linbit.com
Wed Jan 16 10:53:32 EST 2019


On Wed, Jan 16, 2019 at 04:27:18PM +0100, Valentin Vidic wrote:
> On Wed, Jan 16, 2019 at 04:20:03PM +0100, Valentin Vidic wrote:
> > I think drbd always calls crm-fence-peer.sh when it becomes a disconnected
> > primary.  In this case storage1 has closed the DRBD connection and
> > storage2 has become a disconnected primary.
> > 
> > Maybe the problem is the order in which the services are stopped during
> > reboot. It would seem that drbd is shut down before pacemaker. You
> > can try to run manually:
> > 
> >   pacemaker stop
> >   corosync stop
> >   drbd stop
> > 
> > and see what happens in this case.
> 
> Some more info here:
> 
> https://www.suse.com/documentation/sle-ha-12/book_sleha/data/sec_ha_drbd_fencing.html
> 
> So storage2 does not know why the other end disappeared and tries to use
> pacemaker to prevent storage1 from ever becoming a primary.  Only when
> storage1 comes back online and is in sync again is it allowed to start
> as a pacemaker resource, via a second script:
> 
>   after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";

Though that should be "unfence-peer" nowadays, and no longer overload
the after-resync-target handler, which actually has a different purpose.
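
As a rough sketch only (not copied from a real setup), a handlers section
using the dedicated handler could look something like the following. The
resource name r0 is made up, and the script paths just follow the one
quoted above; they may be named differently depending on the DRBD version
and distribution:

  resource r0 {
    # A fencing policy (e.g. resource-only) must be set as well:
    # in the disk section with DRBD 8.4, in the net section with DRBD 9.
    handlers {
      fence-peer   "/usr/lib/drbd/crm-fence-peer.sh";
      unfence-peer "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }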

To clarify: crm-fence-peer.sh is an *example implementation*
(even though an elaborate one) of a DRBD fencing policy handler.
When DRBD is not sure that an instance is up to date, the script
places a pacemaker location constraint on the Master role, which
bans the affected nodes from taking over the Master role.
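
For illustration, in crm shell syntax such a constraint looks roughly like
the following. The names r0 and ms_drbd_r0 are hypothetical, storage2 is
the surviving primary from the example in this thread, and the exact id
format depends on the version of the script:

  location drbd-fence-by-handler-r0-ms_drbd_r0 ms_drbd_r0 \
          rule $role="Master" -inf: #uname ne storage2

Read: the Master role gets a score of -INFINITY on every node whose name
is not storage2, so only storage2 may be (or stay) Master until the
constraint is removed again.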

It does NOT trigger node level fencing.
But it has to wait for, and rely on, pacemaker node level fencing.

That script is heavily commented, btw,
so you should be able to follow
what it tries to do, and even why.

Other implementations of drbd fencing policy handlers may directly
escalate to node level fencing. If that is what you want, use one of
those, and effectively map every DRBD replication link hiccup to a hard
reset of the peer.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT


