[ClusterLabs] Antw: [EXT] delaying start of a resource

Wed Dec 16 14:06:08 EST 2020

On Wed, 2020-12-16 at 15:56 +0100, Gabriele Bulfon wrote:
> Thanks, here are the logs; they show how it tried to start resources
> on the nodes.
> Keep in mind that node1 was already running the resources, and I
> simulated a problem by taking down the HA interface.
>  
> Gabriele

From the logs, Pacemaker is scheduling resource recovery after fencing
(which means stonith-enabled must already be true, by the way). I don't
know how you could see resources start without fencing succeeding
first.
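
To double-check the property itself, something like the following
should work. This is only a sketch: it assumes the crm_attribute tool
that ships with Pacemaker is available on the node, and the run()
wrapper and DRY_RUN variable are my own additions so the commands are
printed rather than executed by default.

```shell
# DRY_RUN=1 (the default in this sketch) only prints each command.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Query the cluster-wide stonith-enabled property.
run crm_attribute --type crm_config --name stonith-enabled --query

# Set it explicitly if it turns out to be false.
run crm_attribute --type crm_config --name stonith-enabled --update true
```

If the property was never set explicitly, the query may report that it
does not exist, in which case the default applies.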

Have you tested the fence devices themselves? E.g. manually run the
fence agent with the same parameters, or run "stonith_admin --reboot
<node>". It's possible the fence device is returning success without
actually doing the fencing, though I'm not sure how that would happen
either.
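
A rough sketch of such a test, run from the node that should survive.
The node name "node2" is a placeholder, and the run() wrapper with
DRY_RUN is my own addition so that nothing is actually fenced until
you set DRY_RUN=0. The history query assumes your stonith_admin
version supports --history.

```shell
# Placeholder target; replace with the real node name.
TARGET="${TARGET:-node2}"
# DRY_RUN=1 (the default in this sketch) only prints each command.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Which fence devices are registered, and which can fence the target?
run stonith_admin --list-registered
run stonith_admin --list "$TARGET"

# Ask the cluster to fence (reboot) the target.
run stonith_admin --reboot "$TARGET"

# Review what the fencer recorded for the target afterwards.
run stonith_admin --history "$TARGET"
```

If the reboot reports success but the target never actually goes down
(check its console or uptime), the device is reporting success without
doing the fencing.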

BTW if you're using corosync < 3, turning down the interface isn't a
good test. Physically pulling the cable, or using the firewall to block
both incoming and outgoing packets on the interface, is better.
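
As an illustration of the firewall approach, here is a sketch that
assumes a Linux node with iptables and an interface named "eth1" (both
are assumptions on my part; on another OS, use the equivalent rules of
its packet filter). The run() wrapper and the two function names are
hypothetical helpers, and with DRY_RUN=1 the commands are only printed.

```shell
# Placeholder interface name for the cluster (HA) link.
IFACE="${IFACE:-eth1}"
# DRY_RUN=1 (the default in this sketch) only prints each command.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Drop everything arriving on and leaving through the interface.
block_ha_link() {
    run iptables -A INPUT -i "$IFACE" -j DROP
    run iptables -A OUTPUT -o "$IFACE" -j DROP
}

# Remove the two rules again once the test is done.
unblock_ha_link() {
    run iptables -D INPUT -i "$IFACE" -j DROP
    run iptables -D OUTPUT -o "$IFACE" -j DROP
}

block_ha_link
```

Call unblock_ha_link once the cluster has reacted and you have
collected the logs.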

>  
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>  
> 
> 
> 
> -------------------------------------------------------------------
> 
> From: Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>
> To: users at clusterlabs.org 
> Date: 16 December 2020 15:45:36 CET
> Subject: [ClusterLabs] Antw: [EXT] delaying start of a resource
> 
> > >>> Gabriele Bulfon <gbulfon at sonicle.com> wrote on 16.12.2020 at
> > 15:32 in message <1523391015.734.1608129155836 at www>:
> > > Hi, I now have a two-node cluster using stonith with different
> > > pcmk_delay_base values, so that node 1 has priority to stonith
> > > node 2 in case of problems.
> > > 
> > > However, there is still one problem: node 2 delays its stonith
> > > action for 10 seconds, and node 1 for just 1, but node 2 does not
> > > delay the start of resources. So while it has not yet been powered
> > > off by node 1 (and is waiting out its delay to power off node 1),
> > > it actually starts resources, causing a window of a few seconds
> > > where both the NFS IP and the ZFS pool (!!!!!) are held by both
> > > nodes!
> > 
> > AFAIK Pacemaker will not start resources on a node that is
> > scheduled for stonith. Even more: Pacemaker will try to stop
> > resources on a node scheduled for stonith in order to start them
> > elsewhere.
> > 
> > > How can I delay node 2's resource start until the delayed stonith
> > > action is done? Or how can I simply delay the resource start by
> > > more than its pcmk_delay_base?
> > 
> > We probably need to see logs and configs to understand.
> > 
> > > 
> > > Also, it was suggested that I set "stonith-enabled=true", but I
> > > don't know where to set this flag (cib-bootstrap-options is not
> > > happy with it...).
> > 
> > I think it's on by default, so you must have set it to false.
> > In crm shell it is "configure# property stonith-enabled=...".
> > 
> > Regards,
> > Ulrich
-- 
Ken Gaillot <kgaillot at redhat.com>