[ClusterLabs] Antw: Re: Antw: [EXT] preventing fencing op on resource stop failure

Nikola Ciprich nikola.ciprich at linuxbox.cz
Wed Mar 23 04:47:25 EDT 2022


Hello Ulrich,

The stop failure was a very rare problem, affecting only that one particular
resource. I want to keep pacemaker restarting resources when they fail,
but I just don't want the node fenced when a stop fails. It's better for
me to investigate right away than to have an unplanned node outage just
because of one resource failure.

That's why I'd like to set it up this way.
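For reference, here is a minimal sketch of the per-resource variant Ulrich suggested. When fencing is enabled, the default on-fail for a stop operation is fence, and that default caused the outage. In the sketch, only the stop operation gets `on-fail="block"`. The monitor operation leaves on-fail unset, so it keeps the default behaviour (restart) and crashes are still recovered. The resource id `my-rsc`, the `ocf:heartbeat:Dummy` agent and the operation ids are hypothetical placeholders. The snippet only builds the CIB XML with Python's `xml.etree.ElementTree` and does not talk to a cluster.

```python
import xml.etree.ElementTree as ET


def build_primitive(rsc_id: str = "my-rsc") -> ET.Element:
    """Build a CIB <primitive> element (hypothetical id and agent).

    Only the stop op sets on-fail="block"; the monitor op leaves on-fail
    unset, so the default reaction to monitor failures (restart) applies.
    """
    prim = ET.Element(
        "primitive",
        {"id": rsc_id, "class": "ocf", "provider": "heartbeat", "type": "Dummy"},
    )
    ops = ET.SubElement(prim, "operations")
    # Recurring monitor: no on-fail attribute, so a crash still triggers a restart.
    ET.SubElement(
        ops,
        "op",
        {"id": f"{rsc_id}-monitor-interval-10s", "name": "monitor", "interval": "10s"},
    )
    # Stop: block (stop managing the resource) instead of fencing the node.
    ET.SubElement(
        ops,
        "op",
        {
            "id": f"{rsc_id}-stop-interval-0s",
            "name": "stop",
            "interval": "0s",
            "on-fail": "block",
        },
    )
    return prim


if __name__ == "__main__":
    print(ET.tostring(build_primitive(), encoding="unicode"))
```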

with regards

nik


On Tue, Mar 22, 2022 at 11:34:56AM +0100, Ulrich Windl wrote:
> Hello,
> 
> In general, you should find out _why_ the stop operation fails and then fix
> it.
> Obviously, with "on-fail=block" your resource is no longer highly available.
> 
> If all your resources are failing on stop, you have a major problem.
> You can put the cluster in maintenance mode, but then you do not have any HA.
> So what are you using pacemaker for?
> 
> Regards,
> Ulrich
> 
> >>> Nikola Ciprich <nikola.ciprich at linuxbox.cz> wrote on 22.03.2022 at 10:55
> in message <YjmdKDmQfYnpB7VZ at pcnci.linuxbox.cz>:
> > Hello Ulrich,
> > 
> > Thanks for the tip! Is there a way I can set a global default for
> > all stop operations? The documentation seems to be very brief on this
> > particular topic :)
> > 
> > with regards
> > 
> > nik
> > 
> > 
> > On Mon, Mar 21, 2022 at 10:39:24AM +0100, Ulrich Windl wrote:
> >> >>> Nikola Ciprich <nikola.ciprich at linuxbox.cz> wrote on 21.03.2022 at
> >> 10:31 in message <YjhF4zZzyrBe3vQa at pcnci.linuxbox.cz>:
> >> > Hello dear fellow pacemaker users and developers,
> >> > 
> >> > We've recently experienced unplanned outages caused by a failure
> >> > to stop a rather unimportant resource. This was caused by the default
> >> > setting, which fences the node when a stop operation fails.
> >> > 
> >> > While we'll investigate those stop failures further, I'd still
> >> > like to change this setting. We're perfectly fine with the resource
> >> > remaining in a failed state when this happens; we'll see it
> >> > in monitoring and act accordingly. However, setting the on-fail action
> >> > to block also prevents resources from being restarted when they crash,
> >> > which is not what we want... Is there a way to disable fencing on stop
> >> > failures but keep the other behaviour as it is?
> >> 
> >> Just add an "on-fail=" for the stop operation (e.g. on-fail=block).
> >> 
> >> > 
> >> > maybe I'm missing something obvious?
> >> > 
> >> > thanks a lot in advance!
> >> > 
> >> > with best regards
> >> > 
> >> > nik
> >> > 
> >> > 
> >> > -- 
> >> > -------------------------------------
> >> > Ing. Nikola CIPRICH
> >> > LinuxBox.cz, s.r.o.
> >> > 28.rijna 168, 709 00 Ostrava
> >> > 
> >> > tel.:   +420 591 166 214
> >> > fax:    +420 596 621 273
> >> > mobil:  +420 777 093 799
> >> > www.linuxbox.cz 
> >> > 
> >> > mobil servis: +420 737 238 656
> >> > email servis: servis at linuxbox.cz 
> >> > -------------------------------------
> >> > _______________________________________________
> >> > Manage your subscription:
> >> > https://lists.clusterlabs.org/mailman/listinfo/users 
> >> > 
> >> > ClusterLabs home: https://www.clusterlabs.org/ 
> >> 
> > 
> 
> 
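On the question of a global default for all stop operations, a rule-based operation default may work. This assumes a Pacemaker version that accepts `op_expression` rules inside `op_defaults` (2.0.5 or later, as far as I know). The sketch below builds such an `op_defaults` section with Python's `xml.etree.ElementTree`. The rule matches only operations named `stop` and sets `on-fail=block` for them. All ids are hypothetical. An on-fail set explicitly on an individual operation should still take precedence over this default.

```python
import xml.etree.ElementTree as ET


def build_stop_op_defaults(on_fail: str = "block") -> ET.Element:
    """Build an <op_defaults> section whose rule matches only stop operations.

    Assumes a Pacemaker version that accepts <op_expression> in op_defaults
    rules; all ids are hypothetical placeholders.
    """
    op_defaults = ET.Element("op_defaults")
    meta = ET.SubElement(op_defaults, "meta_attributes", {"id": "op-defaults-stop"})
    # The rule restricts this meta_attributes set to operations named "stop".
    rule = ET.SubElement(
        meta, "rule", {"id": "op-defaults-stop-rule", "score": "INFINITY"}
    )
    ET.SubElement(
        rule, "op_expression", {"id": "op-defaults-stop-expr", "name": "stop"}
    )
    # Default on-fail for every matching stop op that does not set its own.
    ET.SubElement(
        meta,
        "nvpair",
        {"id": "op-defaults-stop-on-fail", "name": "on-fail", "value": on_fail},
    )
    return op_defaults


if __name__ == "__main__":
    print(ET.tostring(build_stop_op_defaults(), encoding="unicode"))
```

Other operations (monitor, start) are not matched by the rule, so they keep their existing on-fail behaviour.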


