[ClusterLabs] Q: fence_kdump and fence_kdump_send

Fri Feb 25 06:47:08 EST 2022

On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl <nwahl at redhat.com> wrote:
>
> On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl <nwahl at redhat.com> wrote:
> >
...
> > >
> > > So what happens most likely is that the watchdog terminates the kdump.
> > > In that case all the mess with fence_kdump won't help, right?
> >
> > You can configure extra_modules in your /etc/kdump.conf file to
> > include the watchdog module, and then restart kdump.service. For
> > example:
> >
> > # grep ^extra_modules /etc/kdump.conf
> > extra_modules i6300esb
> >
> > If you're not sure of the name of your watchdog module, wdctl can help
> > you find it. sbd needs to be stopped first, because it keeps the
> > watchdog device timer busy.
> >
> > # pcs cluster stop --all
> > # wdctl | grep Identity
> > Identity:      i6300ESB timer [version 0]
> > # lsmod | grep -i i6300ESB
> > i6300esb               13566  0
> >
> >
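
As a quick sanity check of the extra_modules suggestion above, something
like the following could confirm that the watchdog module is actually
listed in the kdump configuration. This is only a sketch: the function
names kdump_extra_modules and kdump_has_module are hypothetical helpers,
and it assumes each extra_modules line is the keyword followed by
whitespace-separated module names, as in the quoted example.

```shell
#!/bin/sh
# Sketch: inspect extra_modules lines in a kdump.conf-style file.
# kdump_extra_modules and kdump_has_module are hypothetical helper names.

kdump_extra_modules() {
    # $1: path to the config file (defaults to /etc/kdump.conf)
    # Prints one module name per line; commented-out lines are ignored
    # because their first field is not exactly "extra_modules".
    awk '$1 == "extra_modules" { for (i = 2; i <= NF; i++) print $i }' \
        "${1:-/etc/kdump.conf}"
}

kdump_has_module() {
    # $1: module name, $2: optional config path
    # Exit status 0 if the module is listed, non-zero otherwise.
    kdump_extra_modules "${2:-/etc/kdump.conf}" | grep -qxF -- "$1"
}
```

For example, "kdump_has_module i6300esb && echo listed" would print
"listed" on a node configured as in the quoted grep output. It only shows
that the module is configured, not that anything updates the timer.
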
> > If you're also using fence_sbd (poison-pill fencing via block device),
> > then you should be able to protect yourself from that during a dump by
> > configuring fencing levels so that fence_kdump is level 1 and
> > fence_sbd is level 2.
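
For illustration, the level layout described above could be generated
roughly like this. It is a sketch under assumptions: print_level_cmds is
a hypothetical helper, and the device and node names in the usage note
are placeholders, not names from this thread. It only prints the
"pcs stonith level add" commands so they can be reviewed before running.

```shell
#!/bin/sh
# Sketch: print (do not run) pcs commands that would put a fence_kdump
# device at level 1 and a fence_sbd device at level 2 for each node.
# print_level_cmds is a hypothetical helper name.

print_level_cmds() {
    # $1: stonith id of the fence_kdump device
    # $2: stonith id of the fence_sbd device
    # remaining arguments: cluster node names
    kdump_dev=$1
    sbd_dev=$2
    shift 2
    for node in "$@"; do
        printf 'pcs stonith level add 1 %s %s\n' "$node" "$kdump_dev"
        printf 'pcs stonith level add 2 %s %s\n' "$node" "$sbd_dev"
    done
}
```

Calling "print_level_cmds kdump sbd node1 node2" (placeholder names)
prints four commands, two per node, with the kdump device at level 1.
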
>
> RHKB, for anyone interested:
>   - sbd watchdog timeout causes node to reboot during crash kernel
> execution (https://access.redhat.com/solutions/3552201)

What is not clear from this KB (or from the quotes from it above) is
which component updates the watchdog. Quoting (emphasis mine):

--><--
With the module loaded, the timer *CAN* be updated so that it does not
expire and force a reboot in the middle of vmcore generation.
--><--

Sure it can, but what program exactly updates the watchdog during
kdump execution? I am pretty sure that sbd does not run at this point.