[ClusterLabs] Q: fence_kdump and fence_kdump_send
Reid Wahl
nwahl at redhat.com
Fri Feb 25 14:34:03 EST 2022
On Fri, Feb 25, 2022 at 3:47 AM Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>
> On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl <nwahl at redhat.com> wrote:
> >
> > On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl <nwahl at redhat.com> wrote:
> > >
> ...
> > > >
> > > > So what happens most likely is that the watchdog terminates the kdump.
> > > > In that case all the mess with fence_kdump won't help, right?
> > >
> > > You can configure extra_modules in your /etc/kdump.conf file to
> > > include the watchdog module, and then restart kdump.service. For
> > > example:
> > >
> > > # grep ^extra_modules /etc/kdump.conf
> > > extra_modules i6300esb
> > >
> > > If you're not sure of the name of your watchdog module, wdctl can help
> > > you find it. sbd needs to be stopped first, because it keeps the
> > > watchdog device timer busy.
> > >
> > > # pcs cluster stop --all
> > > # wdctl | grep Identity
> > > Identity: i6300ESB timer [version 0]
> > > # lsmod | grep -i i6300ESB
> > > i6300esb 13566 0
> > >
> > >
> > > If you're also using fence_sbd (poison-pill fencing via block device),
> > > then you should be able to protect yourself from that during a dump by
> > > configuring fencing levels so that fence_kdump is level 1 and
> > > fence_sbd is level 2.
> >
> > RHKB, for anyone interested:
> > - sbd watchdog timeout causes node to reboot during crash kernel
> > execution (https://access.redhat.com/solutions/3552201)
>
> What is not clear from this KB (and quotes from it above) - what
> instance updates watchdog? Quoting (emphasis mine)
>
> --><--
> With the module loaded, the timer *CAN* be updated so that it does not
> expire and force a reboot in the middle of vmcore generation.
> --><--
>
> Sure it can, but what program exactly updates the watchdog during
> kdump execution? I am pretty sure that sbd does not run at this point.

That's a valid question. I found this approach to work back in 2018
after a fair amount of frustration, and didn't question it too deeply
at the time.

The answer seems to be that the kernel does it.
- https://stackoverflow.com/a/2020717
- https://stackoverflow.com/a/42589110
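
For anyone who wants to double-check the extra_modules setting quoted
above before restarting kdump.service, here is a minimal sketch. The
function name kdump_has_module is made up for illustration and is not
part of kexec-tools or pcs. It assumes the watchdog module is listed by
name on an extra_modules line, as in the i6300esb example, and it
ignores commented-out lines.

```shell
#!/bin/sh
# Hypothetical helper (not a kdump or pcs command): succeed if MODULE
# appears on an active "extra_modules" line of the given kdump.conf.
kdump_has_module() {
    conf=$1
    module=$2
    # extra_modules takes a space-separated list of module names and
    # may appear more than once, so scan every field on every such line.
    awk -v m="$module" '
        $1 == "extra_modules" {
            for (i = 2; i <= NF; i++)
                if ($i == m) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$conf"
}

# Example use (paths and module name are the ones from this thread):
#   kdump_has_module /etc/kdump.conf i6300esb \
#       || echo "i6300esb is not in extra_modules" >&2
```

This only confirms that the module is named in the configuration. It
does not show that the module actually ends up in the kdump initramfs
or that the watchdog is serviced during the dump.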
--
Regards,
Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA