[ClusterLabs] Q: fence_kdump and fence_kdump_send

Reid Wahl nwahl at redhat.com
Fri Feb 25 06:23:11 EST 2022


On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl <nwahl at redhat.com> wrote:
>
> On Thu, Feb 24, 2022 at 4:22 AM Ulrich Windl
> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >
> > Hi!
> >
> > After reading about fence_kdump and fence_kdump_send I wonder:
> > Does anybody use that in production?
>
> Quite a lot of people, in fact.
>
> > Having the networking and bonding in the initrd does not sound like a good idea to me.
> > Wouldn't it be easier to integrate that functionality into sbd?
> > I mean: Let sbd wait for a "kdump-ed" message that initrd could send when kdump is complete.
> > Basically that would be the same mechanism, but using storage instead of networking.
> >
> > If I understand correctly, the original fence_kdump would also introduce an extra fencing delay, and I wonder what happens with a hardware watchdog while a kdump is in progress...
> >
> > The background of all this is that our nodes are hitting kernel panics, and support says the resulting kdumps are all incomplete.
> > The events are most likely:
> > node1: panics (kdump)
> > other_node: sees that node1 has failed and fences it (via sbd).
> >
> > However, sbd fencing won't work while kdump is executing (IMHO).
> >
> > So what happens most likely is that the watchdog terminates the kdump.
> > In that case all the mess with fence_kdump won't help, right?
>
> You can configure extra_modules in your /etc/kdump.conf file to
> include the watchdog module, and then restart kdump.service. For
> example:
>
> # grep ^extra_modules /etc/kdump.conf
> extra_modules i6300esb
> # systemctl restart kdump.service
>
> If you're not sure of the name of your watchdog module, wdctl can help
> you find it. sbd needs to be stopped first, because it keeps the
> watchdog device timer busy.
>
> # pcs cluster stop --all
> # wdctl | grep Identity
> Identity:      i6300ESB timer [version 0]
> # lsmod | grep -i i6300ESB
> i6300esb               13566  0
>
>
> If you're also using fence_sbd (poison-pill fencing via block device),
> then you should be able to protect yourself from that during a dump by
> configuring fencing levels so that fence_kdump is level 1 and
> fence_sbd is level 2.
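A minimal sketch of the fencing-level setup described above, written as pcs commands. The node names (node1, node2) and the stonith device IDs (kdump for a fence_kdump device, sbd for a fence_sbd device) are hypothetical placeholders, and both devices are assumed to exist already; adjust the names to your cluster and check the pcs man page for your version.

```shell
# Hypothetical names: nodes node1/node2; existing stonith devices
# "kdump" (fence_kdump) and "sbd" (fence_sbd).

# Level 1: fence_kdump waits for fence_kdump_send on the crashed node.
pcs stonith level add 1 node1 kdump
pcs stonith level add 1 node2 kdump

# Level 2: fence_sbd (poison pill), tried only if level 1 fails.
pcs stonith level add 2 node1 sbd
pcs stonith level add 2 node2 sbd

# List the configured fencing levels to verify the topology.
pcs stonith level config
```

With this topology, fence_kdump reports success only if the crashed node's fence_kdump_send message arrives before fence_kdump's timeout; otherwise that level fails and Pacemaker falls back to fence_sbd.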

RHKB, for anyone interested:
  - sbd watchdog timeout causes node to reboot during crash kernel
    execution (https://access.redhat.com/solutions/3552201)
>
>
> >
> > Regards,
> > Ulrich
> >
>
>
> --
> Regards,
>
> Reid Wahl (He/Him), RHCA
> Senior Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA



-- 
Regards,

Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA


