[ClusterLabs] Q: fence_kdump and fence_kdump_send
Reid Wahl
nwahl at redhat.com
Fri Feb 25 06:22:31 EST 2022
On Thu, Feb 24, 2022 at 4:22 AM Ulrich Windl
<Ulrich.Windl at rz.uni-regensburg.de> wrote:
>
> Hi!
>
> After reading about fence_kdump and fence_kdump_send I wonder:
> Does anybody use that in production?
Quite a lot of people, in fact.
> Having the networking and bonding in initrd does not sound like a good idea to me.
> Wouldn't it be easier to integrate that functionality into sbd?
> I mean: Let sbd wait for a "kdump-ed" message that initrd could send when kdump is complete.
> Basically that would be the same mechanism, but using storage instead of networking.
>
> If I get it right, the original fence_kdump would also introduce an extra fencing delay, and I wonder what happens with a hardware watchdog while a kdump is in progress...
>
> The background of all this is that our nodes kernel-panic, and support says the kdumps are all incomplete.
> The events are most likely:
> node1: panics (kdump)
> other_node: sees node1 had failed and fences it (via sbd).
>
> However sbd fencing won't work while kdump is executing (IMHO)
>
> So what happens most likely is that the watchdog terminates the kdump.
> In that case all the mess with fence_kdump won't help, right?
You can configure extra_modules in your /etc/kdump.conf file to
include the watchdog module, and then restart kdump.service. For
example:
# grep ^extra_modules /etc/kdump.conf
extra_modules i6300esb
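As a rough sketch of the restart step, assuming a RHEL-style system where
the kdump initramfs lives at /boot/initramfs-$(uname -r)kdump.img (that
path and the lsinitrd check are assumptions, not something stated in this
thread), you could rebuild the image and confirm the module made it in:

```shell
# Restarting kdump should rebuild its initramfs after kdump.conf changes.
systemctl restart kdump.service

# Assumed image location (RHEL-style); adjust for your distribution.
# i6300esb is the example module from above; substitute your own.
lsinitrd "/boot/initramfs-$(uname -r)kdump.img" | grep -i i6300esb
```

If grep prints nothing, the module is probably not in the image, so
double-check the extra_modules line.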
If you're not sure of the name of your watchdog module, wdctl can help
you find it. sbd needs to be stopped first (stopping the cluster does
that), because it keeps the watchdog device timer busy.
# pcs cluster stop --all
# wdctl | grep Identity
Identity: i6300ESB timer [version 0]
# lsmod | grep -i i6300ESB
i6300esb 13566 0
If you're also using fence_sbd (poison-pill fencing via block device),
then you should be able to protect yourself from that during a dump by
configuring fencing levels so that fence_kdump is level 1 and
fence_sbd is level 2.
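A minimal sketch of that level setup with pcs, assuming two nodes named
node1 and node2 and already-created stonith resources with the
hypothetical IDs kdump (fence_kdump) and sbd_fencing (fence_sbd). None of
these names come from this thread, so substitute your own:

```shell
# Level 1: wait for the fence_kdump_send message from the crashed node.
pcs stonith level add 1 node1 kdump
pcs stonith level add 1 node2 kdump

# Level 2: fall back to the poison pill only if level 1 fails or times out.
pcs stonith level add 2 node1 sbd_fencing
pcs stonith level add 2 node2 sbd_fencing
```

With this ordering the poison pill is tried only after fence_kdump has
failed or timed out, so the fence_kdump timeout adds to the total fencing
delay for a node that is not actually dumping.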
>
> Regards,
> Ulrich
--
Regards,
Reid Wahl (He/Him), RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA