[ClusterLabs] Q: fence_kdump and fence_kdump_send

Thu Feb 24 11:26:51 EST 2022

We use the fence_kdump* code extensively in production and have never had any problems with it (other than the normal initial configuration challenges). Kernel panic + kdump is our most common failure mode, so we exercise this code quite a bit.
Thanks,
Chris

From: Users <users-bounces at clusterlabs.org>
Date: Thursday, February 24, 2022 at 7:22 AM
To: users at clusterlabs.org <users at clusterlabs.org>
Subject: [ClusterLabs] Q: fence_kdump and fence_kdump_send
Hi!

After reading about fence_kdump and fence_kdump_send I wonder:
Does anybody use that in production?
Having the networking and bonding in initrd does not sound like a good idea to me.
Wouldn't it be easier to integrate that functionality into sbd?
I mean: Let sbd wait for a "kdump-ed" message that initrd could send when kdump is complete.
Basically that would be the same mechanism, but using storage instead of networking.
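A minimal sketch of the storage-based idea above, assuming a hypothetical per-node slot on the shared sbd disk. sbd has no such slot or message today; the slot size, the `KDUMPDONE` marker and every function name below are invented for illustration. A plain file stands in for the shared block device so the example runs anywhere.

```python
import os
import tempfile
import time

SLOT_SIZE = 512           # assumption: one 512-byte sector per node slot
MARKER = b"KDUMPDONE"     # hypothetical marker; not an existing sbd message


def _write_slot(dev_path: str, slot: int, payload: bytes) -> None:
    """Overwrite one slot, padding with zeros, and force it to disk."""
    with open(dev_path, "r+b") as dev:
        dev.seek(slot * SLOT_SIZE)
        dev.write(payload.ljust(SLOT_SIZE, b"\0"))
        dev.flush()
        os.fsync(dev.fileno())


def write_marker(dev_path: str, slot: int) -> None:
    """Would run (hypothetically) from the kdump initrd once the vmcore is saved."""
    _write_slot(dev_path, slot, MARKER)


def clear_marker(dev_path: str, slot: int) -> None:
    """Would run on the surviving node after it has acted on a marker."""
    _write_slot(dev_path, slot, b"")


def wait_for_marker(dev_path: str, slot: int, timeout: float, poll: float = 0.05) -> bool:
    """Poll the crashed node's slot; True if the marker appears before the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        with open(dev_path, "rb") as dev:
            dev.seek(slot * SLOT_SIZE)
            if dev.read(len(MARKER)) == MARKER:
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def make_fake_device(slots: int = 2) -> str:
    """Create a zero-filled file standing in for the shared disk (demo only)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * SLOT_SIZE * slots)
    return path
```

The `clear_marker` step matters: without it, a marker left over from an earlier crash would make the next wait succeed immediately, so any real design would have to define who clears the slot and when.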

If I get it right, the original fence_kdump would also introduce an extra fencing delay, and I wonder what happens with a hardware watchdog while a kdump is in progress...
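The extra fencing delay can be illustrated with a rough model of the fence_kdump idea: the surviving node waits on a UDP socket for a "dump done" notice, and only if that wait times out does the cluster move on to the next fencing level. The 8-byte message (magic number plus version) and all names below are assumptions made for this sketch, not fence_kdump's actual wire format, and no check is made here of which node sent the packet.

```python
import socket
import struct
import threading
import time

# Hypothetical 8-byte notice: (magic, version). NOT fence_kdump's real format.
MAGIC = 0x4B44554D   # ASCII "KDUM", invented for this sketch
VERSION = 1


def send_done(host: str, port: int, count: int = 3, interval: float = 0.05) -> None:
    """Stand-in for fence_kdump_send: repeat the notice, since UDP may drop packets."""
    msg = struct.pack("!II", MAGIC, VERSION)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        for _ in range(count):
            s.sendto(msg, (host, port))
            time.sleep(interval)


def wait_for_kdump(sock: socket.socket, timeout: float) -> bool:
    """Stand-in for fence_kdump: True if a valid notice arrives within timeout.

    False means this fencing attempt failed; a next level (e.g. sbd) would
    only start after the whole timeout has elapsed, which is the extra delay.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        sock.settimeout(remaining)
        try:
            data, _addr = sock.recvfrom(64)
        except socket.timeout:
            return False
        # Ignore anything that is not a well-formed notice and keep waiting.
        if len(data) == 8 and struct.unpack("!II", data) == (MAGIC, VERSION):
            return True


def demo(timeout: float = 0.3) -> tuple:
    """Run one 'no notice' case and one 'notice sent' case on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as srv:
        srv.bind(("127.0.0.1", 0))            # any free port, demo only
        port = srv.getsockname()[1]
        lost = wait_for_kdump(srv, timeout)   # nobody sends: times out
        sender = threading.Thread(target=send_done, args=("127.0.0.1", port))
        sender.start()
        ok = wait_for_kdump(srv, 2.0)
        sender.join()
    return lost, ok
```

In the failure case the caller has spent the full `timeout` before a second-level device is even tried, so the delay is paid exactly when the dump did not finish.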

The background of all this is that our nodes have been kernel-panicking, and support says the kdumps are all incomplete.
The events are most likely:
node1: panics (kdump)
other_node: sees that node1 has failed and fences it (via sbd).

However, sbd fencing won't work while kdump is executing (IMHO).

So what happens most likely is that the watchdog terminates the kdump.
In that case all the mess with fence_kdump won't help, right?
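The suspected race can be written down as simple arithmetic. This toy model encodes the hypothesis from this message, namely that nothing runs sbd inside the kdump kernel, so only a still-armed hardware watchdog can cut the dump short. All field names and numbers are illustrative assumptions, not measured values.

```python
from dataclasses import dataclass


@dataclass
class CrashTiming:
    """All values in seconds; every number is an assumption for illustration."""
    watchdog_timeout: float   # hardware watchdog period armed by the crashed kernel
    since_last_feed: float    # time between the last watchdog feed and the panic
    dump_duration: float      # time the capture kernel needs to write the vmcore
    capture_kernel_handles_watchdog: bool = False  # does kdump feed or stop it?


def dump_completes(t: CrashTiming) -> bool:
    """True if the vmcore is written before the watchdog resets the node.

    Hypothesis encoded here: with no sbd daemon in the kdump kernel, the
    peer's sbd fence never acts on this node directly; only the hardware
    watchdog, if nobody services it, can interrupt the dump.
    """
    if t.capture_kernel_handles_watchdog:
        return True
    remaining = t.watchdog_timeout - t.since_last_feed
    return t.dump_duration <= remaining
```

With a short watchdog period and a multi-minute dump, `dump_completes(CrashTiming(5, 1, 120))` is `False`, which would match the "all kdumps incomplete" symptom.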

Regards,
Ulrich
