[ClusterLabs] SLES11 SP4:SBD fencing problem with Xen (NMI not handled)?

Edwin Török edvin.torok at citrix.com
Mon Jul 30 05:20:54 EDT 2018


On 30/07/18 08:24, Ulrich Windl wrote:
> Hi!
> 
> We have a strange problem on one cluster node running Xen PV VMs (SLES11 SP4): After updating the kernel and adding new SBD devices (to replace an old storage system), the system just seems to freeze.

Hi,

Which version of Xen are you using, and which Linux distribution runs in
Dom0?

> Closer inspection showed that SBD seems to send an NMI (for reasons still to be examined), and the current Xen/Kernel seems to be unable to handle the NMI in a way that forces a restart of the server (see attached screen shot).

Can you show us your kernel boot command line and the loaded modules?
Which watchdog module did you load? Have you tried xen_wdt?
See https://www.suse.com/support/kb/doc/?id=7016880
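
One possible way to collect the information asked for above is the small
script below. It is only a sketch: it assumes Python 3 is available in
Dom0, it reads the standard Linux files /proc/cmdline and /proc/modules,
and the helper names and the list of name hints for watchdog drivers are
made up for illustration and are not exhaustive.

```python
# Sketch: print the kernel boot command line and any loaded modules whose
# names look like watchdog drivers. Helper names are hypothetical.

# Assumption: watchdog drivers usually carry one of these substrings in
# their module name (e.g. xen_wdt, softdog). This list is not exhaustive.
WATCHDOG_HINTS = ("wdt", "softdog", "watchdog")


def loaded_modules(proc_modules_text):
    """Return the module names (first column) from /proc/modules text."""
    return [line.split()[0]
            for line in proc_modules_text.splitlines()
            if line.strip()]


def watchdog_modules(names):
    """Keep only the names that contain one of the watchdog hints."""
    return [n for n in names
            if any(hint in n.lower() for hint in WATCHDOG_HINTS)]


def read_text_or_empty(path):
    """Read a file, returning an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print("cmdline:", read_text_or_empty("/proc/cmdline").strip())
    modules = loaded_modules(read_text_or_empty("/proc/modules"))
    print("watchdog modules:", watchdog_modules(modules))
```

An empty "watchdog modules" list would only mean that no loaded module
name matched the hints, not that no watchdog is active.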

Best regards,
--Edwin

> 
> The last message I see in the node's cluster log is this:
> Jul 27 11:33:32 [15731] h01        cib:     info: cib_file_write_with_digest:      Reading cluster configuration file /var/lib/pacemaker/cib/cib.YESngs (digest: /var/lib/pacemaker/cib/cib.Yutv8O)
> 
> Other nodes have these messages:
> Jul 27 11:33:32 h05 dlm_controld.pcmk[15810]: dlm_process_node: Skipped active node 739512330: born-on=3864, last-seen=3936, this-event=3936, last-event=3932
> 
> Jul 27 11:33:32 h10 dlm_controld.pcmk[20397]: dlm_process_node: Skipped active node 739512325: born-on=3856, last-seen=3936, this-event=3936, last-event=3932
> 
> Can anybody shed some light on this issue?
> 1) Under what circumstances is an NMI sent by SBD?
> 2) What is the reaction expected after receiving an NMI?
> 3) If it did work before, what could have gone wrong?
> 
> I wanted to get some feedback from here before asking SLES support...
> 
> Regards,
> Ulrich