[ClusterLabs] SLES11 SP4:SBD fencing problem with Xen (NMI not handled)?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Jul 30 03:24:36 EDT 2018


We have a strange problem on one cluster node running Xen PV VMs (SLES11 SP4): After updating the kernel and adding new SBD devices (to replace an old storage system), the system just seems to freeze.
Closer inspection showed that SBD seems to send an NMI (for reasons still to be examined), and the current Xen/kernel combination seems unable to handle the NMI in a way that forces a restart of the server (see attached screenshot).

The last message I see in the node's cluster log is this:
Jul 27 11:33:32 [15731] h01        cib:     info: cib_file_write_with_digest:      Reading cluster configuration file /var/lib/pacemaker/cib/cib.YESngs (digest: /var/lib/pacemaker/cib/cib.Yutv8O)

Other nodes have these messages:
Jul 27 11:33:32 h05 dlm_controld.pcmk[15810]: dlm_process_node: Skipped active node 739512330: born-on=3864, last-seen=3936, this-event=3936, last-event=3932

Jul 27 11:33:32 h10 dlm_controld.pcmk[20397]: dlm_process_node: Skipped active node 739512325: born-on=3856, last-seen=3936, this-event=3936, last-event=3932

Can anybody shed some light on this issue?
1) Under what circumstances is an NMI sent by SBD?
2) What is the reaction expected after receiving an NMI?
3) If it did work before, what could have gone wrong?

I wanted to get some feedback from here before asking SLES support...


-------------- next part --------------
A non-text attachment was scrubbed...
Name: Xen-NMI.jpg
Type: image/jpeg
Size: 172095 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180730/6beef94c/attachment-0001.jpg>
