[ClusterLabs] Howto stonith in the case of any interface failure?

Kadlecsik József kadlecsik.jozsef at wigner.mta.hu
Wed Oct 9 03:58:24 EDT 2019


Hello,

The nodes in our cluster have two kinds of interfaces, backend and 
frontend: the former carry the storage and cluster (corosync) traffic, 
the latter only the public services of the KVM guests.
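
For reference, corosync is bound to the backend network only; the relevant 
part of our corosync.conf looks roughly like this (the cluster name and 
the addresses below are placeholders, not our real ones):

  totem {
      version: 2
      cluster_name: kvmcluster        # placeholder name
      transport: udpu
  }

  nodelist {
      node {
          ring0_addr: 10.0.0.1        # backend interface of node1
          name: node1
          nodeid: 1
      }
      # further nodes on their backend addresses
  }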

One of the nodes had a failure ("watchdog: BUG: soft lockup - CPU#7 
stuck for 23s"), with the result that the node could still process 
traffic on the backend interface but not on the frontend one. Thus the 
services became unavailable, but the cluster thought the node was all 
right and did not stonith it.

How could we protect the cluster against such failures?

We could configure a second corosync ring, but that would only give us a 
redundant ring for the cluster traffic itself: as far as I understand, a 
frontend failure would merely mark that ring faulty while membership 
stays up over the backend, so the node would still not be fenced.
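
Just to be concrete, I think the second ring would mean additions like 
the following (placeholder addresses again); with corosync 2.x and udpu 
the totem section also needs rrp_mode, while corosync 3.x with knet 
handles multiple links without it:

  totem {
      # existing settings unchanged
      rrp_mode: passive             # corosync 2.x only
  }

  nodelist {
      node {
          ring0_addr: 10.0.0.1      # backend, as before
          ring1_addr: 192.0.2.1     # frontend, the new second ring
          name: node1
          nodeid: 1
      }
      # the same ring1_addr addition for the other nodes
  }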

We could set up a second, independent corosync configuration on the 
frontend network for a second pacemaker instance running stonith agents 
only. Is it enough to specify the cluster name in that corosync config 
to pair a pacemaker instance with a corosync instance? And how can we 
tell this second pacemaker to connect to the second corosync instance 
rather than to the primary one?
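
To make the question concrete: I imagine the fencing-only instance would 
get its own config file with a different cluster name and a ring bound to 
the frontend network, something like the sketch below (file name, cluster 
name, addresses and port are all made up). Whether that alone is enough 
for a separate pacemaker to attach to it is exactly what I am unsure 
about.

  # hypothetical second config, e.g. /etc/corosync/corosync-fence.conf
  totem {
      version: 2
      cluster_name: kvmcluster-fence  # different from the main cluster's name
      transport: udpu
      interface {
          ringnumber: 0
          bindnetaddr: 192.0.2.0      # frontend network
          mcastport: 5407             # keep clear of the main instance's ports
      }
  }

  nodelist {
      node {
          ring0_addr: 192.0.2.1       # frontend interface of node1
          name: node1
          nodeid: 1
      }
      # further nodes on their frontend addresses
  }

  quorum {
      provider: corosync_votequorum
  }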

What is the best way to solve this problem?

Best regards,
Jozsef
--
E-mail : kadlecsik.jozsef at wigner.mta.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: Wigner Research Centre for Physics
         H-1525 Budapest 114, POB. 49, Hungary

