[ClusterLabs] How to stonith in the case of any interface failure?
Kadlecsik József
kadlecsik.jozsef at wigner.mta.hu
Wed Oct 9 03:58:24 EDT 2019
Hello,
The nodes in our cluster have backend and frontend interfaces: the former
carry the storage and cluster (corosync) traffic, while the latter serve
only the public services of the KVM guests.
One of the nodes hit a failure ("watchdog: BUG: soft lockup - CPU#7
stuck for 23s"), after which it could still process traffic on the
backend interface but not on the frontend one. The services thus became
unavailable, but the cluster considered the node healthy and did not
stonith it.
How could we protect the cluster against such failures?
We could configure a second corosync ring over the frontend interfaces,
but that would act as a redundancy ring only: corosync would simply keep
running over the backend ring, so a frontend-only failure still would not
trigger fencing.
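For reference, a minimal sketch of such a second ring, assuming corosync 2.x
with udpu transport (corosync 3.x with knet configures the extra link
differently); the cluster name and all addresses are placeholders, not our
real ones:

    totem {
        version: 2
        cluster_name: example-cluster    # placeholder
        transport: udpu
        rrp_mode: passive                # enables the redundant second ring
    }

    nodelist {
        node {
            nodeid: 1
            ring0_addr: 10.0.0.1         # backend address (placeholder)
            ring1_addr: 192.0.2.1        # frontend address (placeholder)
        }
        node {
            nodeid: 2
            ring0_addr: 10.0.0.2
            ring1_addr: 192.0.2.2
        }
    }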
We could set up a second, independent corosync configuration for a second
pacemaker instance running stonith agents only. Is it enough to specify the
cluster name in the corosync config to pair pacemaker with corosync? And
what about the pairing of pacemaker with this corosync instance: how can we
tell pacemaker which corosync instance to connect to?
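For comparison, a mitigation that avoids a second corosync entirely would be
to let the existing cluster watch the frontend NIC itself, e.g. with
ocf:heartbeat:ethmonitor, and move (rather than fence) the guests away from a
node whose frontend is down. A rough sketch using pcs; the interface name
eth1 and the resource name vm-guests are placeholders only:

    # Clone an ethmonitor resource so every node publishes the state of its
    # frontend NIC as the node attribute "ethmonitor-eth1"
    # (0 = down, non-zero = up with the default multiplier)
    pcs resource create fe-link ocf:heartbeat:ethmonitor interface=eth1 clone

    # Keep the guests off any node whose frontend NIC is down or unreported
    pcs constraint location vm-guests rule score=-INFINITY \
        not_defined ethmonitor-eth1 or ethmonitor-eth1 eq 0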
What is the best way to solve this problem?
Best regards,
Jozsef
--
E-mail : kadlecsik.jozsef at wigner.mta.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: Wigner Research Centre for Physics
H-1525 Budapest 114, POB. 49, Hungary