[ClusterLabs] Howto stonith in the case of any interface failure?
Ken Gaillot
kgaillot at redhat.com
Wed Oct 9 10:35:09 EDT 2019
On Wed, 2019-10-09 at 09:58 +0200, Kadlecsik József wrote:
> Hello,
>
> The nodes in our cluster have got backend and frontend interfaces: the
> former ones are for the storage and cluster (corosync) traffic and the
> latter ones are for the public services of KVM guests only.
>
> One of the nodes had a failure ("watchdog: BUG: soft lockup - CPU#7
> stuck for 23s"), as a result of which the node could process traffic on
> the backend interface but not on the frontend one. Thus the services
> became unavailable, but the cluster thought the node was all right and
> did not stonith it.
>
> How could we protect the cluster against such failures?
See the ocf:heartbeat:ethmonitor agent (to monitor the interface
itself) and/or the ocf:pacemaker:ping agent (to monitor reachability of
some IP such as a gateway).
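
As a rough illustration of how these agents are typically wired up (a
sketch only, not a tested configuration): both agents publish their
result as a node attribute, and a location rule on that attribute keeps
resources off a node whose check fails. The commands below assume the
pcs shell with its older rule syntax (newer pcs releases may expect a
different form). The interface name eth1, the gateway address 192.0.2.1,
and the resource names frontend-mon, frontend-ping and vm-guest are all
hypothetical placeholders.

```shell
# Hypothetical names throughout: eth1, 192.0.2.1, frontend-mon,
# frontend-ping, vm-guest. Adjust to the actual cluster.

# Monitor the frontend interface on every node. By default the agent sets
# the node attribute "ethmonitor-eth1" to 1 while the link is up, else 0.
pcs resource create frontend-mon ocf:heartbeat:ethmonitor \
    interface=eth1 op monitor interval=10s clone

# Alternatively (or additionally), ping a gateway reached through the
# frontend network. By default the agent maintains the attribute "pingd".
pcs resource create frontend-ping ocf:pacemaker:ping \
    host_list=192.0.2.1 dampen=5s multiplier=1000 \
    op monitor interval=10s clone

# Keep the guest off any node where the interface check is failing
# or has not reported a value yet.
pcs constraint location vm-guest rule score=-INFINITY \
    ethmonitor-eth1 ne 1 or not_defined ethmonitor-eth1
```

Note that a constraint like this moves resources away from the affected
node rather than fencing it.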
>
> We could configure a second corosync ring, but that would be a redundancy
> ring only.
>
> We could set up a second, independent corosync configuration for a second
> pacemaker just with stonith agents. Is it enough to specify the cluster
> name in the corosync config to pair pacemaker to corosync? What about the
> pairing of pacemaker to this corosync instance, how can we tell pacemaker
> to connect to this corosync instance?
>
> Which is the best way to solve the problem?
>
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.jozsef at wigner.mta.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics
> H-1525 Budapest 114, POB. 49, Hungary
--
Ken Gaillot <kgaillot at redhat.com>