[ClusterLabs] Howto stonith in the case of any interface failure?

Kadlecsik József kadlecsik.jozsef at wigner.mta.hu
Wed Oct 9 14:10:11 EDT 2019


On Wed, 9 Oct 2019, Ken Gaillot wrote:

> > One of the nodes suffered a failure ("watchdog: BUG: soft lockup -
> > CPU#7 stuck for 23s"), with the result that the node could process
> > traffic on the backend interface but not on the frontend one. Thus the
> > services became unavailable, but the cluster thought the node was all
> > right and did not stonith it.
> > 
> > How could we protect the cluster against such failures?
> 
> See the ocf:heartbeat:ethmonitor agent (to monitor the interface itself) 
> and/or the ocf:pacemaker:ping agent (to monitor reachability of some IP 
> such as a gateway)

This looks really promising, thank you! Does the cluster regard it as a
failure when an ocf:heartbeat:ethmonitor clone instance does not run on
a node? :-)
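
For reference, the two agents suggested above could be configured roughly
as follows with pcs. This is only a sketch under stated assumptions: the
interface name (eth0), the gateway address (192.0.2.1), and the resource
names (eth-front, ping-gw, my-service) are hypothetical placeholders, and
the exact pcs syntax can differ between pcs versions. Note that the
location rules shown only keep resources away from a node whose attribute
indicates a problem; by themselves they do not fence the node.

```shell
# Assumption: eth0 is the frontend interface on every node.
# ethmonitor records the link state in a node attribute, by default
# named "ethmonitor-<interface>" (1 = up, 0 = down).
pcs resource create eth-front ocf:heartbeat:ethmonitor \
    interface=eth0 op monitor interval=10s clone

# Assumption: 192.0.2.1 is a gateway reachable via the frontend network.
# The ping agent sets the "pingd" node attribute to
# (number of reachable hosts) * multiplier.
pcs resource create ping-gw ocf:pacemaker:ping \
    host_list=192.0.2.1 dampen=5s multiplier=1000 \
    op monitor interval=10s clone

# Keep the hypothetical resource "my-service" off nodes where the
# interface is reported down ...
pcs constraint location my-service rule score=-INFINITY \
    ethmonitor-eth0 ne 1

# ... or where the gateway is unreachable or the attribute is missing.
pcs constraint location my-service rule score=-INFINITY \
    pingd lt 1 or not_defined pingd
```

Either rule alone may be enough; using both covers a dead link as well as
a link that is up but cannot pass traffic.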

Best regards,
Jozsef
--
E-mail : kadlecsik.jozsef at wigner.mta.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: Wigner Research Centre for Physics
         H-1525 Budapest 114, POB. 49, Hungary

