[ClusterLabs] Problem with the cluster becoming mostly unresponsive

Strahil Nikolov hunter86_bg at yahoo.com
Sat May 15 11:29:45 EDT 2021


>So a monitor failure on the fence agent rendered the cluster effectively
>unresponsive? How would I normally recover from this?
Actually, it will ban the resource (the stonith device) from a node once the fail count on that node reaches the maximum. Once the stonith device is banned from all nodes, no node can use it.

You can use the 'failure-timeout' meta attribute to have the fail count reset automatically once that time has passed without new failures. I'm using it for the IPMI fencing mechanisms.
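
As a rough sketch of what this looks like in the CIB, assuming a stonith resource based on the fence_ipmilan agent: the ids and the 60s value below are hypothetical placeholders, not values from this thread, and the device parameters are left out.

```xml
<!-- Hypothetical stonith resource; ids and the timeout value are illustrative only -->
<primitive id="fence_ipmi_node1" class="stonith" type="fence_ipmilan">
  <!-- instance_attributes with the device parameters omitted for brevity -->
  <meta_attributes id="fence_ipmi_node1-meta">
    <!-- Let recorded failures expire 60s after the last failure -->
    <nvpair id="fence_ipmi_node1-meta-failure-timeout"
            name="failure-timeout" value="60s"/>
  </meta_attributes>
</primitive>
```

Expired failures are only acted on when the cluster re-evaluates its state, so the ban may be lifted somewhat later than the configured value.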

Of course the best approach is to make that stonith more reliable but usually this is out of our control.
Another approach is to define a second stonith method and use stonith topology.
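
A minimal sketch of such a topology in the CIB, assuming two already defined stonith resources for a node named node1. All ids, the node name, and the device names are hypothetical.

```xml
<!-- Hypothetical fencing topology: try level 1 first, fall back to level 2 -->
<fencing-topology>
  <fencing-level id="fl-node1-1" target="node1" index="1"
                 devices="fence_ipmi_node1"/>
  <fencing-level id="fl-node1-2" target="node1" index="2"
                 devices="fence_backup_node1"/>
</fencing-topology>
```

With this layout the second device is only tried when the first level fails to fence the target.
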
Best Regards,
Strahil Nikolov



