[ClusterLabs] Failing operations immediately when node is known to be down

Ryan Thomas developmentrjt at gmail.com
Tue Apr 10 13:56:06 EDT 2018


I’m trying to implement an HA solution that recovers very quickly when a
node fails.  In my configuration, when I reboot a node, I see in the logs
that Pacemaker realizes the node is down and decides to move all resources
to the surviving node.  To do this, it initiates a ‘stop’ operation on each
of the resources to perform the move.  The ‘stop’ fails as expected after
20s (the default action timeout).  However, in this case, with the node
known to be down, I’d like to avoid this 20-second delay, since any
operations sent to the node will fail anyway.  It would be nice if
operations sent to a down node failed immediately, thus reducing the time
it takes for the resources to be started on the surviving node.  I do not
want to reduce the timeout for the operation, because the timeout is
sensible when a resource moves for a reason other than node failure.  Is
there a way to accomplish this?


Thanks for your help.