[ClusterLabs] Problems with master/slave failovers
Ken Gaillot
kgaillot at redhat.com
Fri Jun 28 10:51:44 EDT 2019
On Fri, 2019-06-28 at 04:24 +0000, Harvey Shepherd wrote:
> Hi All,
>
> I'm running Pacemaker 2.0.2 on a two node cluster. It runs one
> master/slave resource (I'll refer to it as the king resource) and
> about 20 other resources which are a mixture of:
>
> - resources that only run on the king resource master node
> (colocation constraint with a score of INFINITY)
> - clone resources that run on both nodes
> - two other master/slave resources where the masters run on the same
> node as the king resource master (colocation constraint with a score
> of INFINITY)
>
> I'll refer to the above set of resources as servant resources.
>
> All servant resources have a resource-stickiness of zero and the king
> resource has a resource-stickiness of 100. There is an ordering
> constraint that the king resource must start before all servant
> resources. The king resource is controlled by an OCF script that uses
> crm_master to set the preferred master for the king resource (current
> master has value 100, current slave is 5, unassigned role or resource
> failure is 1) - I've verified that these values are being set as
> expected upon promotion/demotion/failure etc., via the logs. That's
> pretty much all of the configuration - there is no configuration
> around node preferences and migration-threshold is zero for
> everything.
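For reference, that setup boils down to roughly the following (a sketch
only, assuming the pcs shell is in use; "servant_rsc" is a placeholder for
any one of the servant resources):

    # each servant follows the king master and carries no stickiness
    pcs constraint colocation add servant_rsc with master ms_king_resource INFINITY
    pcs constraint order start ms_king_resource then start servant_rsc
    pcs resource meta servant_rsc resource-stickiness=0
    pcs resource meta king_resource resource-stickiness=100

    # and inside the king OCF agent, master preference is advertised via crm_master
    crm_master -v 100   # current master
    crm_master -v 5     # current slave
    crm_master -v 1     # unassigned role or after a failure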
>
> What I'm trying to achieve is fairly simple:
>
> 1. If any servant resource fails on either node, it is simply
> restarted. These resources should never fail over onto the other
> node because of colocation with the king resource, and they should
> not contribute in any way to deciding whether the king resource
> should failover (which is why they have a resource-stickiness of
> zero).
This is actually not possible currently, but I have a long-planned
project to allow finer-grained control over failover behavior. I really
hope to get to it this year, but important bugs generally take up most
of my time.
The plan is to replace the current migration-threshold and on-fail with
three new meta-attributes: failure-ignore, failure-restart, and
failure-escalation. The cluster would ignore the first "failure-ignore"
failures, then try restarting the resource "failure-restart" times, and
if it still failed, proceed with the failure-escalation action (ignore,
block, stop, ban, fence, or standby).
So for the above scenario, you could set something like
failure-ignore=0 (which will be the default), failure-restart=INFINITY
(or some number), and failure-escalation=ignore or stop.
The difference from today is that currently you can only ignore or stop
on the first failure; you can't try restarting a few times and then
ignore or stop.
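With today's attributes the closest you can get is something like this
(again only a sketch, pcs syntax, placeholder names; adjust the interval to
match your existing monitor operation):

    # ignore servant monitor failures outright ...
    pcs resource update servant_rsc op monitor interval=10s on-fail=ignore
    # ... or stop the resource on its first failure instead
    pcs resource update servant_rsc op monitor interval=10s on-fail=stop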
> 2. If the slave instance of the king resource fails, it should simply
> be restarted and again no failover should occur.
That's the default behavior.
> 3. If the master instance of the king resource fails, then its slave
> instance should immediately be promoted, and the failed instance
> should be restarted. Failover of all servant resources should then
> occur due to the colocation dependency.
>
> It's number 3 above that I'm having trouble with. If I kill the
> master king resource instance it behaves as I expect - everything
> fails over and the king resource is restarted on the new slave. If I
> then kill the master instance of the king resource again, however,
> instead of failing back over to its original node, it restarts and
> promotes back to master on the same node. This is not what I want.
It sounds like you want migration-threshold=1 on the king resource.
Keep in mind that once a resource is forced off a node, it can't return
until the failure is cleaned up, whether that's manually or via a
failure-timeout.
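Something along these lines (a sketch; names and values are placeholders):

    # force the king resource off a node after a single failure
    pcs resource meta king_resource migration-threshold=1
    # optionally let the failure record expire on its own after a while
    pcs resource meta king_resource failure-timeout=60s
    # or clear it by hand once you've looked into the failure
    pcs resource cleanup king_resource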
> The relevant output from crm_simulate for the two tests is shown
> below. Can anyone suggest what might be going wrong? Whilst I really
> like the concept of crm_simulate, I can't find a good description of
> how to interpret the output and I don't understand the difference
> between clone_color and native_color, or the difference between
> "promotion scores" and the various instances of "allocation scores",
> nor does it really tell me what is contributing to the scores. Where
> does the -INFINITY allocation score come from for example?
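For what it's worth, score output like the below comes from running
crm_simulate with score display enabled, e.g. (just a sketch):

    # live cluster state plus allocation and promotion scores
    crm_simulate -sL
    # or replay against a saved copy of the CIB
    cibadmin --query > /tmp/cib.xml
    crm_simulate -s -x /tmp/cib.xml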
>
> Thanks,
> Harvey
>
>
> FIRST KING RESOURCE MASTER FAILURE (CORRECT BEHAVIOUR - MASTER NODE
> FAILOVER OCCURS)
>
> Clone Set: ms_king_resource [king_resource] (promotable)
> king_resource (ocf::aviat:king-resource-ocf): FAILED Master secondary
> clone_color: ms_king_resource allocation score on primary: 0
> clone_color: ms_king_resource allocation score on secondary: 0
> clone_color: king_resource:0 allocation score on primary: 0
> clone_color: king_resource:0 allocation score on secondary: 101
> clone_color: king_resource:1 allocation score on primary: 200
> clone_color: king_resource:1 allocation score on secondary: 0
> native_color: king_resource:1 allocation score on primary: 200
> native_color: king_resource:1 allocation score on secondary: 0
> native_color: king_resource:0 allocation score on primary: -INFINITY
> native_color: king_resource:0 allocation score on secondary: 101
> king_resource:1 promotion score on primary: 100
> king_resource:0 promotion score on secondary: 1
> * Recover king_resource:0 ( Master -> Slave secondary )
> * Promote king_resource:1 ( Slave -> Master primary )
> * Resource action: king_resource cancel=10000 on secondary
> * Resource action: king_resource cancel=11000 on primary
> * Pseudo action: ms_king_resource_pre_notify_demote_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_demote_0
> * Pseudo action: ms_king_resource_demote_0
> * Resource action: king_resource demote on secondary
> * Pseudo action: ms_king_resource_demoted_0
> * Pseudo action: ms_king_resource_post_notify_demoted_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_demoted_0
> * Pseudo action: ms_king_resource_pre_notify_stop_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_stop_0
> * Pseudo action: ms_king_resource_stop_0
> * Resource action: king_resource stop on secondary
> * Pseudo action: ms_king_resource_stopped_0
> * Pseudo action: ms_king_resource_post_notify_stopped_0
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_stopped_0
> * Pseudo action: ms_king_resource_pre_notify_start_0
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_start_0
> * Pseudo action: ms_king_resource_start_0
> * Resource action: king_resource start on secondary
> * Pseudo action: ms_king_resource_running_0
> * Pseudo action: ms_king_resource_post_notify_running_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_running_0
> * Pseudo action: ms_king_resource_pre_notify_promote_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_promote_0
> * Pseudo action: ms_king_resource_promote_0
> * Resource action: king_resource promote on primary
> * Pseudo action: ms_king_resource_promoted_0
> * Pseudo action: ms_king_resource_post_notify_promoted_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_promoted_0
> * Resource action: king_resource monitor=11000 on secondary
> * Resource action: king_resource monitor=10000 on primary
> Clone Set: ms_king_resource [king_resource] (promotable)
>
>
> SECOND KING RESOURCE MASTER FAILURE (INCORRECT BEHAVIOUR - SAME NODE
> IS PROMOTED TO MASTER)
>
> Clone Set: ms_king_resource [king_resource] (promotable)
> king_resource (ocf::aviat:king-resource-ocf): FAILED Master primary
> clone_color: ms_king_resource allocation score on primary: 0
> clone_color: ms_king_resource allocation score on secondary: 0
> clone_color: king_resource:0 allocation score on primary: 0
> clone_color: king_resource:0 allocation score on secondary: 200
> clone_color: king_resource:1 allocation score on primary: 101
> clone_color: king_resource:1 allocation score on secondary: 0
> native_color: king_resource:0 allocation score on primary: 0
> native_color: king_resource:0 allocation score on secondary: 200
> native_color: king_resource:1 allocation score on primary: 101
> native_color: king_resource:1 allocation score on secondary: -INFINITY
> king_resource:1 promotion score on primary: 1
> king_resource:0 promotion score on secondary: 1
> * Recover king_resource:1 ( Master primary )
> * Pseudo action: ms_king_resource_pre_notify_demote_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_demote_0
> * Pseudo action: ms_king_resource_demote_0
> * Resource action: king_resource demote on primary
> * Pseudo action: ms_king_resource_demoted_0
> * Pseudo action: ms_king_resource_post_notify_demoted_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_demoted_0
> * Pseudo action: ms_king_resource_pre_notify_stop_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_stop_0
> * Pseudo action: ms_king_resource_stop_0
> * Resource action: king_resource stop on primary
> * Pseudo action: ms_king_resource_stopped_0
> * Pseudo action: ms_king_resource_post_notify_stopped_0
> * Resource action: king_resource notify on secondary
> * Pseudo action: ms_king_resource_confirmed-post_notify_stopped_0
> * Pseudo action: ms_king_resource_pre_notify_start_0
> * Resource action: king_resource notify on secondary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_start_0
> * Pseudo action: ms_king_resource_start_0
> * Resource action: king_resource start on primary
> * Pseudo action: ms_king_resource_running_0
> * Pseudo action: ms_king_resource_post_notify_running_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_running_0
> * Pseudo action: ms_king_resource_pre_notify_promote_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_promote_0
> * Pseudo action: ms_king_resource_promote_0
> * Resource action: king_resource promote on primary
> * Pseudo action: ms_king_resource_promoted_0
> * Pseudo action: ms_king_resource_post_notify_promoted_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_promoted_0
> * Resource action: king_resource monitor=10000 on primary
> Clone Set: ms_king_resource [king_resource] (promotable)
>
>
--
Ken Gaillot <kgaillot at redhat.com>