[ClusterLabs] Problems with master/slave failovers
Ken Gaillot
kgaillot at redhat.com
Fri Jun 28 10:51:44 EDT 2019
On Fri, 2019-06-28 at 04:24 +0000, Harvey Shepherd wrote:
> Hi All,
>
> I'm running Pacemaker 2.0.2 on a two node cluster. It runs one
> master/slave resource (I'll refer to it as the king resource) and
> about 20 other resources which are a mixture of:
>
> - resources that only run on the king resource master node
> (colocation constraint with a score of INFINITY)
> - clone resources that run on both nodes
> - two other master/slave resources where the masters run on the same
> node as the king resource master (colocation constraint with a score
> of INFINITY)
>
> I'll refer to the above set of resources as servant resources.
>
> All servant resources have a resource-stickiness of zero and the king
> resource has a resource-stickiness of 100. There is an ordering
> constraint that the king resource must start before all servant
> resources. The king resource is controlled by an OCF script that uses
> crm_master to set the preferred master for the king resource (current
> master has value 100, current slave is 5, unassigned role or resource
> failure is 1) - I've verified that these values are being set as
> expected upon promotion/demotion/failure etc., via the logs. That's
> pretty much all of the configuration - there is no configuration
> around node preferences and migration-threshold is zero for
> everything.
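For reference, that setup boils down to roughly the following (a sketch
only, assuming the pcs shell is in use; "servant_rsc" is a placeholder for
any one of the servant resources):

    # each servant follows the king master and carries no stickiness
    pcs constraint colocation add servant_rsc with master ms_king_resource INFINITY
    pcs constraint order start ms_king_resource then start servant_rsc
    pcs resource meta servant_rsc resource-stickiness=0
    pcs resource meta king_resource resource-stickiness=100

    # and inside the king OCF agent, master preference is advertised via crm_master
    crm_master -v 100   # current master
    crm_master -v 5     # current slave
    crm_master -v 1     # unassigned role or after a failure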
>
> What I'm trying to achieve is fairly simple:
>
> 1. If any servant resource fails on either node, it is simply
> restarted. These resources should never fail over onto the other
> node because of colocation with the king resource, and they should
> not contribute in any way to deciding whether the king resource
> should failover (which is why they have a resource-stickiness of
> zero).
This is actually not possible currently, but I have a long-planned
project to allow finer-grained control over failover behavior. I really
hope to get to it this year, but important bugs generally take up most
of my time.
The plan is to replace the current migration-threshold and on-fail with
three new meta-attributes: failure-ignore, failure-restart, and
failure-escalation. The cluster would ignore the first "failure-ignore"
failures, then try restarting the resource "failure-restart" times, and
if it still failed, proceed with the failure-escalation action (ignore,
block, stop, ban, fence, or standby).
So for the above scenario, you could set something like
failure-ignore=0 (which will be the default), failure-restart=INFINITY
(or some number), and failure-escalation=ignore or stop.
The difference from today is that currently you can only ignore or stop
on the first failure; you can't try restarting a few times and then
ignore or stop.
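With today's attributes the closest you can get is something like this
(again only a sketch, pcs syntax, placeholder names; adjust the interval to
match your existing monitor operation):

    # ignore servant monitor failures outright ...
    pcs resource update servant_rsc op monitor interval=10s on-fail=ignore
    # ... or stop the resource on its first failure instead
    pcs resource update servant_rsc op monitor interval=10s on-fail=stop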
> 2. If the slave instance of the king resource fails, it should simply
> be restarted and again no failover should occur.
That's the default behavior.
> 3. If the master instance of the king resource fails, then its slave
> instance should immediately be promoted, and the failed instance
> should be restarted. Failover of all servant resources should then
> occur due to the colocation dependency.
>
> It's number 3 above that I'm having trouble with. If I kill the
> master king resource instance it behaves as I expect - everything
> fails over and the king resource is restarted on the new slave. If I
> then kill the master instance of the king resource again, however,
> instead of failing back over to its original node, it restarts and
> promotes back to master on the same node. This is not what I want.
It sounds like you want migration-threshold=1 on the king resource.
Keep in mind that once a resource is forced off a node, it can't return
until the failure is cleaned up, whether that's manually or via a
failure-timeout.
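Something along these lines (a sketch; names and values are placeholders):

    # force the king resource off a node after a single failure
    pcs resource meta king_resource migration-threshold=1
    # optionally let the failure record expire on its own after a while
    pcs resource meta king_resource failure-timeout=60s
    # or clear it by hand once you've looked into the failure
    pcs resource cleanup king_resource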
> The relevant output from crm_simulate for the two tests is shown
> below. Can anyone suggest what might be going wrong? Whilst I really
> like the concept of crm_simulate, I can't find a good description of
> how to interpret the output and I don't understand the difference
> between clone_color and native_color, or the difference between
> "promotion scores" and the various instances of "allocation scores",
> nor does it really tell me what is contributing to the scores. Where
> does the -INFINITY allocation score come from for example?
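For what it's worth, score output like the below comes from running
crm_simulate with score display enabled, e.g. (just a sketch):

    # live cluster state plus allocation and promotion scores
    crm_simulate -sL
    # or replay against a saved copy of the CIB
    cibadmin --query > /tmp/cib.xml
    crm_simulate -s -x /tmp/cib.xml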
>
> Thanks,
> Harvey
>
>
> FIRST KING RESOURCE MASTER FAILURE (CORRECT BEHAVIOUR - MASTER NODE
> FAILOVER OCCURS)
>
> Clone Set: ms_king_resource [king_resource] (promotable)
> king_resource (ocf::aviat:king-resource-ocf): FAILED Master secondary
> clone_color: ms_king_resource allocation score on primary: 0
> clone_color: ms_king_resource allocation score on secondary: 0
> clone_color: king_resource:0 allocation score on primary: 0
> clone_color: king_resource:0 allocation score on secondary: 101
> clone_color: king_resource:1 allocation score on primary: 200
> clone_color: king_resource:1 allocation score on secondary: 0
> native_color: king_resource:1 allocation score on primary: 200
> native_color: king_resource:1 allocation score on secondary: 0
> native_color: king_resource:0 allocation score on primary: -INFINITY
> native_color: king_resource:0 allocation score on secondary: 101
> king_resource:1 promotion score on primary: 100
> king_resource:0 promotion score on secondary: 1
> * Recover king_resource:0 ( Master -> Slave secondary )
> * Promote king_resource:1 ( Slave -> Master primary )
> * Resource action: king_resource cancel=10000 on secondary
> * Resource action: king_resource cancel=11000 on primary
> * Pseudo action: ms_king_resource_pre_notify_demote_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_demote_0
> * Pseudo action: ms_king_resource_demote_0
> * Resource action: king_resource demote on secondary
> * Pseudo action: ms_king_resource_demoted_0
> * Pseudo action: ms_king_resource_post_notify_demoted_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_demoted_0
> * Pseudo action: ms_king_resource_pre_notify_stop_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_stop_0
> * Pseudo action: ms_king_resource_stop_0
> * Resource action: king_resource stop on secondary
> * Pseudo action: ms_king_resource_stopped_0
> * Pseudo action: ms_king_resource_post_notify_stopped_0
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_stopped_0
> * Pseudo action: ms_king_resource_pre_notify_start_0
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_start_0
> * Pseudo action: ms_king_resource_start_0
> * Resource action: king_resource start on secondary
> * Pseudo action: ms_king_resource_running_0
> * Pseudo action: ms_king_resource_post_notify_running_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_running_0
> * Pseudo action: ms_king_resource_pre_notify_promote_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_promote_0
> * Pseudo action: ms_king_resource_promote_0
> * Resource action: king_resource promote on primary
> * Pseudo action: ms_king_resource_promoted_0
> * Pseudo action: ms_king_resource_post_notify_promoted_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_promoted_0
> * Resource action: king_resource monitor=11000 on secondary
> * Resource action: king_resource monitor=10000 on primary
> Clone Set: ms_king_resource [king_resource] (promotable)
>
>
> SECOND KING RESOURCE MASTER FAILURE (INCORRECT BEHAVIOUR - SAME NODE
> IS PROMOTED TO MASTER)
>
> Clone Set: ms_king_resource [king_resource] (promotable)
> king_resource (ocf::aviat:king-resource-ocf): FAILED Master primary
> clone_color: ms_king_resource allocation score on primary: 0
> clone_color: ms_king_resource allocation score on secondary: 0
> clone_color: king_resource:0 allocation score on primary: 0
> clone_color: king_resource:0 allocation score on secondary: 200
> clone_color: king_resource:1 allocation score on primary: 101
> clone_color: king_resource:1 allocation score on secondary: 0
> native_color: king_resource:0 allocation score on primary: 0
> native_color: king_resource:0 allocation score on secondary: 200
> native_color: king_resource:1 allocation score on primary: 101
> native_color: king_resource:1 allocation score on secondary: -INFINITY
> king_resource:1 promotion score on primary: 1
> king_resource:0 promotion score on secondary: 1
> * Recover king_resource:1 ( Master primary )
> * Pseudo action: ms_king_resource_pre_notify_demote_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_demote_0
> * Pseudo action: ms_king_resource_demote_0
> * Resource action: king_resource demote on primary
> * Pseudo action: ms_king_resource_demoted_0
> * Pseudo action: ms_king_resource_post_notify_demoted_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_demoted_0
> * Pseudo action: ms_king_resource_pre_notify_stop_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_stop_0
> * Pseudo action: ms_king_resource_stop_0
> * Resource action: king_resource stop on primary
> * Pseudo action: ms_king_resource_stopped_0
> * Pseudo action: ms_king_resource_post_notify_stopped_0
> * Resource action: king_resource notify on secondary
> * Pseudo action: ms_king_resource_confirmed-post_notify_stopped_0
> * Pseudo action: ms_king_resource_pre_notify_start_0
> * Resource action: king_resource notify on secondary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_start_0
> * Pseudo action: ms_king_resource_start_0
> * Resource action: king_resource start on primary
> * Pseudo action: ms_king_resource_running_0
> * Pseudo action: ms_king_resource_post_notify_running_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_running_0
> * Pseudo action: ms_king_resource_pre_notify_promote_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-pre_notify_promote_0
> * Pseudo action: ms_king_resource_promote_0
> * Resource action: king_resource promote on primary
> * Pseudo action: ms_king_resource_promoted_0
> * Pseudo action: ms_king_resource_post_notify_promoted_0
> * Resource action: king_resource notify on secondary
> * Resource action: king_resource notify on primary
> * Pseudo action: ms_king_resource_confirmed-post_notify_promoted_0
> * Resource action: king_resource monitor=10000 on primary
> Clone Set: ms_king_resource [king_resource] (promotable)
>
>
--
Ken Gaillot <kgaillot at redhat.com>