[ClusterLabs] Problems with master/slave failovers
Ken Gaillot
kgaillot at redhat.com
Mon Jul 1 10:43:30 EDT 2019
On Sun, 2019-06-30 at 11:13 +0000, Harvey Shepherd wrote:
> >> There is an ordering constraint - everything must be started after
> >> the king resource. But even if this constraint didn't exist, I don't
> >> see that it should logically make any difference, due to all the
> >> non-clone resources being colocated with the master of the king
> >> resource. Surely it would make no sense for Pacemaker to start or
> >> move colocated resources until a master king resource has been
> >> elected?
> >>
> >> <tags>
> >> <tag id="servant2_dependents">
> >> <obj_ref id="servant4"/>
> >> <obj_ref id="servant5"/>
> >> <obj_ref id="servant6"/>
> >> <obj_ref id="servant7"/>
> >> <obj_ref id="servant8"/>
> >> <obj_ref id="servant9_active_disabled"/>
> >> <obj_ref id="servant11"/>
> >> <obj_ref id="servant12"/>
> >> <obj_ref id="servant13"/>
> >> </tag>
> >> </tags>
> >> <constraints>
> >> <rsc_colocation id="colocation_with_king_resource_master" score="INFINITY">
> >> <resource_set id="king_resource_master_dependents" sequential="false">
> >> <resource_ref id="stk_shared_ip"/>
> >> <resource_ref id="servant4"/>
> >> <resource_ref id="servant5"/>
> >> <resource_ref id="servant6"/>
> >> <resource_ref id="servant7"/>
> >> <resource_ref id="servant8"/>
> >> <resource_ref id="servant9_active_disabled"/>
> >> <resource_ref id="servant10"/>
> >> <resource_ref id="servant11"/>
> >> <resource_ref id="servant12"/>
> >> <resource_ref id="servant13"/>
> >> </resource_set>
> >> <resource_set id="king_resource_master" sequential="true" role="Master">
> >> <resource_ref id="ms_king_resource"/>
> >> <resource_ref id="ms_servant2"/>
> >> <resource_ref id="ms_servant3"/>
> >> </resource_set>
> >> </rsc_colocation>
> >> <rsc_order id="dependents_after_servant2" kind="Mandatory" first="ms_servant2" then="servant2_dependents"/>
> >> </constraints>
> >>
>
> > This ordering constraint is satisfied by the slave of ms_servant2.
> > The slave is already started at the point the failover happens, so
> > Pacemaker is free to start all the other resources immediately. If
> > you intend to order against the master, you need
> > first-action="promote" then-action="start".
>
> As I mentioned in my last message, I have trouble using
> first-action="promote" because some of the dependents are clone
> resources. I just tried it again, and the dependent clones only start
> on the master node with this setting. What I really need is a
> first-action="promote | demote" setting, but this isn't available. I
> tried adding two separate rules, but Pacemaker doesn't like that and
> none of the dependents start.
Simply leaving off first-action allows the dependency against either
the master or slave role. It sounds like you're mixing primitives and
clones in the same colocation, and want the primitives only with the
master role but the clones with any role -- that will require separate
constraints (the primitives colocated with the master role and ordered
after promote, the clones colocated without specifying the role and
ordered without specifying the action).
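
As a rough sketch (untested, using one primitive and one clone from
your config as stand-ins for the rest, and ordering against
ms_servant2 as in your existing constraint), that would look
something like:

    <rsc_colocation id="servant4_with_servant2_master" score="INFINITY"
        rsc="servant4" with-rsc="ms_servant2" with-rsc-role="Master"/>
    <rsc_order id="servant4_after_servant2_promote" kind="Mandatory"
        first="ms_servant2" first-action="promote"
        then="servant4" then-action="start"/>

    <rsc_colocation id="servant11_with_servant2" score="INFINITY"
        rsc="servant11" with-rsc="ms_servant2"/>
    <rsc_order id="servant11_after_servant2" kind="Mandatory"
        first="ms_servant2" then="servant11"/>

The first pair ties the primitive to the master role and to promotion;
the second pair leaves both the role and the action unspecified, so the
clone can run alongside either role.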
>
> From: Harvey Shepherd
> Sent: Sunday, 30 June 2019 5:34 p.m.
> To: Cluster Labs - All topics related to open-source clustering
> welcomed
> Subject: Re: EXTERNAL: Re: [ClusterLabs] Problems with master/slave
> failovers
>
> Thanks Andrei for the time you've taken to look at this issue for me.
> What's actually happening with the master preference scores is that
> initially, when the master fails, the other node does have a higher
> preference to become master. Pacemaker tries and fails multiple times
> to perform the failover, resulting in the "transition aborted" logs
> that I posted previously. By the time all that has happened, the
> original master has restarted and therefore has the same master
> preference as the original slave, hence Pacemaker just re-promotes
> the same node. Occasionally the failover is successful, but I think
> that's down to luck with timing.
>
> I think that the root cause is that Pacemaker is trying to move the
> servant resources prior to promoting the king master. Your last
> suggestion - changing the ordering constraint to depend on promotion
> of the master/slave resource rather than on when it starts - makes
> sense, and I'll try changing that and see if it makes a difference. I
> have tried that in the past, however, and had trouble with clone
> resources that are dependents only starting on the master node with
> that setting. I'll try again though and let you know how it goes.
>
> Thanks,
> Harvey
>
> On 30 Jun 2019 5:14 pm, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> > On 28.06.2019 9:45, Andrei Borzenkov wrote:
> > > On Fri, Jun 28, 2019 at 7:24 AM Harvey Shepherd
> > > <Harvey.Shepherd at aviatnet.com> wrote:
> > >>
> > >> Hi All,
> > >>
> > >>
> > >> I'm running Pacemaker 2.0.2 on a two-node cluster. It runs one
> > >> master/slave resource (I'll refer to it as the king resource) and
> > >> about 20 other resources which are a mixture of:
> > >>
> > >>
> > >> - resources that only run on the king resource master node
> > (colocation constraint with a score of INFINITY)
> > >>
> > >> - clone resources that run on both nodes
> > >>
> > >> - two other master/slave resources where the masters run on the
> > >> same node as the king resource master (colocation constraint with
> > >> a score of INFINITY)
> > >>
> > >>
> > >> I'll refer to the above set of resources as servant resources.
> > >>
> > >>
> > >> All servant resources have a resource-stickiness of zero and the
> > king resource has a resource-stickiness of 100. There is an
> > ordering constraint that the king resource must start before all
> > servant resources. The king resource is controlled by an OCF script
> > that uses crm_master to set the preferred master for the king
> > resource (current master has value 100, current slave is 5,
> > unassigned role or resource failure is 1) - I've verified that
> > these values are being set as expected upon
> > promotion/demotion/failure etc, via the logs. That's pretty much
> > all of the configuration - there is no configuration around node
> > preferences and migration-threshold is zero for everything.
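
For reference, that score scheme is typically implemented inside the
agent with crm_master (a sketch only, using the values you describe):

    # in the OCF agent (sketch; scores as described above)
    crm_master -l reboot -v 100   # after being promoted to master
    crm_master -l reboot -v 5     # while running as slave
    crm_master -l reboot -v 1     # on failure, or before a role is known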
> > >>
> > >>
> > >> What I'm trying to achieve is fairly simple:
> > >>
> > >>
> > >> 1. If any servant resource fails on either node, it is simply
> > >> restarted. These resources should never fail over onto the other
> > >> node, because of colocation with the king resource, and they
> > >> should not contribute in any way to deciding whether the king
> > >> resource should fail over (which is why they have a
> > >> resource-stickiness of zero).
> > >>
> > >> 2. If the slave instance of the king resource fails, it should
> > simply be restarted and again no failover should occur.
> > >>
> > >> 3. If the master instance of the king resource fails, then its
> > slave instance should immediately be promoted, and the failed
> > instance should be restarted. Failover of all servant resources
> > should then occur due to the colocation dependency.
> > >>
> > >>
> > >> It's number 3 above that I'm having trouble with. If I kill the
> > master king resource instance it behaves as I expect - everything
> > fails over and the king resource is restarted on the new slave. If
> > I then kill the master instance of the king resource again however,
> > instead of failing back over to its original node, it restarts and
> > promotes back to master on the same node. This is not what I want.
> > >>
> > >
> > > migration-threshold is the first thing that comes to mind.
> > > Another possibility is a hard error returned by the resource agent
> > > that forces the resource off the node.
> > >
> > > But please realize that without the actual configuration and logs
> > > at the time the undesired behavior happens, this just becomes a
> > > game of riddles.
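
On the migration-threshold point: it's worth double-checking what the
cluster actually has configured, e.g. with something like the
following (a sketch; repeat per resource):

    # query the migration-threshold meta-attribute on one resource
    crm_resource --resource ms_king_resource --meta \
        --get-parameter migration-threshold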
> > >
> > >>
> > >> The relevant output from crm_simulate for the two tests is shown
> > >> below. Can anyone suggest what might be going wrong? Whilst I
> > >> really like the concept of crm_simulate, I can't find a good
> > >> description of how to interpret the output: I don't understand
> > >> the difference between clone_color and native_color, or the
> > >> difference between "promotion scores" and the various instances
> > >> of "allocation scores", nor does it really tell me what is
> > >> contributing to the scores. Where does the -INFINITY allocation
> > >> score come from, for example?
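
For what it's worth, score output like the below typically comes from
running crm_simulate against the live CIB with scores enabled, e.g. (a
sketch; the injected operation key reuses the monitor interval and
node names from your output):

    # show allocation/promotion scores for the current cluster state
    crm_simulate --live-check --show-scores
    # rehearse a monitor failure of the master instance on "primary"
    crm_simulate --live-check --show-scores \
        --op-inject king_resource_monitor_10000@primary=1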
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Harvey
> > >>
> > >>
> > >>
> > >> FIRST KING RESOURCE MASTER FAILURE (CORRECT BEHAVIOUR - MASTER NODE FAILOVER OCCURS)
> > >>
> > >>
> > >> Clone Set: ms_king_resource [king_resource] (promotable)
> > >> king_resource (ocf::aviat:king-resource-ocf): FAILED Master secondary
> > >> clone_color: ms_king_resource allocation score on primary: 0
> > >> clone_color: ms_king_resource allocation score on secondary: 0
> > >> clone_color: king_resource:0 allocation score on primary: 0
> > >> clone_color: king_resource:0 allocation score on secondary: 101
> > >> clone_color: king_resource:1 allocation score on primary: 200
> > >> clone_color: king_resource:1 allocation score on secondary: 0
> > >> native_color: king_resource:1 allocation score on primary: 200
> > >> native_color: king_resource:1 allocation score on secondary: 0
> > >> native_color: king_resource:0 allocation score on primary: -INFINITY
> > >> native_color: king_resource:0 allocation score on secondary: 101
> > >> king_resource:1 promotion score on primary: 100
> > >> king_resource:0 promotion score on secondary: 1
> > >> * Recover king_resource:0 ( Master -> Slave secondary )
> > >> * Promote king_resource:1 ( Slave -> Master primary )
> > >> * Resource action: king_resource cancel=10000 on secondary
> > >> * Resource action: king_resource cancel=11000 on primary
> > >> * Pseudo action: ms_king_resource_pre_notify_demote_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-pre_notify_demote_0
> > >> * Pseudo action: ms_king_resource_demote_0
> > >> * Resource action: king_resource demote on secondary
> > >> * Pseudo action: ms_king_resource_demoted_0
> > >> * Pseudo action: ms_king_resource_post_notify_demoted_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-post_notify_demoted_0
> > >> * Pseudo action: ms_king_resource_pre_notify_stop_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-pre_notify_stop_0
> > >> * Pseudo action: ms_king_resource_stop_0
> > >> * Resource action: king_resource stop on secondary
> > >> * Pseudo action: ms_king_resource_stopped_0
> > >> * Pseudo action: ms_king_resource_post_notify_stopped_0
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-post_notify_stopped_0
> > >> * Pseudo action: ms_king_resource_pre_notify_start_0
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-pre_notify_start_0
> > >> * Pseudo action: ms_king_resource_start_0
> > >> * Resource action: king_resource start on secondary
> > >> * Pseudo action: ms_king_resource_running_0
> > >> * Pseudo action: ms_king_resource_post_notify_running_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-post_notify_running_0
> > >> * Pseudo action: ms_king_resource_pre_notify_promote_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-pre_notify_promote_0
> > >> * Pseudo action: ms_king_resource_promote_0
> > >> * Resource action: king_resource promote on primary
> > >> * Pseudo action: ms_king_resource_promoted_0
> > >> * Pseudo action: ms_king_resource_post_notify_promoted_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-post_notify_promoted_0
> > >> * Resource action: king_resource monitor=11000 on secondary
> > >> * Resource action: king_resource monitor=10000 on primary
> > >> Clone Set: ms_king_resource [king_resource] (promotable)
> > >>
> > >>
> > >> SECOND KING RESOURCE MASTER FAILURE (INCORRECT BEHAVIOUR - SAME NODE IS PROMOTED TO MASTER)
> > >>
> > >>
> > >> Clone Set: ms_king_resource [king_resource] (promotable)
> > >> king_resource (ocf::aviat:king-resource-ocf): FAILED Master primary
> > >> clone_color: ms_king_resource allocation score on primary: 0
> > >> clone_color: ms_king_resource allocation score on secondary: 0
> > >> clone_color: king_resource:0 allocation score on primary: 0
> > >> clone_color: king_resource:0 allocation score on secondary: 200
> > >> clone_color: king_resource:1 allocation score on primary: 101
> > >> clone_color: king_resource:1 allocation score on secondary: 0
> > >> native_color: king_resource:0 allocation score on primary: 0
> > >> native_color: king_resource:0 allocation score on secondary: 200
> > >> native_color: king_resource:1 allocation score on primary: 101
> > >> native_color: king_resource:1 allocation score on secondary: -INFINITY
> > >> king_resource:1 promotion score on primary: 1
> > >> king_resource:0 promotion score on secondary: 1
> >
> > At this point neither node has a clear preference as master for
> > king_resource, so I would expect Pacemaker to prefer the current
> > node (it has to break the tie somehow). Master scores are normally
> > set by resource agents, so you really need to investigate what your
> > agent does and how the scores are set when this happens.
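
If it helps with that investigation: for an anonymous clone the master
score is stored as a transient node attribute, which I believe is named
master-<resource>, so you can inspect it at the moment of failure with
something like (a sketch, assuming the name master-king_resource):

    # query the current master score for king_resource on node "primary"
    crm_attribute --type status --node primary \
        --name master-king_resource --query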
> >
> > >> * Recover king_resource:1 ( Master primary )
> > >> * Pseudo action: ms_king_resource_pre_notify_demote_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-pre_notify_demote_0
> > >> * Pseudo action: ms_king_resource_demote_0
> > >> * Resource action: king_resource demote on primary
> > >> * Pseudo action: ms_king_resource_demoted_0
> > >> * Pseudo action: ms_king_resource_post_notify_demoted_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-post_notify_demoted_0
> > >> * Pseudo action: ms_king_resource_pre_notify_stop_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-pre_notify_stop_0
> > >> * Pseudo action: ms_king_resource_stop_0
> > >> * Resource action: king_resource stop on primary
> > >> * Pseudo action: ms_king_resource_stopped_0
> > >> * Pseudo action: ms_king_resource_post_notify_stopped_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Pseudo action: ms_king_resource_confirmed-post_notify_stopped_0
> > >> * Pseudo action: ms_king_resource_pre_notify_start_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Pseudo action: ms_king_resource_confirmed-pre_notify_start_0
> > >> * Pseudo action: ms_king_resource_start_0
> > >> * Resource action: king_resource start on primary
> > >> * Pseudo action: ms_king_resource_running_0
> > >> * Pseudo action: ms_king_resource_post_notify_running_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-post_notify_running_0
> > >> * Pseudo action: ms_king_resource_pre_notify_promote_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-pre_notify_promote_0
> > >> * Pseudo action: ms_king_resource_promote_0
> > >> * Resource action: king_resource promote on primary
> > >> * Pseudo action: ms_king_resource_promoted_0
> > >> * Pseudo action: ms_king_resource_post_notify_promoted_0
> > >> * Resource action: king_resource notify on secondary
> > >> * Resource action: king_resource notify on primary
> > >> * Pseudo action: ms_king_resource_confirmed-post_notify_promoted_0
> > >> * Resource action: king_resource monitor=10000 on primary
> > >> Clone Set: ms_king_resource [king_resource] (promotable)
--
Ken Gaillot <kgaillot at redhat.com>