[ClusterLabs] Problems with master/slave failovers

Ken Gaillot kgaillot at redhat.com
Tue Jul 2 17:59:24 EDT 2019


On Mon, 2019-07-01 at 23:30 +0000, Harvey Shepherd wrote:
> > The "transition summary" is just a resource-by-resource list, not the
> > order things will be done. The "executing cluster transition" section
> > is the order things are being done.
> 
> Thanks Ken. I think that's where the problem is originating. If you
> look at the "executing cluster transition" section, it's actually
> restarting the failed king instance BEFORE promoting the remaining
> in-service slave. When the failed resource comes back online, that
> adjusts the master scores, resulting in the transition being aborted.
> Both nodes then end up having the same master score for the king
> resource, and Pacemaker decides to re-promote the original master. I
> would have expected Pacemaker's priority to be to ensure that there
> was a master available first, then to restart the failed instance in
> slave mode. Is there a way to configure it to do that?

No, that's intentional behavior. Starts are done before promotes so
that promotion scores are in their final state before ultimately
choosing the master. Otherwise, you'd end up in the same final
situation, but the master would fail over first then fail back.
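
The order of actions can be checked against the "Executing cluster transition" section, not the summary. Below is a minimal Python sketch. It assumes the crm_simulate output has been saved as plain text with the mailing-list "> > " prefixes removed. The function names are hypothetical and are not part of any Pacemaker tool.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as " * Resource action: king_resource   start on primary".
# Pseudo actions (" * Pseudo action: ...") are deliberately ignored.
RESOURCE_ACTION = re.compile(
    r"^\s*\*\s+Resource action:\s+(?P<rsc>\S+)\s+(?P<action>\S+)\s+on\s+(?P<node>\S+)\s*$"
)


def resource_actions(transition: str, rsc: str) -> List[Tuple[int, str, str]]:
    """Return (line position, action, node) for each action on `rsc`, in execution order."""
    found = []
    for position, line in enumerate(transition.splitlines()):
        match = RESOURCE_ACTION.match(line)
        if match and match.group("rsc") == rsc:
            found.append((position, match.group("action"), match.group("node")))
    return found


def first_position(actions: List[Tuple[int, str, str]], action: str) -> Optional[int]:
    """Position of the first matching action, or None if it never occurs."""
    return next((pos for pos, act, _ in actions if act == action), None)


if __name__ == "__main__":
    # Abbreviated excerpt of the "Executing cluster transition" section.
    excerpt = """\
 * Resource action: king_resource   stop on primary
 * Pseudo action:   ms_king_resource_stopped_0
 * Resource action: king_resource   start on primary
 * Pseudo action:   ms_king_resource_running_0
 * Resource action: king_resource   promote on secondary
"""
    actions = resource_actions(excerpt, "king_resource")
    for pos, act, node in actions:
        print(pos, act, node)
    print("start before promote:",
          first_position(actions, "start") < first_position(actions, "promote"))
```

Run it against the full transition quoted below. It should show king_resource being started on primary before it is promoted on secondary. That is the start-before-promote ordering described above.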

It's up to the agent to set master scores in whatever fashion it
considers ideal.
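
Here is a hypothetical Python sketch of the scoring policy Harvey asked about. A long-running, healthy slave advertises a higher master score than an instance that has just been recovered, so the recovered instance cannot tie it. The InstanceState fields and the score values are assumptions, as is how "recently recovered" is detected and cleared. The only real interface used is the `crm_master -l reboot -v <score>` / `crm_master -l reboot -D` idiom, which OCF agents of this Pacemaker generation call to set or clear their own score.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical score values: Pacemaker only compares them, so what matters
# is their relative order, not their magnitudes.
SCORE_CURRENT_MASTER = 100
SCORE_HEALTHY_SLAVE = 50
SCORE_RECOVERED_SLAVE = 10


@dataclass
class InstanceState:
    """Hypothetical snapshot of the local king_resource instance."""
    healthy: bool
    is_master: bool
    recently_recovered: bool  # how the agent tracks and clears this is an assumption


def master_score(state: InstanceState) -> Optional[int]:
    """Score this instance should advertise, or None to withdraw it."""
    if not state.healthy:
        return None  # an unhealthy instance should never be promoted
    if state.is_master:
        return SCORE_CURRENT_MASTER
    if state.recently_recovered:
        return SCORE_RECOVERED_SLAVE  # stays below the in-service slave
    return SCORE_HEALTHY_SLAVE


def crm_master_argv(score: Optional[int]) -> List[str]:
    """Command line the agent would run (e.g. from its monitor action)."""
    if score is None:
        return ["crm_master", "-l", "reboot", "-D"]
    return ["crm_master", "-l", "reboot", "-v", str(score)]


if __name__ == "__main__":
    just_restarted = InstanceState(healthy=True, is_master=False, recently_recovered=True)
    in_service = InstanceState(healthy=True, is_master=False, recently_recovered=False)
    print(crm_master_argv(master_score(just_restarted)))
    print(crm_master_argv(master_score(in_service)))
```

A real agent would pass the list to `subprocess.run(argv, check=True)`. The sketch only builds the command, so it can run without a cluster.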

You definitely want to separate the constraints for primitive and clone
dependencies of the king resource. The primitives are currently failing
over before the master has stopped because they're only ordered after
the king resource in any role, and since the slave is active, they can
start there.
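
One way to express that split is the following Python sketch, which emits CIB constraint XML. The primitives and the servant9 group are ordered after ms_king_resource is promoted and colocated with its Master role. They would replace the existing any-role ordering for those resources. The clone dependents are ordered only after ms_king_resource starts.

The resource names come from the status output below. The constraint ids are hypothetical, and so is the assumption that clone dependents only need a running king instance in any role. The attribute names (first-action, then-action, with-rsc-role) follow the Pacemaker 1.1/2.0 constraint schema. Review the XML before loading it into the CIB, or create the equivalent constraints with pcs or crmsh.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

KING = "ms_king_resource"

# Primitive dependents (plus the servant9 group) from the status output:
# these should only run where king_resource is Master.
PRIMITIVES = [f"servant{n}" for n in (4, 5, 6, 7, 8, 10, 11, 12, 13)] + ["servant9_active_disabled"]

# Clone dependents: assumed to need only a running king_resource instance.
CLONES = ["ms_servant1", "ms_servant2", "ms_servant3"]


def build_constraints(primitives: Iterable[str], clones: Iterable[str],
                      king: str = KING) -> ET.Element:
    constraints = ET.Element("constraints")
    for rsc in primitives:
        # Start the primitive only after the king clone has been promoted...
        ET.SubElement(constraints, "rsc_order", {
            "id": f"order-{king}-promote-then-{rsc}",
            "first": king, "first-action": "promote",
            "then": rsc, "then-action": "start",
            "kind": "Mandatory",
        })
        # ...and only on the node where the king instance is Master.
        ET.SubElement(constraints, "rsc_colocation", {
            "id": f"colo-{rsc}-with-{king}-master",
            "rsc": rsc, "with-rsc": king, "with-rsc-role": "Master",
            "score": "INFINITY",
        })
    for rsc in clones:
        # Clone dependents follow the king clone in any role.
        ET.SubElement(constraints, "rsc_order", {
            "id": f"order-{king}-start-then-{rsc}",
            "first": king, "first-action": "start",
            "then": rsc, "then-action": "start",
            "kind": "Mandatory",
        })
    return constraints


if __name__ == "__main__":
    print(ET.tostring(build_constraints(PRIMITIVES, CLONES), encoding="unicode"))
```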

> > 
> > Current cluster status:
> > Online: [ primary secondary ]
> > 
> >  stk_shared_ip  (ocf::heartbeat:IPaddr2):       Started secondary
> >  Clone Set: ms_king_resource [king_resource] (promotable)
> >      king_resource      (ocf::aviat:king-resource-ocf):    FAILED primary
> >      Slaves: [ secondary ]
> >  Clone Set: ms_servant1 [servant1]
> >      Started: [ primary secondary ]
> >  Clone Set: ms_servant2 [servant2] (promotable)
> >      Masters: [ primary ]
> >      Slaves: [ secondary ]
> >  Clone Set: ms_servant3 [servant3] (promotable)
> >      Masters: [ primary ]
> >      Slaves: [ secondary ]
> >  servant4        (lsb:servant4):  Started primary
> >  servant5  (lsb:servant5):    Started primary
> >  servant6      (lsb:servant6):        Started primary
> >  servant7      (lsb:servant7):      Started primary
> >  servant8      (lsb:servant8):        Started primary
> >  Resource Group: servant9_active_disabled
> >      servant9_resource1      (lsb:servant9_resource1):    Started primary
> >      servant9_resource2   (lsb:servant9_resource2): Started primary
> >  servant10 (lsb:servant10):   Started primary
> >  servant11 (lsb:servant11):      Started primary
> >  servant12    (lsb:servant12):      Started primary
> >  servant13        (lsb:servant13):  Started primary
> > 
> > Transition Summary:
> >  * Recover    king_resource:0     (             Slave primary )
> >  * Promote    king_resource:1     ( Slave -> Master secondary )
> >  * Demote     servant2:0          (   Master -> Slave primary )
> >  * Promote    servant2:1          ( Slave -> Master secondary )
> >  * Demote     servant3:0          (   Master -> Slave primary )
> >  * Promote    servant3:1          ( Slave -> Master secondary )
> >  * Move       servant4             (      primary -> secondary )
> >  * Move       servant5               (      primary -> secondary )
> >  * Move       servant6           (      primary -> secondary )
> >  * Move       servant7           (      primary -> secondary )
> >  * Move       servant8           (      primary -> secondary )
> >  * Move       servant9_resource1               (      primary -> secondary )
> >  * Move       servant9_resource2    (      primary -> secondary )
> >  * Move       servant10              (      primary -> secondary )
> >  * Move       servant11              (      primary -> secondary )
> >  * Move       servant12                 (      primary -> secondary )
> >  * Move       servant13             (      primary -> secondary )
> > 
> > Executing cluster transition:
> >  * Pseudo action:   ms_king_resource_pre_notify_stop_0
> >  * Pseudo action:   ms_servant2_pre_notify_demote_0
> >  * Resource action: servant3        cancel=10000 on primary
> >  * Resource action: servant3        cancel=11000 on secondary
> >  * Pseudo action:   ms_servant3_pre_notify_demote_0
> >  * Resource action: servant4         stop on primary
> >  * Resource action: servant5           stop on primary
> >  * Resource action: servant6       stop on primary
> >  * Resource action: servant7       stop on primary
> >  * Resource action: servant8       stop on primary
> >  * Pseudo action:   servant9_active_disabled_stop_0
> >  * Resource action: servant9_resource2 stop on primary
> >  * Resource action: servant10          stop on primary
> >  * Resource action: servant11          stop on primary
> >  * Resource action: servant12             stop on primary
> >  * Resource action: servant13         stop on primary
> >  * Resource action: king_resource   notify on primary
> >  * Resource action: king_resource   notify on secondary
> >  * Pseudo action:   ms_king_resource_confirmed-pre_notify_stop_0
> >  * Pseudo action:   ms_king_resource_stop_0
> >  * Resource action: servant2        notify on primary
> >  * Resource action: servant2        notify on secondary
> >  * Pseudo action:   ms_servant2_confirmed-pre_notify_demote_0
> >  * Pseudo action:   ms_servant2_demote_0
> >  * Resource action: servant3        notify on primary
> >  * Resource action: servant3        notify on secondary
> >  * Pseudo action:   ms_servant3_confirmed-pre_notify_demote_0
> >  * Pseudo action:   ms_servant3_demote_0
> >  * Resource action: servant4         start on secondary
> >  * Resource action: servant5           start on secondary
> >  * Resource action: servant6       start on secondary
> >  * Resource action: servant7       start on secondary
> >  * Resource action: servant8       start on secondary
> >  * Resource action: servant9_resource1           stop on primary
> >  * Resource action: servant10          start on secondary
> >  * Resource action: servant11          start on secondary
> >  * Resource action: servant12             start on secondary
> >  * Resource action: servant13         start on secondary
> >  * Resource action: king_resource   stop on primary
> >  * Pseudo action:   ms_king_resource_stopped_0
> >  * Resource action: servant2        demote on primary
> >  * Pseudo action:   ms_servant2_demoted_0
> >  * Resource action: servant3        demote on primary
> >  * Pseudo action:   ms_servant3_demoted_0
> >  * Resource action: servant4         monitor=10000 on secondary
> >  * Resource action: servant5           monitor=10000 on secondary
> >  * Resource action: servant6       monitor=10000 on secondary
> >  * Resource action: servant7       monitor=10000 on secondary
> >  * Resource action: servant8       monitor=10000 on secondary
> >  * Pseudo action:   servant9_active_disabled_stopped_0
> >  * Pseudo action:   servant9_active_disabled_start_0
> >  * Resource action: servant9_resource1           start on secondary
> >  * Resource action: servant9_resource2 start on secondary
> >  * Resource action: servant10          monitor=10000 on secondary
> >  * Resource action: servant11          monitor=10000 on secondary
> >  * Resource action: servant12             monitor=10000 on secondary
> >  * Resource action: servant13         monitor=10000 on secondary
> >  * Pseudo action:   ms_king_resource_post_notify_stopped_0
> >  * Pseudo action:   ms_servant2_post_notify_demoted_0
> >  * Pseudo action:   ms_servant3_post_notify_demoted_0
> >  * Pseudo action:   servant9_active_disabled_running_0
> >  * Resource action: servant9_resource1           monitor=10000 on secondary
> >  * Resource action: servant9_resource2 monitor=10000 on secondary
> >  * Resource action: king_resource   notify on secondary
> >  * Pseudo action:   ms_king_resource_confirmed-post_notify_stopped_0
> >  * Pseudo action:   ms_king_resource_pre_notify_start_0
> >  * Resource action: servant2        notify on primary
> >  * Resource action: servant2        notify on secondary
> >  * Pseudo action:   ms_servant2_confirmed-post_notify_demoted_0
> >  * Pseudo action:   ms_servant2_pre_notify_promote_0
> >  * Resource action: servant3        notify on primary
> >  * Resource action: servant3        notify on secondary
> >  * Pseudo action:   ms_servant3_confirmed-post_notify_demoted_0
> >  * Pseudo action:   ms_servant3_pre_notify_promote_0
> >  * Resource action: king_resource   notify on secondary
> >  * Pseudo action:   ms_king_resource_confirmed-pre_notify_start_0
> >  * Pseudo action:   ms_king_resource_start_0
> >  * Resource action: servant2        notify on primary
> >  * Resource action: servant2        notify on secondary
> >  * Pseudo action:   ms_servant2_confirmed-pre_notify_promote_0
> >  * Pseudo action:   ms_servant2_promote_0
> >  * Resource action: servant3        notify on primary
> >  * Resource action: servant3        notify on secondary
> >  * Pseudo action:   ms_servant3_confirmed-pre_notify_promote_0
> >  * Pseudo action:   ms_servant3_promote_0
> >  * Resource action: king_resource   start on primary
> >  * Pseudo action:   ms_king_resource_running_0
> >  * Resource action: servant2        promote on secondary
> >  * Pseudo action:   ms_servant2_promoted_0
> >  * Resource action: servant3        promote on secondary
> >  * Pseudo action:   ms_servant3_promoted_0
> >  * Pseudo action:   ms_king_resource_post_notify_running_0
> >  * Pseudo action:   ms_servant2_post_notify_promoted_0
> >  * Pseudo action:   ms_servant3_post_notify_promoted_0
> >  * Resource action: king_resource   notify on primary
> >  * Resource action: king_resource   notify on secondary
> >  * Pseudo action:   ms_king_resource_confirmed-post_notify_running_0
> >  * Resource action: servant2        notify on primary
> >  * Resource action: servant2        notify on secondary
> >  * Pseudo action:   ms_servant2_confirmed-post_notify_promoted_0
> >  * Resource action: servant3        notify on primary
> >  * Resource action: servant3        notify on secondary
> >  * Pseudo action:   ms_servant3_confirmed-post_notify_promoted_0
> >  * Pseudo action:   ms_king_resource_pre_notify_promote_0
> >  * Resource action: servant2        monitor=11000 on primary
> >  * Resource action: servant2        monitor=10000 on secondary
> >  * Resource action: servant3        monitor=11000 on primary
> >  * Resource action: servant3        monitor=10000 on secondary
> >  * Resource action: king_resource   notify on primary
> >  * Resource action: king_resource   notify on secondary
> >  * Pseudo action:   ms_king_resource_confirmed-pre_notify_promote_0
> >  * Pseudo action:   ms_king_resource_promote_0
> >  * Resource action: king_resource   promote on secondary
> >  * Pseudo action:   ms_king_resource_promoted_0
> >  * Pseudo action:   ms_king_resource_post_notify_promoted_0
> >  * Resource action: king_resource   notify on primary
> >  * Resource action: king_resource   notify on secondary
> >  * Pseudo action:   ms_king_resource_confirmed-post_notify_promoted_0
> >  * Resource action: king_resource   monitor=11000 on primary
> >  * Resource action: king_resource   monitor=10000 on secondary
> > Using the original execution date of: 2019-06-29 02:33:03Z
> > 
> > Revised cluster status:
> > Online: [ primary secondary ]
> > 
> >  stk_shared_ip  (ocf::heartbeat:IPaddr2):       Started secondary
> >  Clone Set: ms_king_resource [king_resource] (promotable)
> >      Masters: [ secondary ]
> >      Slaves: [ primary ]
> >  Clone Set: ms_servant1 [servant1]
> >      Started: [ primary secondary ]
> >  Clone Set: ms_servant2 [servant2] (promotable)
> >      Masters: [ secondary ]
> >      Slaves: [ primary ]
> >  Clone Set: ms_servant3 [servant3] (promotable)
> >      Masters: [ secondary ]
> >      Slaves: [ primary ]
> >  servant4        (lsb:servant4):  Started secondary
> >  servant5  (lsb:servant5):    Started secondary
> >  servant6      (lsb:servant6):        Started secondary
> >  servant7      (lsb:servant7):      Started secondary
> >  servant8      (lsb:servant8):        Started secondary
> >  Resource Group: servant9_active_disabled
> >      servant9_resource1      (lsb:servant9_resource1):    Started secondary
> >      servant9_resource2   (lsb:servant9_resource2): Started secondary
> >  servant10 (lsb:servant10):   Started secondary
> >  servant11 (lsb:servant11):      Started secondary
> >  servant12    (lsb:servant12):      Started secondary
> >  servant13        (lsb:servant13):  Started secondary
> > 
-- 
Ken Gaillot <kgaillot at redhat.com>


