[ClusterLabs] Problems with master/slave failovers
Ken Gaillot
kgaillot at redhat.com
Tue Jul 2 17:59:24 EDT 2019
On Mon, 2019-07-01 at 23:30 +0000, Harvey Shepherd wrote:
> > The "transition summary" is just a resource-by-resource list, not the
> > order things will be done. The "executing cluster transition" section
> > is the order things are being done.
>
> Thanks Ken. I think that's where the problem is originating. If you
> look at the "executing cluster transition" section, it's actually
> restarting the failed king instance BEFORE promoting the remaining
> in-service slave. When the failed resource comes back online, that
> adjusts the master scores, resulting in the transition being aborted.
> Both nodes then end up having the same master score for the king
> resource, and Pacemaker decides to re-promote the original master. I
> would have expected Pacemaker's priority to be to ensure that there
> was a master available first, then to restart the failed instance in
> slave mode. Is there a way to configure it to do that?
No, that's intentional behavior. Starts are done before promotes so
that promotion scores are in their final state before the master is
ultimately chosen. Otherwise, you'd end up in the same final situation,
but the master would fail over first and then fail back.
It's up to the agent to set master scores in whatever fashion it
considers ideal.
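To make the ordering argument concrete, here is a rough toy model in Python. It is not Pacemaker code: the score values and the function names are invented for illustration, and it assumes the agent gives the restarted instance the higher score once it is back. It only shows that promoting before the start finishes reaches the same end state with one extra failover.

```python
# Toy model only, not Pacemaker code. Scores and function names are
# hypothetical and chosen purely to illustrate the ordering argument.

def pick_master(scores):
    """Return the node with the highest promotion score."""
    return max(scores, key=scores.get)

def promote_before_start(interim_scores, final_scores):
    """Promote on interim scores, then re-evaluate after the restart."""
    history = [pick_master(interim_scores)]
    settled = pick_master(final_scores)
    if settled != history[-1]:
        history.append(settled)  # the master fails back
    return history

def start_before_promote(final_scores):
    """Wait for the restart so scores are final, then promote once."""
    return [pick_master(final_scores)]

# Assumed scores: while primary is down only secondary has a score;
# after the restart the agent scores primary higher again.
interim = {"secondary": 50}
final = {"primary": 100, "secondary": 50}

print(promote_before_start(interim, final))  # ['secondary', 'primary']
print(start_before_promote(final))           # ['primary']
```

If the agent instead kept the restarted instance's score below the surviving one, both orderings would leave the master on secondary, which is why the score logic in the agent is the place to tune this.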
You definitely want to separate the constraints for primitive and clone
dependencies of the king resource. The primitives are currently failing
over before the master has stopped because they're only ordered after
the king resource in any role, and since the slave is active, they can
start there.
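As a rough illustration of the difference, the following toy check in Python models when a dependent is allowed to start on a node. It is not Pacemaker's scheduler, and the requirement labels "started" and "promoted" are made up for this sketch. In a real configuration the second case would correspond to ordering (and colocating) the primitives relative to the promote action or master role of ms_king_resource, while the clone dependents stay ordered after the king in any role.

```python
# Toy model only, not Pacemaker's scheduler. The requirement labels
# "started" and "promoted" are hypothetical names for this sketch.

ACTIVE_ROLES = ("Slave", "Master")

def may_start(requirement, king_role_on_node):
    """Can a dependent start on a node, given the king's role there?

    requirement: "started" (king active in any role) or "promoted".
    king_role_on_node: "Stopped", "Slave" or "Master".
    """
    if requirement == "started":
        return king_role_on_node in ACTIVE_ROLES
    if requirement == "promoted":
        return king_role_on_node == "Master"
    raise ValueError(f"unknown requirement: {requirement}")

# At the moment of failure, the king on secondary is still a slave.
print(may_start("started", "Slave"))    # True: primitives can move right away
print(may_start("promoted", "Slave"))   # False: primitives wait for the promote
print(may_start("promoted", "Master"))  # True
```

The first result matches the transition quoted below, where servant4 and the other primitives start on secondary before king_resource is promoted there.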
> >
> > Current cluster status:
> > Online: [ primary secondary ]
> >
> > stk_shared_ip (ocf::heartbeat:IPaddr2): Started secondary
> > Clone Set: ms_king_resource [king_resource] (promotable)
> > king_resource (ocf::aviat:king-resource-ocf): FAILED primary
> > Slaves: [ secondary ]
> > Clone Set: ms_servant1 [servant1]
> > Started: [ primary secondary ]
> > Clone Set: ms_servant2 [servant2] (promotable)
> > Masters: [ primary ]
> > Slaves: [ secondary ]
> > Clone Set: ms_servant3 [servant3] (promotable)
> > Masters: [ primary ]
> > Slaves: [ secondary ]
> > servant4 (lsb:servant4): Started primary
> > servant5 (lsb:servant5): Started primary
> > servant6 (lsb:servant6): Started primary
> > servant7 (lsb:servant7): Started primary
> > servant8 (lsb:servant8): Started primary
> > Resource Group: servant9_active_disabled
> > servant9_resource1 (lsb:servant9_resource1): Started primary
> > servant9_resource2 (lsb:servant9_resource2): Started primary
> > servant10 (lsb:servant10): Started primary
> > servant11 (lsb:servant11): Started primary
> > servant12 (lsb:servant12): Started primary
> > servant13 (lsb:servant13): Started primary
> >
> > Transition Summary:
> > * Recover king_resource:0 ( Slave primary )
> > * Promote king_resource:1 ( Slave -> Master secondary )
> > * Demote servant2:0 ( Master -> Slave primary )
> > * Promote servant2:1 ( Slave -> Master secondary )
> > * Demote servant3:0 ( Master -> Slave primary )
> > * Promote servant3:1 ( Slave -> Master secondary )
> > * Move servant4 ( primary -> secondary )
> > * Move servant5 ( primary -> secondary )
> > * Move servant6 ( primary -> secondary )
> > * Move servant7 ( primary -> secondary )
> > * Move servant8 ( primary -> secondary )
> > * Move servant9_resource1 ( primary -> secondary )
> > * Move servant9_resource2 ( primary -> secondary )
> > * Move servant10 ( primary -> secondary )
> > * Move servant11 ( primary -> secondary )
> > * Move servant12 ( primary -> secondary )
> > * Move servant13 ( primary -> secondary )
> >
> > Executing cluster transition:
> > * Pseudo action: ms_king_resource_pre_notify_stop_0
> > * Pseudo action: ms_servant2_pre_notify_demote_0
> > * Resource action: servant3 cancel=10000 on primary
> > * Resource action: servant3 cancel=11000 on secondary
> > * Pseudo action: ms_servant3_pre_notify_demote_0
> > * Resource action: servant4 stop on primary
> > * Resource action: servant5 stop on primary
> > * Resource action: servant6 stop on primary
> > * Resource action: servant7 stop on primary
> > * Resource action: servant8 stop on primary
> > * Pseudo action: servant9_active_disabled_stop_0
> > * Resource action: servant9_resource2 stop on primary
> > * Resource action: servant10 stop on primary
> > * Resource action: servant11 stop on primary
> > * Resource action: servant12 stop on primary
> > * Resource action: servant13 stop on primary
> > * Resource action: king_resource notify on primary
> > * Resource action: king_resource notify on secondary
> > * Pseudo action: ms_king_resource_confirmed-pre_notify_stop_0
> > * Pseudo action: ms_king_resource_stop_0
> > * Resource action: servant2 notify on primary
> > * Resource action: servant2 notify on secondary
> > * Pseudo action: ms_servant2_confirmed-pre_notify_demote_0
> > * Pseudo action: ms_servant2_demote_0
> > * Resource action: servant3 notify on primary
> > * Resource action: servant3 notify on secondary
> > * Pseudo action: ms_servant3_confirmed-pre_notify_demote_0
> > * Pseudo action: ms_servant3_demote_0
> > * Resource action: servant4 start on secondary
> > * Resource action: servant5 start on secondary
> > * Resource action: servant6 start on secondary
> > * Resource action: servant7 start on secondary
> > * Resource action: servant8 start on secondary
> > * Resource action: servant9_resource1 stop on primary
> > * Resource action: servant10 start on secondary
> > * Resource action: servant11 start on secondary
> > * Resource action: servant12 start on secondary
> > * Resource action: servant13 start on secondary
> > * Resource action: king_resource stop on primary
> > * Pseudo action: ms_king_resource_stopped_0
> > * Resource action: servant2 demote on primary
> > * Pseudo action: ms_servant2_demoted_0
> > * Resource action: servant3 demote on primary
> > * Pseudo action: ms_servant3_demoted_0
> > * Resource action: servant4 monitor=10000 on secondary
> > * Resource action: servant5 monitor=10000 on secondary
> > * Resource action: servant6 monitor=10000 on secondary
> > * Resource action: servant7 monitor=10000 on secondary
> > * Resource action: servant8 monitor=10000 on secondary
> > * Pseudo action: servant9_active_disabled_stopped_0
> > * Pseudo action: servant9_active_disabled_start_0
> > * Resource action: servant9_resource1 start on secondary
> > * Resource action: servant9_resource2 start on secondary
> > * Resource action: servant10 monitor=10000 on secondary
> > * Resource action: servant11 monitor=10000 on secondary
> > * Resource action: servant12 monitor=10000 on secondary
> > * Resource action: servant13 monitor=10000 on secondary
> > * Pseudo action: ms_king_resource_post_notify_stopped_0
> > * Pseudo action: ms_servant2_post_notify_demoted_0
> > * Pseudo action: ms_servant3_post_notify_demoted_0
> > * Pseudo action: servant9_active_disabled_running_0
> > * Resource action: servant9_resource1 monitor=10000 on secondary
> > * Resource action: servant9_resource2 monitor=10000 on secondary
> > * Resource action: king_resource notify on secondary
> > * Pseudo action: ms_king_resource_confirmed-post_notify_stopped_0
> > * Pseudo action: ms_king_resource_pre_notify_start_0
> > * Resource action: servant2 notify on primary
> > * Resource action: servant2 notify on secondary
> > * Pseudo action: ms_servant2_confirmed-post_notify_demoted_0
> > * Pseudo action: ms_servant2_pre_notify_promote_0
> > * Resource action: servant3 notify on primary
> > * Resource action: servant3 notify on secondary
> > * Pseudo action: ms_servant3_confirmed-post_notify_demoted_0
> > * Pseudo action: ms_servant3_pre_notify_promote_0
> > * Resource action: king_resource notify on secondary
> > * Pseudo action: ms_king_resource_confirmed-pre_notify_start_0
> > * Pseudo action: ms_king_resource_start_0
> > * Resource action: servant2 notify on primary
> > * Resource action: servant2 notify on secondary
> > * Pseudo action: ms_servant2_confirmed-pre_notify_promote_0
> > * Pseudo action: ms_servant2_promote_0
> > * Resource action: servant3 notify on primary
> > * Resource action: servant3 notify on secondary
> > * Pseudo action: ms_servant3_confirmed-pre_notify_promote_0
> > * Pseudo action: ms_servant3_promote_0
> > * Resource action: king_resource start on primary
> > * Pseudo action: ms_king_resource_running_0
> > * Resource action: servant2 promote on secondary
> > * Pseudo action: ms_servant2_promoted_0
> > * Resource action: servant3 promote on secondary
> > * Pseudo action: ms_servant3_promoted_0
> > * Pseudo action: ms_king_resource_post_notify_running_0
> > * Pseudo action: ms_servant2_post_notify_promoted_0
> > * Pseudo action: ms_servant3_post_notify_promoted_0
> > * Resource action: king_resource notify on primary
> > * Resource action: king_resource notify on secondary
> > * Pseudo action: ms_king_resource_confirmed-post_notify_running_0
> > * Resource action: servant2 notify on primary
> > * Resource action: servant2 notify on secondary
> > * Pseudo action: ms_servant2_confirmed-post_notify_promoted_0
> > * Resource action: servant3 notify on primary
> > * Resource action: servant3 notify on secondary
> > * Pseudo action: ms_servant3_confirmed-post_notify_promoted_0
> > * Pseudo action: ms_king_resource_pre_notify_promote_0
> > * Resource action: servant2 monitor=11000 on primary
> > * Resource action: servant2 monitor=10000 on secondary
> > * Resource action: servant3 monitor=11000 on primary
> > * Resource action: servant3 monitor=10000 on secondary
> > * Resource action: king_resource notify on primary
> > * Resource action: king_resource notify on secondary
> > * Pseudo action: ms_king_resource_confirmed-pre_notify_promote_0
> > * Pseudo action: ms_king_resource_promote_0
> > * Resource action: king_resource promote on secondary
> > * Pseudo action: ms_king_resource_promoted_0
> > * Pseudo action: ms_king_resource_post_notify_promoted_0
> > * Resource action: king_resource notify on primary
> > * Resource action: king_resource notify on secondary
> > * Pseudo action: ms_king_resource_confirmed-post_notify_promoted_0
> > * Resource action: king_resource monitor=11000 on primary
> > * Resource action: king_resource monitor=10000 on secondary
> > Using the original execution date of: 2019-06-29 02:33:03Z
> >
> > Revised cluster status:
> > Online: [ primary secondary ]
> >
> > stk_shared_ip (ocf::heartbeat:IPaddr2): Started secondary
> > Clone Set: ms_king_resource [king_resource] (promotable)
> > Masters: [ secondary ]
> > Slaves: [ primary ]
> > Clone Set: ms_servant1 [servant1]
> > Started: [ primary secondary ]
> > Clone Set: ms_servant2 [servant2] (promotable)
> > Masters: [ secondary ]
> > Slaves: [ primary ]
> > Clone Set: ms_servant3 [servant3] (promotable)
> > Masters: [ secondary ]
> > Slaves: [ primary ]
> > servant4 (lsb:servant4): Started secondary
> > servant5 (lsb:servant5): Started secondary
> > servant6 (lsb:servant6): Started secondary
> > servant7 (lsb:servant7): Started secondary
> > servant8 (lsb:servant8): Started secondary
> > Resource Group: servant9_active_disabled
> > servant9_resource1 (lsb:servant9_resource1): Started secondary
> > servant9_resource2 (lsb:servant9_resource2): Started secondary
> > servant10 (lsb:servant10): Started secondary
> > servant11 (lsb:servant11): Started secondary
> > servant12 (lsb:servant12): Started secondary
> > servant13 (lsb:servant13): Started secondary
> >
--
Ken Gaillot <kgaillot at redhat.com>