[ClusterLabs] Problems with master/slave failovers
Harvey Shepherd
Harvey.Shepherd at Aviatnet.com
Mon Jul 1 23:49:45 EDT 2019
I initially thought it only happened in one direction, but it doesn't. It's just that occasionally, if the timing is just right, the failover manages to succeed. Besides, I don't think that has any bearing on why Pacemaker is trying to restart the failed resource instance before promoting the slave.
________________________________________
From: Users <users-bounces at clusterlabs.org> on behalf of Andrei Borzenkov <arvidjaar at gmail.com>
Sent: Tuesday, 2 July 2019 3:42 p.m.
To: users at clusterlabs.org
Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers
02.07.2019 2:30, Harvey Shepherd wrote:
>> The "transition summary" is just a resource-by-resource list, not the
>> order things will be done. The "executing cluster transition" section
>> is the order things are being done.
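To read a transition in the order Ken describes, here is a minimal Python sketch. It assumes crm_simulate output like the one quoted below, with no lines wrapped by the mail archive. `parse_transition` and `Action` are illustrative names and are not part of any Pacemaker tool. The sketch keeps only the `* Resource action:` and `* Pseudo action:` lines of the "Executing cluster transition" section, in the order they are listed, and ignores the per-resource "Transition Summary".

```python
import re
from typing import List, NamedTuple, Optional


class Action(NamedTuple):
    kind: str            # "Resource" or "Pseudo"
    name: str            # resource name or pseudo-action name
    op: Optional[str]    # e.g. "stop", "promote", "monitor=10000"; None for pseudo actions
    node: Optional[str]  # node the action runs on; None for pseudo actions


# "* Resource action: king_resource stop on primary"
RESOURCE_RE = re.compile(r"^\*\s*Resource action:\s*(\S+)\s+(\S+)\s+on\s+(\S+)$")
# "* Pseudo action: ms_king_resource_stop_0"
PSEUDO_RE = re.compile(r"^\*\s*Pseudo action:\s*(\S+)$")


def parse_transition(text: str) -> List[Action]:
    """Return the actions of the 'Executing cluster transition' section
    in the order they are listed; everything else is ignored."""
    actions: List[Action] = []
    in_section = False
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # drop e-mail quote markers and indentation
        if line == "Executing cluster transition:":
            in_section = True
            continue
        if not in_section:
            continue  # e.g. the Transition Summary, which is not ordered
        if not line.startswith("*"):
            break  # first non-action line ends the section
        m = RESOURCE_RE.match(line)
        if m:
            actions.append(Action("Resource", m.group(1), m.group(2), m.group(3)))
            continue
        m = PSEUDO_RE.match(line)
        if m:
            actions.append(Action("Pseudo", m.group(1), None, None))
    return actions


SAMPLE = """\
Transition Summary:
 * Recover king_resource:0 ( Slave primary )
 * Promote king_resource:1 ( Slave -> Master secondary )
Executing cluster transition:
 * Pseudo action: ms_king_resource_stop_0
 * Resource action: king_resource stop on primary
 * Resource action: king_resource start on primary
 * Resource action: king_resource promote on secondary
Using the original execution date of: 2019-06-29 02:33:03Z
"""

for step, action in enumerate(parse_transition(SAMPLE), start=1):
    print(step, action)
```

Run against the full quoted output, after unwrapping the lines the archive broke, the resulting list preserves the order of the "Executing cluster transition" section rather than the resource-by-resource summary.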
>
> Thanks Ken. I think that's where the problem is originating. If you look at the "executing cluster transition" section, it's actually restarting the failed king instance BEFORE promoting the remaining in-service slave. When the failed resource comes back online, that adjusts the master scores, resulting in the transition being aborted. Both nodes then end up having the same master score for the king resource, and Pacemaker decides to re-promote the original master.
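Harvey's reading of the log can be checked mechanically. The following small Python sketch uses an excerpt of the quoted "Executing cluster transition" section: its lines are copied verbatim and kept in their original order, with most servant actions omitted. The sketch keeps only king_resource's role-changing operations, dropping notify, monitor and cancel bookkeeping, and confirms that the start on primary comes before the promote on secondary. `lifecycle` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Excerpt of the quoted "Executing cluster transition" section; lines are
# verbatim and in their original order, most servant* actions omitted.
EXCERPT = """\
* Resource action: king_resource notify on primary
* Pseudo action: ms_king_resource_stop_0
* Resource action: king_resource stop on primary
* Pseudo action: ms_king_resource_stopped_0
* Pseudo action: ms_king_resource_start_0
* Resource action: king_resource start on primary
* Pseudo action: ms_king_resource_running_0
* Resource action: servant2 promote on secondary
* Pseudo action: ms_king_resource_promote_0
* Resource action: king_resource promote on secondary
* Pseudo action: ms_king_resource_promoted_0
* Resource action: king_resource monitor=11000 on primary
* Resource action: king_resource monitor=10000 on secondary
"""

ACTION_RE = re.compile(r"^\*\s*Resource action:\s*(\S+)\s+(\S+)\s+on\s+(\S+)$")
IGNORED_OPS = {"notify", "monitor", "cancel"}  # bookkeeping, not role changes


def lifecycle(text: str, resource: str) -> List[Tuple[str, str]]:
    """Return (operation, node) pairs for `resource` in log order."""
    steps: List[Tuple[str, str]] = []
    for line in text.splitlines():
        m = ACTION_RE.match(line.strip())
        if not m or m.group(1) != resource:
            continue  # pseudo actions and other resources are skipped
        op = m.group(2).split("=", 1)[0]  # "monitor=10000" -> "monitor"
        if op not in IGNORED_OPS:
            steps.append((op, m.group(3)))
    return steps


steps = lifecycle(EXCERPT, "king_resource")
print(steps)  # [('stop', 'primary'), ('start', 'primary'), ('promote', 'secondary')]

restart_first = steps.index(("start", "primary")) < steps.index(("promote", "secondary"))
print("failed instance restarted before promotion:", restart_first)  # True
```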
It does not explain why it happens only in one direction, unless your
resource agent does something different in each case, and that is
something only you can check.
> I would have expected Pacemaker's priority to be to ensure that there was a master available first, then to restart the failed instance in slave mode. Is there a way to configure it to do that?
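Pacemaker executes a transition as a graph of actions in which each action waits for the actions it is ordered after. A toy Python model of just the king_resource actions, built with the standard library's `graphlib` (Python 3.9+), shows the difference between what the log shows and what is being asked for here. The "observed" edges are read off the order in the quoted log, where the clone-level promote only follows `ms_king_resource_running_0`, which in turn comes after the failed instance has started again. The "desired" graph is hypothetical. This is not Pacemaker's scheduler, and it says nothing about which configuration, if any, would produce the desired order.

```python
from graphlib import TopologicalSorter

STOP = "king_resource stop on primary"
START = "king_resource start on primary"
RUNNING = "ms_king_resource_running_0"
PROMOTE = "king_resource promote on secondary"

# Ordering as it appears in the quoted log (each key waits for its set):
# promotion waits for the clone to report "running", which waits for the
# failed instance to start again.
observed = {
    START: {STOP},
    RUNNING: {START},
    PROMOTE: {RUNNING},
}

# Hypothetical ordering: promote the surviving slave as soon as the failed
# instance has stopped, and only then restart the failed instance.
desired = {
    PROMOTE: {STOP},
    START: {PROMOTE},
    RUNNING: {START},
}

for label, graph in (("observed", observed), ("desired", desired)):
    print(label, "->", list(TopologicalSorter(graph).static_order()))
```

In the observed chain the promote cannot run until the restarted instance is up. In the desired chain it only waits for the stop, so a master would exist before the failed instance comes back and adjusts the master scores.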
>
>>
>> Current cluster status:
>> Online: [ primary secondary ]
>>
>> stk_shared_ip (ocf::heartbeat:IPaddr2): Started secondary
>> Clone Set: ms_king_resource [king_resource] (promotable)
>> king_resource (ocf::aviat:king-resource-ocf): FAILED primary
>> Slaves: [ secondary ]
>> Clone Set: ms_servant1 [servant1]
>> Started: [ primary secondary ]
>> Clone Set: ms_servant2 [servant2] (promotable)
>> Masters: [ primary ]
>> Slaves: [ secondary ]
>> Clone Set: ms_servant3 [servant3] (promotable)
>> Masters: [ primary ]
>> Slaves: [ secondary ]
>> servant4 (lsb:servant4): Started primary
>> servant5 (lsb:servant5): Started primary
>> servant6 (lsb:servant6): Started primary
>> servant7 (lsb:servant7): Started primary
>> servant8 (lsb:servant8): Started primary
>> Resource Group: servant9_active_disabled
>> servant9_resource1 (lsb:servant9_resource1): Started primary
>> servant9_resource2 (lsb:servant9_resource2): Started primary
>> servant10 (lsb:servant10): Started primary
>> servant11 (lsb:servant11): Started primary
>> servant12 (lsb:servant12): Started primary
>> servant13 (lsb:servant13): Started primary
>>
>> Transition Summary:
>> * Recover king_resource:0 ( Slave primary )
>> * Promote king_resource:1 ( Slave -> Master secondary )
>> * Demote servant2:0 ( Master -> Slave primary )
>> * Promote servant2:1 ( Slave -> Master secondary )
>> * Demote servant3:0 ( Master -> Slave primary )
>> * Promote servant3:1 ( Slave -> Master secondary )
>> * Move servant4 ( primary -> secondary )
>> * Move servant5 ( primary -> secondary )
>> * Move servant6 ( primary -> secondary )
>> * Move servant7 ( primary -> secondary )
>> * Move servant8 ( primary -> secondary )
>> * Move servant9_resource1 ( primary -> secondary )
>> * Move servant9_resource2 ( primary -> secondary )
>> * Move servant10 ( primary -> secondary )
>> * Move servant11 ( primary -> secondary )
>> * Move servant12 ( primary -> secondary )
>> * Move servant13 ( primary -> secondary )
>>
>> Executing cluster transition:
>> * Pseudo action: ms_king_resource_pre_notify_stop_0
>> * Pseudo action: ms_servant2_pre_notify_demote_0
>> * Resource action: servant3 cancel=10000 on primary
>> * Resource action: servant3 cancel=11000 on secondary
>> * Pseudo action: ms_servant3_pre_notify_demote_0
>> * Resource action: servant4 stop on primary
>> * Resource action: servant5 stop on primary
>> * Resource action: servant6 stop on primary
>> * Resource action: servant7 stop on primary
>> * Resource action: servant8 stop on primary
>> * Pseudo action: servant9_active_disabled_stop_0
>> * Resource action: servant9_resource2 stop on primary
>> * Resource action: servant10 stop on primary
>> * Resource action: servant11 stop on primary
>> * Resource action: servant12 stop on primary
>> * Resource action: servant13 stop on primary
>> * Resource action: king_resource notify on primary
>> * Resource action: king_resource notify on secondary
>> * Pseudo action: ms_king_resource_confirmed-pre_notify_stop_0
>> * Pseudo action: ms_king_resource_stop_0
>> * Resource action: servant2 notify on primary
>> * Resource action: servant2 notify on secondary
>> * Pseudo action: ms_servant2_confirmed-pre_notify_demote_0
>> * Pseudo action: ms_servant2_demote_0
>> * Resource action: servant3 notify on primary
>> * Resource action: servant3 notify on secondary
>> * Pseudo action: ms_servant3_confirmed-pre_notify_demote_0
>> * Pseudo action: ms_servant3_demote_0
>> * Resource action: servant4 start on secondary
>> * Resource action: servant5 start on secondary
>> * Resource action: servant6 start on secondary
>> * Resource action: servant7 start on secondary
>> * Resource action: servant8 start on secondary
>> * Resource action: servant9_resource1 stop on primary
>> * Resource action: servant10 start on secondary
>> * Resource action: servant11 start on secondary
>> * Resource action: servant12 start on secondary
>> * Resource action: servant13 start on secondary
>> * Resource action: king_resource stop on primary
>> * Pseudo action: ms_king_resource_stopped_0
>> * Resource action: servant2 demote on primary
>> * Pseudo action: ms_servant2_demoted_0
>> * Resource action: servant3 demote on primary
>> * Pseudo action: ms_servant3_demoted_0
>> * Resource action: servant4 monitor=10000 on secondary
>> * Resource action: servant5 monitor=10000 on secondary
>> * Resource action: servant6 monitor=10000 on secondary
>> * Resource action: servant7 monitor=10000 on secondary
>> * Resource action: servant8 monitor=10000 on secondary
>> * Pseudo action: servant9_active_disabled_stopped_0
>> * Pseudo action: servant9_active_disabled_start_0
>> * Resource action: servant9_resource1 start on secondary
>> * Resource action: servant9_resource2 start on secondary
>> * Resource action: servant10 monitor=10000 on secondary
>> * Resource action: servant11 monitor=10000 on secondary
>> * Resource action: servant12 monitor=10000 on secondary
>> * Resource action: servant13 monitor=10000 on secondary
>> * Pseudo action: ms_king_resource_post_notify_stopped_0
>> * Pseudo action: ms_servant2_post_notify_demoted_0
>> * Pseudo action: ms_servant3_post_notify_demoted_0
>> * Pseudo action: servant9_active_disabled_running_0
>> * Resource action: servant9_resource1 monitor=10000 on secondary
>> * Resource action: servant9_resource2 monitor=10000 on secondary
>> * Resource action: king_resource notify on secondary
>> * Pseudo action: ms_king_resource_confirmed-post_notify_stopped_0
>> * Pseudo action: ms_king_resource_pre_notify_start_0
>> * Resource action: servant2 notify on primary
>> * Resource action: servant2 notify on secondary
>> * Pseudo action: ms_servant2_confirmed-post_notify_demoted_0
>> * Pseudo action: ms_servant2_pre_notify_promote_0
>> * Resource action: servant3 notify on primary
>> * Resource action: servant3 notify on secondary
>> * Pseudo action: ms_servant3_confirmed-post_notify_demoted_0
>> * Pseudo action: ms_servant3_pre_notify_promote_0
>> * Resource action: king_resource notify on secondary
>> * Pseudo action: ms_king_resource_confirmed-pre_notify_start_0
>> * Pseudo action: ms_king_resource_start_0
>> * Resource action: servant2 notify on primary
>> * Resource action: servant2 notify on secondary
>> * Pseudo action: ms_servant2_confirmed-pre_notify_promote_0
>> * Pseudo action: ms_servant2_promote_0
>> * Resource action: servant3 notify on primary
>> * Resource action: servant3 notify on secondary
>> * Pseudo action: ms_servant3_confirmed-pre_notify_promote_0
>> * Pseudo action: ms_servant3_promote_0
>> * Resource action: king_resource start on primary
>> * Pseudo action: ms_king_resource_running_0
>> * Resource action: servant2 promote on secondary
>> * Pseudo action: ms_servant2_promoted_0
>> * Resource action: servant3 promote on secondary
>> * Pseudo action: ms_servant3_promoted_0
>> * Pseudo action: ms_king_resource_post_notify_running_0
>> * Pseudo action: ms_servant2_post_notify_promoted_0
>> * Pseudo action: ms_servant3_post_notify_promoted_0
>> * Resource action: king_resource notify on primary
>> * Resource action: king_resource notify on secondary
>> * Pseudo action: ms_king_resource_confirmed-post_notify_running_0
>> * Resource action: servant2 notify on primary
>> * Resource action: servant2 notify on secondary
>> * Pseudo action: ms_servant2_confirmed-post_notify_promoted_0
>> * Resource action: servant3 notify on primary
>> * Resource action: servant3 notify on secondary
>> * Pseudo action: ms_servant3_confirmed-post_notify_promoted_0
>> * Pseudo action: ms_king_resource_pre_notify_promote_0
>> * Resource action: servant2 monitor=11000 on primary
>> * Resource action: servant2 monitor=10000 on secondary
>> * Resource action: servant3 monitor=11000 on primary
>> * Resource action: servant3 monitor=10000 on secondary
>> * Resource action: king_resource notify on primary
>> * Resource action: king_resource notify on secondary
>> * Pseudo action: ms_king_resource_confirmed-pre_notify_promote_0
>> * Pseudo action: ms_king_resource_promote_0
>> * Resource action: king_resource promote on secondary
>> * Pseudo action: ms_king_resource_promoted_0
>> * Pseudo action: ms_king_resource_post_notify_promoted_0
>> * Resource action: king_resource notify on primary
>> * Resource action: king_resource notify on secondary
>> * Pseudo action: ms_king_resource_confirmed-post_notify_promoted_0
>> * Resource action: king_resource monitor=11000 on primary
>> * Resource action: king_resource monitor=10000 on secondary
>> Using the original execution date of: 2019-06-29 02:33:03Z
>>
>> Revised cluster status:
>> Online: [ primary secondary ]
>>
>> stk_shared_ip (ocf::heartbeat:IPaddr2): Started secondary
>> Clone Set: ms_king_resource [king_resource] (promotable)
>> Masters: [ secondary ]
>> Slaves: [ primary ]
>> Clone Set: ms_servant1 [servant1]
>> Started: [ primary secondary ]
>> Clone Set: ms_servant2 [servant2] (promotable)
>> Masters: [ secondary ]
>> Slaves: [ primary ]
>> Clone Set: ms_servant3 [servant3] (promotable)
>> Masters: [ secondary ]
>> Slaves: [ primary ]
>> servant4 (lsb:servant4): Started secondary
>> servant5 (lsb:servant5): Started secondary
>> servant6 (lsb:servant6): Started secondary
>> servant7 (lsb:servant7): Started secondary
>> servant8 (lsb:servant8): Started secondary
>> Resource Group: servant9_active_disabled
>> servant9_resource1 (lsb:servant9_resource1): Started secondary
>> servant9_resource2 (lsb:servant9_resource2): Started secondary
>> servant10 (lsb:servant10): Started secondary
>> servant11 (lsb:servant11): Started secondary
>> servant12 (lsb:servant12): Started secondary
>> servant13 (lsb:servant13): Started secondary
>>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>