[ClusterLabs] Problems with master/slave failovers

Andrei Borzenkov arvidjaar at gmail.com
Mon Jul 1 23:42:57 EDT 2019


On 02.07.2019 2:30, Harvey Shepherd wrote:
>> The "transition summary" is just a resource-by-resource list, not the
>> order things will be done. The "executing cluster transition" section
>> is the order things are being done.
> 
> Thanks Ken. I think that's where the problem is originating. If you look at the "executing cluster transition" section, it's actually restarting the failed king instance BEFORE promoting the remaining in-service slave. When the failed resource comes back online, that adjusts the master scores, resulting in the transition being aborted. Both nodes then end up having the same master score for the king resource, and Pacemaker decides to re-promote the original master.

That does not explain why it happens in only one direction, unless your
resource agent is doing something different in each case; that is
something only you can check.
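
If the agent sets its promotion (master) score from its monitor action, an
asymmetry there would explain direction-dependent behaviour. As a rough
sketch of the usual pattern in a promotable OCF agent (the helper functions
and the score values are illustrative, not taken from your agent):

    king_monitor() {
        if ! king_is_running; then      # hypothetical status check
            crm_master -l reboot -D     # drop our promotion score
            return 7                    # OCF_NOT_RUNNING
        fi
        if king_is_master; then         # hypothetical role check
            crm_master -l reboot -v 100
            return 8                    # OCF_RUNNING_MASTER
        fi
        crm_master -l reboot -v 5       # running as a slave: lower score
        return 0                        # OCF_SUCCESS
    }

crm_master is Pacemaker's helper for managing the local instance's
promotion score; "-l reboot" makes the score transient, so it is cleared
on node restart instead of surviving to influence the next election.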

> I would have expected Pacemaker's priority to be to ensure that there was a master available first, then to restart the failed instance in slave mode. Is there a way to configure it to do that?
> 
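
On the re-promotion question: Pacemaker promotes whichever active instance
carries the highest promotion score, so one common mitigation (independent
of action ordering in the transition) is to make sure a freshly restarted
instance cannot outbid the already-promoted peer. A sketch, with
illustrative score values:

    # In the agent's start action: come up with no promotion score at
    # all, so an existing master keeps winning any election.
    crm_master -l reboot -D

    # Only once the instance has demonstrably caught up (for example,
    # replication back in sync) advertise a modest score, below what a
    # healthy master reports, so a recovered slave never displaces it:
    crm_master -l reboot -v 5

Whether that fits depends on what your agent bases its scores on, so treat
it as a direction to investigate rather than a confirmed fix.
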
>>
>> Current cluster status:
>> Online: [ primary secondary ]
>>
>>  stk_shared_ip  (ocf::heartbeat:IPaddr2):  Started secondary
>>  Clone Set: ms_king_resource [king_resource] (promotable)
>>      king_resource  (ocf::aviat:king-resource-ocf):  FAILED primary
>>      Slaves: [ secondary ]
>>  Clone Set: ms_servant1 [servant1]
>>      Started: [ primary secondary ]
>>  Clone Set: ms_servant2 [servant2] (promotable)
>>      Masters: [ primary ]
>>      Slaves: [ secondary ]
>>  Clone Set: ms_servant3 [servant3] (promotable)
>>      Masters: [ primary ]
>>      Slaves: [ secondary ]
>>  servant4   (lsb:servant4):   Started primary
>>  servant5   (lsb:servant5):   Started primary
>>  servant6   (lsb:servant6):   Started primary
>>  servant7   (lsb:servant7):   Started primary
>>  servant8   (lsb:servant8):   Started primary
>>  Resource Group: servant9_active_disabled
>>      servant9_resource1  (lsb:servant9_resource1):  Started primary
>>      servant9_resource2  (lsb:servant9_resource2):  Started primary
>>  servant10  (lsb:servant10):  Started primary
>>  servant11  (lsb:servant11):  Started primary
>>  servant12  (lsb:servant12):  Started primary
>>  servant13  (lsb:servant13):  Started primary
>>
>> Transition Summary:
>>  * Recover    king_resource:0     ( Slave primary )
>>  * Promote    king_resource:1     ( Slave -> Master secondary )
>>  * Demote     servant2:0          ( Master -> Slave primary )
>>  * Promote    servant2:1          ( Slave -> Master secondary )
>>  * Demote     servant3:0          ( Master -> Slave primary )
>>  * Promote    servant3:1          ( Slave -> Master secondary )
>>  * Move       servant4            ( primary -> secondary )
>>  * Move       servant5            ( primary -> secondary )
>>  * Move       servant6            ( primary -> secondary )
>>  * Move       servant7            ( primary -> secondary )
>>  * Move       servant8            ( primary -> secondary )
>>  * Move       servant9_resource1  ( primary -> secondary )
>>  * Move       servant9_resource2  ( primary -> secondary )
>>  * Move       servant10           ( primary -> secondary )
>>  * Move       servant11           ( primary -> secondary )
>>  * Move       servant12           ( primary -> secondary )
>>  * Move       servant13           ( primary -> secondary )
>>
>> Executing cluster transition:
>>  * Pseudo action:   ms_king_resource_pre_notify_stop_0
>>  * Pseudo action:   ms_servant2_pre_notify_demote_0
>>  * Resource action: servant3        cancel=10000 on primary
>>  * Resource action: servant3        cancel=11000 on secondary
>>  * Pseudo action:   ms_servant3_pre_notify_demote_0
>>  * Resource action: servant4         stop on primary
>>  * Resource action: servant5           stop on primary
>>  * Resource action: servant6       stop on primary
>>  * Resource action: servant7       stop on primary
>>  * Resource action: servant8       stop on primary
>>  * Pseudo action:   servant9_active_disabled_stop_0
>>  * Resource action: servant9_resource2 stop on primary
>>  * Resource action: servant10          stop on primary
>>  * Resource action: servant11          stop on primary
>>  * Resource action: servant12             stop on primary
>>  * Resource action: servant13         stop on primary
>>  * Resource action: king_resource   notify on primary
>>  * Resource action: king_resource   notify on secondary
>>  * Pseudo action:   ms_king_resource_confirmed-pre_notify_stop_0
>>  * Pseudo action:   ms_king_resource_stop_0
>>  * Resource action: servant2        notify on primary
>>  * Resource action: servant2        notify on secondary
>>  * Pseudo action:   ms_servant2_confirmed-pre_notify_demote_0
>>  * Pseudo action:   ms_servant2_demote_0
>>  * Resource action: servant3        notify on primary
>>  * Resource action: servant3        notify on secondary
>>  * Pseudo action:   ms_servant3_confirmed-pre_notify_demote_0
>>  * Pseudo action:   ms_servant3_demote_0
>>  * Resource action: servant4         start on secondary
>>  * Resource action: servant5           start on secondary
>>  * Resource action: servant6       start on secondary
>>  * Resource action: servant7       start on secondary
>>  * Resource action: servant8       start on secondary
>>  * Resource action: servant9_resource1           stop on primary
>>  * Resource action: servant10          start on secondary
>>  * Resource action: servant11          start on secondary
>>  * Resource action: servant12             start on secondary
>>  * Resource action: servant13         start on secondary
>>  * Resource action: king_resource   stop on primary
>>  * Pseudo action:   ms_king_resource_stopped_0
>>  * Resource action: servant2        demote on primary
>>  * Pseudo action:   ms_servant2_demoted_0
>>  * Resource action: servant3        demote on primary
>>  * Pseudo action:   ms_servant3_demoted_0
>>  * Resource action: servant4         monitor=10000 on secondary
>>  * Resource action: servant5           monitor=10000 on secondary
>>  * Resource action: servant6       monitor=10000 on secondary
>>  * Resource action: servant7       monitor=10000 on secondary
>>  * Resource action: servant8       monitor=10000 on secondary
>>  * Pseudo action:   servant9_active_disabled_stopped_0
>>  * Pseudo action:   servant9_active_disabled_start_0
>>  * Resource action: servant9_resource1           start on secondary
>>  * Resource action: servant9_resource2 start on secondary
>>  * Resource action: servant10          monitor=10000 on secondary
>>  * Resource action: servant11          monitor=10000 on secondary
>>  * Resource action: servant12             monitor=10000 on secondary
>>  * Resource action: servant13         monitor=10000 on secondary
>>  * Pseudo action:   ms_king_resource_post_notify_stopped_0
>>  * Pseudo action:   ms_servant2_post_notify_demoted_0
>>  * Pseudo action:   ms_servant3_post_notify_demoted_0
>>  * Pseudo action:   servant9_active_disabled_running_0
>>  * Resource action: servant9_resource1 monitor=10000 on secondary
>>  * Resource action: servant9_resource2 monitor=10000 on secondary
>>  * Resource action: king_resource   notify on secondary
>>  * Pseudo action:   ms_king_resource_confirmed-post_notify_stopped_0
>>  * Pseudo action:   ms_king_resource_pre_notify_start_0
>>  * Resource action: servant2        notify on primary
>>  * Resource action: servant2        notify on secondary
>>  * Pseudo action:   ms_servant2_confirmed-post_notify_demoted_0
>>  * Pseudo action:   ms_servant2_pre_notify_promote_0
>>  * Resource action: servant3        notify on primary
>>  * Resource action: servant3        notify on secondary
>>  * Pseudo action:   ms_servant3_confirmed-post_notify_demoted_0
>>  * Pseudo action:   ms_servant3_pre_notify_promote_0
>>  * Resource action: king_resource   notify on secondary
>>  * Pseudo action:   ms_king_resource_confirmed-pre_notify_start_0
>>  * Pseudo action:   ms_king_resource_start_0
>>  * Resource action: servant2        notify on primary
>>  * Resource action: servant2        notify on secondary
>>  * Pseudo action:   ms_servant2_confirmed-pre_notify_promote_0
>>  * Pseudo action:   ms_servant2_promote_0
>>  * Resource action: servant3        notify on primary
>>  * Resource action: servant3        notify on secondary
>>  * Pseudo action:   ms_servant3_confirmed-pre_notify_promote_0
>>  * Pseudo action:   ms_servant3_promote_0
>>  * Resource action: king_resource   start on primary
>>  * Pseudo action:   ms_king_resource_running_0
>>  * Resource action: servant2        promote on secondary
>>  * Pseudo action:   ms_servant2_promoted_0
>>  * Resource action: servant3        promote on secondary
>>  * Pseudo action:   ms_servant3_promoted_0
>>  * Pseudo action:   ms_king_resource_post_notify_running_0
>>  * Pseudo action:   ms_servant2_post_notify_promoted_0
>>  * Pseudo action:   ms_servant3_post_notify_promoted_0
>>  * Resource action: king_resource   notify on primary
>>  * Resource action: king_resource   notify on secondary
>>  * Pseudo action:   ms_king_resource_confirmed-post_notify_running_0
>>  * Resource action: servant2        notify on primary
>>  * Resource action: servant2        notify on secondary
>>  * Pseudo action:   ms_servant2_confirmed-post_notify_promoted_0
>>  * Resource action: servant3        notify on primary
>>  * Resource action: servant3        notify on secondary
>>  * Pseudo action:   ms_servant3_confirmed-post_notify_promoted_0
>>  * Pseudo action:   ms_king_resource_pre_notify_promote_0
>>  * Resource action: servant2        monitor=11000 on primary
>>  * Resource action: servant2        monitor=10000 on secondary
>>  * Resource action: servant3        monitor=11000 on primary
>>  * Resource action: servant3        monitor=10000 on secondary
>>  * Resource action: king_resource   notify on primary
>>  * Resource action: king_resource   notify on secondary
>>  * Pseudo action:   ms_king_resource_confirmed-pre_notify_promote_0
>>  * Pseudo action:   ms_king_resource_promote_0
>>  * Resource action: king_resource   promote on secondary
>>  * Pseudo action:   ms_king_resource_promoted_0
>>  * Pseudo action:   ms_king_resource_post_notify_promoted_0
>>  * Resource action: king_resource   notify on primary
>>  * Resource action: king_resource   notify on secondary
>>  * Pseudo action:   ms_king_resource_confirmed-post_notify_promoted_0
>>  * Resource action: king_resource   monitor=11000 on primary
>>  * Resource action: king_resource   monitor=10000 on secondary
>> Using the original execution date of: 2019-06-29 02:33:03Z
>>
>> Revised cluster status:
>> Online: [ primary secondary ]
>>
>>  stk_shared_ip  (ocf::heartbeat:IPaddr2):  Started secondary
>>  Clone Set: ms_king_resource [king_resource] (promotable)
>>      Masters: [ secondary ]
>>      Slaves: [ primary ]
>>  Clone Set: ms_servant1 [servant1]
>>      Started: [ primary secondary ]
>>  Clone Set: ms_servant2 [servant2] (promotable)
>>      Masters: [ secondary ]
>>      Slaves: [ primary ]
>>  Clone Set: ms_servant3 [servant3] (promotable)
>>      Masters: [ secondary ]
>>      Slaves: [ primary ]
>>  servant4   (lsb:servant4):   Started secondary
>>  servant5   (lsb:servant5):   Started secondary
>>  servant6   (lsb:servant6):   Started secondary
>>  servant7   (lsb:servant7):   Started secondary
>>  servant8   (lsb:servant8):   Started secondary
>>  Resource Group: servant9_active_disabled
>>      servant9_resource1  (lsb:servant9_resource1):  Started secondary
>>      servant9_resource2  (lsb:servant9_resource2):  Started secondary
>>  servant10  (lsb:servant10):  Started secondary
>>  servant11  (lsb:servant11):  Started secondary
>>  servant12  (lsb:servant12):  Started secondary
>>  servant13  (lsb:servant13):  Started secondary
>>
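
Incidentally, the dump above is crm_simulate output, so the same transition
can be replayed offline to watch the scores that drive the re-promotion. A
minimal sketch; the pe-input file name is illustrative and will differ on
your system:

    # Replay a saved scheduler input and print the transition along with
    # the promotion scores for each clone instance:
    crm_simulate --simulate --show-scores \
        --xml-file /var/lib/pacemaker/pengine/pe-input-123.bz2

    # Or inspect the live cluster the same way (read-only):
    crm_simulate --live-check --show-scores

Comparing the scores before the failed king instance restarts and after it
comes back should show exactly when the two nodes end up tied.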


