[ClusterLabs] Problems with master/slave failovers

Harvey Shepherd Harvey.Shepherd at Aviatnet.com
Sat Jun 29 01:05:26 EDT 2019

There is an ordering constraint: everything must be started after the king resource. But even if this constraint didn't exist, I don't see that it should logically make any difference, given that all the non-clone resources are colocated with the master of the king resource. Surely it would make no sense for Pacemaker to start or move colocated resources until a king resource master has been elected?
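If the intent really is "nothing starts until a master exists", that can be made explicit with a promote ordering rather than being left implied by the colocation. A minimal sketch, reusing the resource and tag ids from the configuration below (the constraint id itself is invented):

```xml
<!-- Hypothetical constraint: only start the tagged dependents after a
     master instance of ms_king_resource has actually been promoted,
     not merely after the clone has started. -->
<rsc_order id="dependents_after_king_promote" kind="Mandatory"
           first="ms_king_resource" first-action="promote"
           then="servant2_dependents" then-action="start"/>
```

With `first-action="promote"`, Pacemaker will not schedule the dependents' start actions in a transition until the promote action has completed.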

    <tag id="servant2_dependents">
      <obj_ref id="servant4"/>
      <obj_ref id="servant5"/>
      <obj_ref id="servant6"/>
      <obj_ref id="servant7"/>
      <obj_ref id="servant8"/>
      <obj_ref id="servant9_active_disabled"/>
      <obj_ref id="servant11"/>
      <obj_ref id="servant12"/>
      <obj_ref id="servant13"/>
    </tag>
    <rsc_colocation id="colocation_with_king_resource_master" score="INFINITY">
      <resource_set id="king_resource_master_dependents" sequential="false">
        <resource_ref id="stk_shared_ip"/>
        <resource_ref id="servant4"/>
        <resource_ref id="servant5"/>
        <resource_ref id="servant6"/>
        <resource_ref id="servant7"/>
        <resource_ref id="servant8"/>
        <resource_ref id="servant9_active_disabled"/>
        <resource_ref id="servant10"/>
        <resource_ref id="servant11"/>
        <resource_ref id="servant12"/>
        <resource_ref id="servant13"/>
      </resource_set>
      <resource_set id="king_resource_master" sequential="true" role="Master">
        <resource_ref id="ms_king_resource"/>
        <resource_ref id="ms_servant2"/>
        <resource_ref id="ms_servant3"/>
      </resource_set>
    </rsc_colocation>
    <rsc_order id="dependents_after_servant2" kind="Mandatory" first="ms_servant2" then="servant2_dependents"/>

From: Users <users-bounces at clusterlabs.org> on behalf of Andrei Borzenkov <arvidjaar at gmail.com>
Sent: Saturday, 29 June 2019 4:13 p.m.
To: users at clusterlabs.org
Subject: EXTERNAL: Re: [ClusterLabs] Problems with master/slave failovers

On 29.06.2019 6:01, Harvey Shepherd wrote:
> As you can see, it eventually gives up on the transition attempt and starts a new one. Eventually the failed king resource master has had time to come back online, and Pacemaker then just promotes it again and forgets about trying to fail over. I'm not sure if the cluster transition actions listed by crm_simulate are in the order in which Pacemaker tries to carry out the operations, but if so the order is wrong. It should be stopping all servant resources on the failed king master, then failing over the king resource, then migrating the servant resources to the new master node. Instead it seems to be trying to migrate all the servant resources over first, with the king master failover scheduled near the bottom, which won't work due to the colocation constraint with the king master.

Unless you configured explicit ordering between resources, Pacemaker is
free to choose any order.
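One way to express such an explicit ordering is a set-based order constraint that forces the king master's promotion before the servants start. A sketch under the assumption that the ids from the earlier configuration are in use (the constraint and set ids are invented, and only two servants are shown for brevity):

```xml
<!-- Hypothetical set-based ordering: promote ms_king_resource first,
     then start the servants (in any order relative to each other,
     since sequential="false"). -->
<rsc_order id="servants_after_king_promote" kind="Mandatory">
  <resource_set id="king_promote_first" action="promote" sequential="true">
    <resource_ref id="ms_king_resource"/>
  </resource_set>
  <resource_set id="servants_start_after" action="start" sequential="false">
    <resource_ref id="servant4"/>
    <resource_ref id="servant5"/>
  </resource_set>
</rsc_order>
```

Combined with the existing INFINITY colocation on the master role, this pins both the placement and the sequencing of the servants to the promoted instance.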
ClusterLabs home: https://www.clusterlabs.org/
