[Pacemaker] Clone resource dependency issue - undesired restart of dependent resources

Ron Kerry rkerry at sgi.com
Tue Mar 1 22:48:49 UTC 2011

On 3/1/2011 2:39 PM, Ron Kerry wrote:
> On 2/28/2011 2:33 PM, Ron Kerry wrote:
>> Folks -
>> I have a configuration issue that I am unsure how to resolve. Consider the following set of
>> resources.
>> clone rsc1-clone rsc1 \
>> meta clone-max="2" target-role="Started"
>> primitive rsc1 ...
>> primitive rsc2 ... meta resource-stickiness="1"
>> primitive rsc3 ... meta resource-stickiness="1"
>> Plus the following constraints
>> colocation rsc2-with-clone inf: rsc2 rsc1-clone
>> colocation rsc3-with-clone inf: rsc3 rsc1-clone
>> order clone-before-rsc2 : rsc1-clone rsc2
>> order clone-before-rsc3 : rsc1-clone rsc3
>> I am getting the following behavior that is undesirable.
>> During normal operation, a copy of the rsc1 resource is running on each of my two systems, with rsc2 and rsc3
>> typically split between the two systems. The rsc2 & rsc3 resources are operationally
>> dependent on a copy of rsc1 being up and running first.
>> SystemA   SystemB
>> =======   =======
>> rsc1      rsc1
>> rsc2      rsc3
>> If SystemB goes down, then rsc3 moves over to SystemA as expected
>> SystemA   SystemB
>> =======   =======
>> rsc1        X X
>> rsc2         X
>> rsc3        X X
>> When SystemB comes back into the cluster, crmd starts the rsc1 clone on SystemB but then also
>> restarts both rsc2 & rsc3. This means both are stopped and then both started again. This is not what
>> we want. We want these resources to remain running on SystemA until one of them is moved manually by
>> an administrator to re-balance them across the systems.
>> How do we configure these resources/constraints to achieve that behavior? We are already using
>> resource-stickiness, but that is meaningless if crmd is going to be doing a restart of these
>> resources.
> Using advisory (score="0") order constraints seems to achieve the correct behavior. I have not yet done
> extensive testing to see whether other failover behaviors are broken by this approach, but initial
> basic testing looks good. It is always nice to answer one's own questions :-)
> colocation rsc2-with-clone inf: rsc2 rsc1-clone
> colocation rsc3-with-clone inf: rsc3 rsc1-clone
> order clone-before-rsc2 0: rsc1-clone rsc2
> order clone-before-rsc3 0: rsc1-clone rsc3
> Does anyone know of any specific problems with this approach??
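For reference, the crm shell constraints above should correspond to CIB XML roughly like the following. This is a sketch against the Pacemaker 1.x constraint schema, with the ids simply reusing the crm shell names; the comment on score="0" reflects the documented meaning of an advisory (optional) ordering, which would also explain why a lone restart of the clone is not propagated to rsc2/rsc3.

```xml
<constraints>
  <!-- Mandatory colocation: rsc2/rsc3 may only run where a copy of rsc1-clone runs -->
  <rsc_colocation id="rsc2-with-clone" rsc="rsc2" with-rsc="rsc1-clone" score="INFINITY"/>
  <rsc_colocation id="rsc3-with-clone" rsc="rsc3" with-rsc="rsc1-clone" score="INFINITY"/>
  <!-- score="0" = advisory ordering: only honored when both resources are
       started/stopped in the same transition; a restart of the clone alone
       has no effect on the dependent resource -->
  <rsc_order id="clone-before-rsc2" first="rsc1-clone" then="rsc2" score="0"/>
  <rsc_order id="clone-before-rsc3" first="rsc1-clone" then="rsc3" score="0"/>
</constraints>
```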

I set up a greatly simplified generic resource configuration:

  Online: [ elvis queen ]
   Clone Set: A-clone [A]
       Started: [ elvis queen ]
   B-1    (ocf::rgk:typeB):       Started elvis
   B-2    (ocf::rgk:typeB):       Started queen
   Clone Set: stonith-l2network-set [stonith-l2network]
       Started: [ elvis queen ]
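The actual configuration for this test is not shown; presumably it follows the same advisory-ordering pattern as above. The sketch below is a guess under that assumption: the agent name for A (`ocf:rgk:typeA`) is hypothetical, the stickiness values are copied from the original rsc2/rsc3 example, and the stonith clone is omitted.

```
# Hypothetical reconstruction; ocf:rgk:typeA is an assumed agent name
primitive A ocf:rgk:typeA
primitive B-1 ocf:rgk:typeB meta resource-stickiness="1"
primitive B-2 ocf:rgk:typeB meta resource-stickiness="1"
clone A-clone A meta clone-max="2"
colocation B-1-with-A inf: B-1 A-clone
colocation B-2-with-A inf: B-2 A-clone
order A-before-B-1 0: A-clone B-1
order A-before-B-2 0: A-clone B-2
```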

The A and B resources are just shell scripts running an infinite while loop whose body is a single
sleep 5 command, so they run forever without consuming machine resources.
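The payload described above amounts to something like the following POSIX shell sketch. The function name `payload_loop` and the `run` argument are hypothetical; the real ocf:rgk agents presumably wrap a loop like this with OCF start/stop/monitor actions, which are not shown here.

```shell
#!/bin/sh
# Hypothetical stand-in for the A/B test payload: loop forever,
# waking every 5 seconds and doing no real work.
payload_loop() {
    while :; do
        sleep 5   # idle; uses essentially no CPU
    done
}

# Only enter the loop when explicitly asked ("./payload.sh run"),
# so sourcing this file just defines the function.
if [ "${1:-}" = "run" ]; then
    payload_loop
fi
```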

If I kill the A-clone instance running on queen, it simply gets restarted and nothing at all happens to B-2: it
stays on queen and never notices that A was restarted underneath it. Since B depends on A being up first, this is not the behavior we want.

On the positive side, if the A-clone cannot (re)start on queen, then B-2 does fail over to elvis
as we expect.

Does anybody have any ideas about how to get the proper behavior in all cases?


Ron Kerry         rkerry at sgi.com
Global Product Support - SGI Federal
