[Pacemaker] Orphan problem when creating a clone of a group

Andrew Beekhof andrew at beekhof.net
Tue Nov 30 05:10:46 EST 2010


On Mon, Nov 29, 2010 at 5:37 PM, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> Hi,
>
> On Mon, Nov 29, 2010 at 02:42:42PM +0100, Uwe Grawert wrote:
>> Was: Re: [Pacemaker] crm resource restart doesn't restart the correct resource
>>
>> Quoting Dejan Muhamedagic <dejanmm at fastmail.fm>:
>>
>>>> This happens because, when the clone is created, pacemaker
>>>> stops the primitive but does not wait for the stop action to
>>>> return, and just starts the primitive again. And that of course
>>>> causes problems.
>>>
>>> Hmm, I don't quite understand what is going on. Is that primitive
>>> part of the group? Can you describe in more detail what is going
>>> on?
>>
>> I have a group (grp_fs) consisting of an LVM and several Filesystem
>> resources, in that order. That group is started and all resources are
>> running. Now I clone this group by issuing:
>>
>> crm configure clone clo_fs grp_fs
>>
>> That stops all resources and starts them again as a clone. But
>> Pacemaker does not seem to wait until the stop action has finished. I
>> have modified the LVM RA to log the action passed to the agent
>> and the value returned by the agent:
>>
>> 14:24:11 [ 14495 ] Action: start
>> 14:24:11 [ 14494 ] Action: stop
>> 14:24:13 [ 14494 ] RC: 1
>> 14:24:14 [ 14495 ] RC: 0
>> 14:24:14 [ 14599 ] Action: monitor
>> 14:24:14 [ 14599 ] RC: 0
>>
>> The PID is shown in brackets. As can be seen, Pacemaker first issues a
>> start command and then immediately a stop afterwards, not waiting for
>> the first command to return. That produces an orphan resource. It
>> also means that the state of the LVM resource (which is now cloned) is
>> uncertain: it may start, but it may also fail.
>
> I see. The problem here is that as far as the cluster's
> concerned, the new resources and the old resources are
> unrelated: they have different names (before it was say lvm1 and
> now it's lvm1:0). I'm not sure if the crmd/pengine can tell if
> the resources of the group which are running actually belong to
> the cloned group as well. Andrew?
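
For what it's worth, the overlap in the RA log quoted above can be
checked mechanically. The following is only a sketch: it assumes the
log format is exactly the one Uwe pasted ("time [ pid ] Action: x" and
"time [ pid ] RC: n"), and the function name find_overlaps is made up
for illustration.

```python
import re

# The RA log excerpt exactly as quoted in this thread.
LOG = """\
14:24:11 [ 14495 ] Action: start
14:24:11 [ 14494 ] Action: stop
14:24:13 [ 14494 ] RC: 1
14:24:14 [ 14495 ] RC: 0
14:24:14 [ 14599 ] Action: monitor
14:24:14 [ 14599 ] RC: 0
"""

# Assumed line format: "<time> [ <pid> ] Action: <name>" or "... RC: <code>".
LINE = re.compile(r"^(\S+) \[ (\d+) \] (Action|RC): (\S+)$")


def find_overlaps(log_text):
    """Return (in_flight_action, new_action) pairs for every action that
    was logged while an earlier action had not yet logged its RC."""
    in_flight = {}  # pid -> action name still waiting for its RC
    overlaps = []
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # ignore anything that is not in the assumed format
        _, pid, kind, value = match.groups()
        if kind == "Action":
            for running in in_flight.values():
                overlaps.append((running, value))
            in_flight[pid] = value
        else:
            # An RC line closes the action started by the same PID.
            in_flight.pop(pid, None)
    return overlaps


print(find_overlaps(LOG))
```

On the quoted excerpt this reports a single overlapping pair, the
start and the stop; the monitor runs only after both have returned.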

We'll find it if it's an anonymous clone (thanks to the initial monitor op).
Although things might be a bit confusing for a while since we'll
probably try and stop it under the "old" name (which would cause any
recurring monitor ops for the "new" name to fail).
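
To illustrate the naming point only (this is not what the pengine
does internally, just a sketch of the "lvm1" versus "lvm1:0"
relationship Dejan describes): a clone instance id is the base id plus
a ":N" suffix, so relating the two names is a matter of stripping that
suffix. The helper names here are hypothetical.

```python
def base_name(instance_id):
    """Strip a trailing ':<number>' clone-instance suffix, if present."""
    head, sep, tail = instance_id.rpartition(":")
    if sep and tail.isdigit():
        return head
    return instance_id


def same_resource(old_id, new_id):
    """True if two ids differ at most by a clone-instance suffix."""
    return base_name(old_id) == base_name(new_id)


print(base_name("lvm1:0"))
print(same_resource("lvm1", "lvm1:0"))
```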

> If not, then we'll have to
> forbid creating a clone of running resources in the shell.

Might be the best option.
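
Roughly, such a check could look like the sketch below. It is not crm
shell code: the function name, the member name "fs1" and the way the
set of running resources is obtained are all assumptions made for
illustration (only grp_fs and lvm1 come from this thread).

```python
def running_members(members, running):
    """Return the members of the resource to be cloned that are still
    running, in configuration order. An empty list means the clone can
    be created safely."""
    return [rsc for rsc in members if rsc in running]


# Hypothetical state: grp_fs contains lvm1 and fs1, and both are running.
group_members = ["lvm1", "fs1"]
currently_running = {"lvm1", "fs1"}

blockers = running_members(group_members, currently_running)
if blockers:
    print("refusing to clone grp_fs, still running: " + ", ".join(blockers))
else:
    print("ok to clone grp_fs")
```

The shell would refuse the clone command while the list is non-empty
and ask the user to stop the group first.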

> Thanks,
>
> Dejan
>
>> --
>> Uwe Grawert
>> Linux / Unix Consultant & Trainer
>> Tel.: +49 151 12051100
>> Mail: grawert at b1-systems.de
>>
>> B1 Systems GmbH
>> Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
>> GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
>>
>>
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



