[Pacemaker] Orphan problem when creating a clone of a group
    Dejan Muhamedagic 
    dejanmm at fastmail.fm
       
    Mon Nov 29 16:37:44 UTC 2010
    
    
  
Hi,

On Mon, Nov 29, 2010 at 02:42:42PM +0100, Uwe Grawert wrote:
> Was: Re: [Pacemaker] crm resource restart doesn't restart the correct resource
>
> Quoting Dejan Muhamedagic <dejanmm at fastmail.fm>:
>
>>> This is happening because, when the clone is created,
>>> pacemaker stops the primitive but does not wait for the stop action
>>> to return, and just starts the primitive over. And that of course
>>> causes problems.
>>
>> Hmm, I don't quite understand what is going on. Is that primitive
>> part of the group? Can you describe in more detail what is going
>> on?
>
> I have a group (grp_fs) consisting of an LVM and several Filesystem
> resources, in that order. That group is started and all resources are
> running. Now I clone this group by issuing:
>
> crm configure clone clo_fs grp_fs
>
> That stops all resources and starts them again as a clone. But
> Pacemaker does not seem to wait until the stop action has finished. I
> have modified the LVM RA to log the action command issued to the agent
> and the value returned by the agent:
>
> 14:24:11 [ 14495 ] Action: start
> 14:24:11 [ 14494 ] Action: stop
> 14:24:13 [ 14494 ] RC: 1
> 14:24:14 [ 14495 ] RC: 0
> 14:24:14 [ 14599 ] Action: monitor
> 14:24:14 [ 14599 ] RC: 0
>
> The PID is shown in brackets. As can be seen, Pacemaker first issues a
> start command and then immediately a stop, without waiting for
> the first command to return. That produces an orphan resource, which
> means that the state of the LVM resource (which is now cloned) is
> uncertain: it may start, but it may also fail.
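
The overlap in the log above can be checked mechanically. What follows is only a minimal sketch in Python, assuming the log format is exactly the one quoted (time, PID in brackets, then "Action: ..." or "RC: ..."). The function name overlapping_actions and the LOG constant are made up for illustration and are not part of Pacemaker or the LVM RA.

```python
import re

# The log lines quoted above, copied verbatim.
LOG = """\
14:24:11 [ 14495 ] Action: start
14:24:11 [ 14494 ] Action: stop
14:24:13 [ 14494 ] RC: 1
14:24:14 [ 14495 ] RC: 0
14:24:14 [ 14599 ] Action: monitor
14:24:14 [ 14599 ] RC: 0
"""

# time, PID, record kind (Action or RC), value
LINE = re.compile(r"(\d\d:\d\d:\d\d) \[ (\d+) \] (Action|RC): (\S+)")


def overlapping_actions(log):
    """Return (running_action, new_action) pairs for actions that were
    issued while an earlier action had not yet logged its return code."""
    running = {}   # pid -> action name, for actions still in flight
    overlaps = []
    for line in log.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # ignore anything that is not in the assumed format
        _time, pid, kind, value = m.groups()
        if kind == "Action":
            for other in running.values():
                overlaps.append((other, value))
            running[pid] = value
        else:
            # An RC line ends the action started by the same PID.
            running.pop(pid, None)
    return overlaps


print(overlapping_actions(LOG))
```

On the quoted log this reports one overlapping pair, the start and the stop; the monitor runs only after both have returned.
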

I see. The problem here is that as far as the cluster's
concerned, the new resources and the old resources are
unrelated: they have different names (before it was, say, lvm1 and
now it's lvm1:0). I'm not sure if the crmd/pengine can tell if
the resources of the group which are running actually belong to
the cloned group as well. Andrew? If not, then we'll have to
forbid creating a clone of running resources in the shell.

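To make the naming mismatch concrete: a tool that wanted to relate lvm1 to lvm1:0 would have to strip the instance suffix before comparing. The Python below is just a sketch of that string comparison, assuming clone instances are always named <id>:<number>. The helper names are hypothetical, and this is not a claim about what the crmd/pengine actually does.

```python
def base_resource_id(instance_id):
    """Strip a trailing ':<number>' clone instance suffix, if present.

    Assumption: clone instances are named '<id>:<number>', as in the
    lvm1 / lvm1:0 example above.
    """
    base, sep, suffix = instance_id.rpartition(":")
    if sep and suffix.isdigit():
        return base
    return instance_id


def possibly_same_resource(old_id, new_id):
    """True if both ids refer to the same primitive once any clone
    instance suffix is ignored. This is a name comparison only."""
    return base_resource_id(old_id) == base_resource_id(new_id)


print(possibly_same_resource("lvm1", "lvm1:0"))
```

A literal comparison of "lvm1" and "lvm1:0" fails, which is why the two look unrelated; with the suffix stripped they match.
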
Thanks,
Dejan
> -- 
> Uwe Grawert
> Linux / Unix Consultant & Trainer
> Tel.: +49 151 12051100
> Mail: grawert at b1-systems.de
>
> B1 Systems GmbH
> Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
> GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537