[Pacemaker] Should monitor operations be stopped after a resource is unmanaged?

Tim Serong tserong at novell.com
Mon Apr 4 02:27:13 EDT 2011


On 4/4/2011 at 04:29 AM, Ron Kerry <rkerry at sgi.com> wrote: 
> On 7/22/64 2:59 PM, Tim Serong wrote: 
> > On 4/2/2011 at 09:42 PM, Ron Kerry <rkerry at sgi.com> wrote: 
> >  > On 7/22/64 2:59 PM, Serge Dubrouski wrote: 
> >  > > On Fri, Apr 1, 2011 at 2:09 PM, Ron Kerry <rkerry at sgi.com> wrote: 
> >  > > > On 7/22/64 2:59 PM, Pavel Levshin wrote: 
> >  > > >> 
> >  > > >> 01.04.2011 18:36, Ron Kerry: 
> >  > > >> > Folks - 
> >  > > >> > 
> >  > > >> > Consider a running cluster with all resources managed. We want to stop 
> >  > > >> > and quickly restart a particular resource without impacting other 
> >  > > >> > resources. The software stack running on the system can deal with this 
> >  > > >> > sort of temporary outage. We perform the following actions: 
> >  > > >> > * unmanage the resource 
> >  > > >> > * stop the resource 
> >  > > >> > * start the resource 
> >  > > >> > * manage the resource 
> >  > > >> > 
> >  > > >> > The above procedure is sometimes successful. However, we will also 
> >  > > >> > sometimes get a resource monitor failure after stopping the resource. 
> >  > > >> > It is clear that the monitor operation was not stopped (at least not 
> >  > > >> > immediately) by unmanaging the resource. 
> >  > > >> 
> >  > > >> An unmanaged resource cannot be started or stopped, but it can 
> >  > > >> still be monitored. 
> >  > > > 
> >  > > > So unmanaged really means the resource is still being managed to some 
> >  > > > degree? 
> >  > > 
> >  > > It means that Pacemaker still wants to know its state. What kind of 
> >  > > problem does it create? 
> >  > > 
> >  > 
> >  > An unmanaged resource whose monitor is still running will cause a
> >  > monitor failure when the resource is stopped. Pacemaker then takes the
> >  > 'on-fail' action defined for the monitor operation. In other words, the
> >  > resource is still being managed to some degree. If the monitor operation
> >  > was still running but no action was taken as a result of the monitor
> >  > operation outcome, there would be no issue.
> > 
> > Try "crm configure property maintenance-mode=true". Admittedly this 
> > affects the entire cluster, but it will ensure no starts, stops or 
> > monitors... 
> > 
> > Regards, 
> > 
> > Tim 
>  
> Tim - 
>  
> Thanks, this does work but is rather like using a sledge hammer to do the
> work of a ball peen hammer. It unmanages ALL resources and stops all the
> monitor operations.

Very true.
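
For the record, the cluster-wide workaround can be wrapped in a small
script so that maintenance mode is always switched off again afterwards.
This is only a rough sketch: the run and restart_in_maintenance helpers
and the init script path in the usage line are made up for illustration,
and only the "crm configure property maintenance-mode=..." command comes
from this thread. It defaults to a dry run that just prints what it would
do.

```shell
#!/bin/sh
# Sketch: restart one service by hand while the whole cluster is in
# maintenance mode. Helper names are hypothetical, not part of Pacemaker.
# With DRY_RUN=1 (the default here) commands are printed, not executed.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: restart_in_maintenance <command that restarts the service> [args...]
restart_in_maintenance() {
    if [ "$#" -eq 0 ]; then
        echo "usage: restart_in_maintenance <restart command> [args...]" >&2
        return 2
    fi

    # Cluster-wide: no starts, stops or monitors while this is true.
    run crm configure property maintenance-mode=true || return 1

    # Restart the service outside of Pacemaker's control.
    run "$@"
    rc=$?

    # Hand control back to the cluster even if the restart failed.
    run crm configure property maintenance-mode=false || return 1

    return "$rc"
}
```

With DRY_RUN=0 exported, something like
"restart_in_maintenance /etc/init.d/example restart" would run the
commands for real. Nothing in the cluster is monitored while the property
is true, so the window should be kept short.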

> How do we go about requesting a change to pacemaker to achieve the desired  
> behavior?

File an enhancement request at:

  http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

> As I see it there are two options: 
>  
>    1. fix 'crm resource unmanage <rsc>' to also stop the individual
>       resource monitor
> 
> -or-
> 
>    2. create a 'crm resource maintenance <rsc>' to unmanage and stop the
>       individual resource monitor

I'd be going for option 2.

Regards,

Tim


-- 
Tim Serong <tserong at novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.






