[Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels

Giuseppe Ragusa giuseppe.ragusa at hotmail.com
Wed Jul 9 12:43:44 UTC 2014


On Tue, Jul 8, 2014, at 06:06, Andrew Beekhof wrote:
> 
> On 5 Jul 2014, at 1:00 am, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:
> 
> > From: andrew at beekhof.net
> > Date: Fri, 4 Jul 2014 22:50:28 +1000
> > To: pacemaker at oss.clusterlabs.org
> > Subject: Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels
> > 
> >  
> > On 4 Jul 2014, at 1:29 pm, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:
> >  
> > > >> Hi all,
> > > >> while creating a cloned stonith resource
> > > > 
> > > > Any particular reason you feel the need to clone it?
> > >  
> > > In the end, I suppose it's only a "purist mindset" :) because it is a PDU whose power outlets control both nodes, so
> > > its resource "should be" active (and monitored) on both nodes "independently".
> > > I understand that it would work anyway if left uncloned and without location constraints,
> > > just as regular, "dedicated" stonith devices do not need to be location-constrained, right?
> > > 
> > > >> for multi-level STONITH on a fully-up-to-date CentOS 6.5 (pacemaker-1.1.10-14.el6_5.3.x86_64):
> > > >> 
> > > >> pcs cluster cib stonith_cfg
> > > >> pcs -f stonith_cfg stonith create pdu1 fence_apc action="off" \
> > > >>     ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
> > > >>     pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7" \
> > > >>     pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan" \
> > > >>     op monitor interval="240s"
> > > >> pcs -f stonith_cfg resource clone pdu1 pdu1Clone
> > > >> pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1Clone
> > > >> pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan pdu1Clone
> > > >> 
> > > >> 
> > > >> the last 2 lines do not succeed unless I add the option "--force" and even so I still get errors when issuing verify:
> > > >> 
> > > >> [root at cluster1 ~]# pcs stonith level verify
> > > >> Error: pdu1Clone is not a stonith id
> > > > 
> > > > If you check, I think you'll find there is no such resource as 'pdu1Clone'.
> > > > I don't believe pcs lets you decide what the clone name is.
> > > 
> > > You're right! (obviously ;> )
> > > It's been automatically named pdu1-clone
> > > 
> > > I suppose that there's still too much crmsh in my memory :)
> > > 
> > > Anyway, removing the stonith level (to start from scratch) and using the correct clone name does not change the result:
> > > 
> > > [root at cluster1 etc]# pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1-clone
> > > Error: pdu1-clone is not a stonith id (use --force to override)
> >  
> > I bet we didn't think of that.
> > What if you just do:
> >  
> >    pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1
> >  
> > Does that work?
> >  
> > ------------------------------------------------------------------------
> > 
> > Yes, no errors at all and verify successful.

At first I took this as a simple sanity check, but on second reading I think you were suggesting that I can clone as usual and then reference the primitive resource in the level definition (something I normally avoid when working with regular clones), and that the clone will automatically be used "at runtime" instead, correct?
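If that reading is right, the whole working sequence discussed in this thread boils down to the following (a sketch, reusing the node and device names from above; pcs chooses the clone name itself, and the topology levels reference the primitive id, not the clone id):

```shell
# Sketch of the working configuration (names taken from this thread;
# adjust for a real cluster). Run against a shadow CIB first.
pcs cluster cib stonith_cfg

# Create the shared PDU fencing device once, as a primitive.
pcs -f stonith_cfg stonith create pdu1 fence_apc action="off" \
    ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
    pcmk_host_check="static-list" \
    pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan" \
    op monitor interval="240s"

# Clone it if desired; pcs names the clone automatically (pdu1-clone).
pcs -f stonith_cfg resource clone pdu1

# Reference the primitive id in the fencing levels; no --force needed.
pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1
pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan pdu1
```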

> > Remember that a full real test (to verify actual second level functionality in presence of first level failure)
> > is still pending for both the plain and cloned setup.
> > 
> > Apropos: I read through the list archives that stonith resources (being resources, after all)
> > could themselves cause fencing (!) if failing (start, monitor, stop)
> 
> stop just unsets a flag in stonithd.
> start does perform a monitor op though, which could fail.
> 
> but by default only stop failure would result in fencing.

I thought that start-failure-is-fatal defaulted to true, but maybe that does not apply to stonith resources.
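For reference, the effective default can be inspected directly (a sketch; `pcs property list --all` is the standard way to show all cluster options including unset defaults, the grep just filters the one of interest):

```shell
# Show the cluster-wide start-failure-is-fatal setting, including its
# default value if it has never been set explicitly.
pcs property list --all | grep start-failure-is-fatal
```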

> > and that an ad-hoc
> > on-fail setting could be used to prevent that.
> > Maybe my aforementioned naive testing procedure (pull the iLO cable) could provoke that?
> 
> _shouldn't_ do so
> 
> > Would you suggest to configure such an on-fail option?
> 
> again, shouldn't be necessary

Thanks again.

Regards,
Giuseppe

> > Many thanks again for your help (and all your valuable work, of course!).
> > 
> > Regards,
> > Giuseppe
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 
-- 
  Giuseppe Ragusa
  giuseppe.ragusa at fastmail.fm
