[Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels

Andrew Beekhof andrew at beekhof.net
Thu Jul 10 19:40:39 EDT 2014


On 10 Jul 2014, at 10:59 am, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:

> On Thu, Jul 10, 2014, at 00:00, Andrew Beekhof wrote:
>> 
>> On 9 Jul 2014, at 10:43 pm, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:
>> 
>>> On Tue, Jul 8, 2014, at 06:06, Andrew Beekhof wrote:
>>>> 
>>>> On 5 Jul 2014, at 1:00 am, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:
>>>> 
>>>>> From: andrew at beekhof.net
>>>>> Date: Fri, 4 Jul 2014 22:50:28 +1000
>>>>> To: pacemaker at oss.clusterlabs.org
>>>>> Subject: Re: [Pacemaker] Pacemaker 1.1: cloned stonith resources require --force to be added to levels
>>>>> 
>>>>> 
>>>>> On 4 Jul 2014, at 1:29 pm, Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> wrote:
>>>>> 
>>>>>>>> Hi all,
>>>>>>>> while creating a cloned stonith resource
>>>>>>> 
>>>>>>> Any particular reason you feel the need to clone it?
>>>>>> 
>>>>>> In the end, I suppose it's only a "purist mindset" :) because it is a PDU whose power outlets control both nodes, so
>>>>>> its resource "should be" active (and monitored) on both nodes "independently".
>>>>>> I understand that it would work anyway if left uncloned and without location constraints,
>>>>>> just as regular, "dedicated" stonith devices do not need to be location-constrained, right?
>>>>>> 
>>>>>>>> for multi-level STONITH on a fully-up-to-date CentOS 6.5 (pacemaker-1.1.10-14.el6_5.3.x86_64):
>>>>>>>> 
>>>>>>>> pcs cluster cib stonith_cfg
>>>>>>>> pcs -f stonith_cfg stonith create pdu1 fence_apc action="off" \
>>>>>>>>   ipaddr="pdu1.verolengo.privatelan" login="cluster" passwd="test" \
>>>>>>>>   pcmk_host_map="cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7" \
>>>>>>>>   pcmk_host_check="static-list" pcmk_host_list="cluster1.verolengo.privatelan,cluster2.verolengo.privatelan" \
>>>>>>>>   op monitor interval="240s"
>>>>>>>> pcs -f stonith_cfg resource clone pdu1 pdu1Clone
>>>>>>>> pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1Clone
>>>>>>>> pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan pdu1Clone
>>>>>>>> 
>>>>>>>> 
>>>>>>>> the last two lines do not succeed unless I add the "--force" option, and even then I still get errors when issuing verify:
>>>>>>>> 
>>>>>>>> [root at cluster1 ~]# pcs stonith level verify
>>>>>>>> Error: pdu1Clone is not a stonith id
>>>>>>> 
>>>>>>> If you check, I think you'll find there is no such resource as 'pdu1Clone'.
>>>>>>> I don't believe pcs lets you decide what the clone name is.
>>>>>> 
>>>>>> You're right! (obviously ;> )
>>>>>> It's been automatically named pdu1-clone
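>>>>>> For anyone else looking: something like "pcs -f stonith_cfg resource show"
>>>>>> (if I'm not mistaken about the subcommand) lists the configured resources,
>>>>>> including the auto-generated pdu1-clone id.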
>>>>>> 
>>>>>> I suppose that there's still too much crmsh in my memory :)
>>>>>> 
>>>>>> Anyway, removing the stonith level (to start from scratch) and using the correct clone name does not change the result:
>>>>>> 
>>>>>> [root at cluster1 etc]# pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1-clone
>>>>>> Error: pdu1-clone is not a stonith id (use --force to override)
>>>>> 
>>>>> I bet we didn't think of that.
>>>>> What if you just do:
>>>>> 
>>>>>  pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1
>>>>> 
>>>>> Does that work?
>>>>> 
>>>>> ------------------------------------------------------------------------
>>>>> 
>>>>> Yes, no errors at all and verify successful.
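>>>>> For the record, the complete second-level setup that now verifies cleanly,
>>>>> using the primitive id:
>>>>> 
>>>>>  pcs -f stonith_cfg stonith level add 2 cluster1.verolengo.privatelan pdu1
>>>>>  pcs -f stonith_cfg stonith level add 2 cluster2.verolengo.privatelan pdu1
>>>>>  pcs -f stonith_cfg stonith level verify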
>>> 
>>> At first this passed by as a simple sanity check, but on second read I think
>>> you were suggesting that I can clone as usual and then configure the level with
>>> the primitive resource (which I usually avoid when working with regular clones),
>>> and the clone will automatically be used "at runtime" instead, correct?
>> 
>> right. but also consider not cloning it at all :)
> 
> I understand that in your opinion cloned stonith resources add almost no value.
> So I suppose that, should a PDU-type resource happen to be running on the very
> node it must now fence, it would be migrated first or something like that (since
> I understand that stonith resources cannot fence the node they are running on),
> right?

Nope. There is no requirement that the fencing resource first be running a) anywhere or b) on a different node in order for a node to be fenced.
Wherever possible we will try to avoid having a node fence itself, but this is unrelated to where the fencing resource is running.

> If that is so and there's no adverse effect whatsoever (not even a significant
> delay), I will promptly remove the clone and configure my second levels using
> the primitive PDU stonith resource. But if, after all, you think there could be
> some "legitimate" use for such clones, I could open an RFE in Bugzilla for them
> to be recognized as stonith ids and accepted in levels (if you suggest so).
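> Something like this should do it, I suppose (untested, and assuming
> "pcs resource unclone" works on stonith resources the same way as on regular ones):
> 
>  pcs -f stonith_cfg resource unclone pdu1
>  pcs -f stonith_cfg stonith level verify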
> 
> Anyway, many thanks for your advice and insight, obviously :)
> 
>>>>> Remember that a full real test (to verify actual second-level functionality
>>>>> in the presence of a first-level failure) is still pending for both the plain and the cloned setup.
>>>>> 
>>>>> Apropos: I read in the list archives that stonith resources (being resources, after all)
>>>>> could themselves cause fencing (!) if they fail (start, monitor, stop)
>>>> 
>>>> stop just unsets a flag in stonithd.
>>>> start does perform a monitor op though, which could fail.
>>>> 
>>>> but by default only stop failure would result in fencing.
>>> 
>>> I thought that start-failure-is-fatal was true by default, but maybe not for stonith resources.
>> 
>> fatal in the sense of "won't attempt to run it there again", not in the "fence the whole node" sense
> 
> Ah right, I remember now all the suggestions I found about migration-threshold, failure-timeout and the cluster-recheck-interval... sorry for the confusion and thank you for pointing it out!
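> For my own notes, the knobs in question were along these lines (values purely
> illustrative, not a recommendation):
> 
>  pcs resource defaults migration-threshold=3 failure-timeout=60s
>  pcs property set cluster-recheck-interval=2min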
> 
> Regards,
> Giuseppe
> 
>>>>> and that an ad-hoc
>>>>> on-fail setting could be used to prevent that.
>>>>> Maybe my aforementioned naive testing procedure (pulling the iLO cable) could provoke that?
>>>> 
>>>> _shouldn't_ do so
>>>> 
>>>>> Would you suggest to configure such an on-fail option?
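>>>>> I was thinking of something like (syntax untested):
>>>>> 
>>>>>  pcs stonith update pdu1 op monitor interval="240s" on-fail="restart"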
>>>> 
>>>> again, shouldn't be necessary
>>> 
>>> Thanks again.
>>> 
>>> Regards,
>>> Giuseppe
>>> 
>>>>> Many thanks again for your help (and all your valuable work, of course!).
>>>>> 
>>>>> Regards,
>>>>> Giuseppe
>> 
> -- 
>  Giuseppe Ragusa
>  giuseppe.ragusa at fastmail.fm
> 
> 


