[Pacemaker] How to set up STONITH in a 2-node active/passive Linux HA Pacemaker cluster?

Andreas Kurz andreas at hastexo.com
Wed Mar 21 10:57:18 EDT 2012


On 03/21/2012 02:53 PM, Dejan Muhamedagic wrote:
> On Tue, Mar 20, 2012 at 06:22:34PM +0100, Andreas Kurz wrote:
>> On 03/20/2012 04:14 PM, Mathias Nestler wrote:
>>> Hi Dejan,
>>>
>>> On 20.03.2012, at 15:25, Dejan Muhamedagic wrote:
>>>
>>>> Hi,
>>>>
>>>> On Tue, Mar 20, 2012 at 08:52:39AM +0100, Mathias Nestler wrote:
>>>>> On 19.03.2012, at 20:26, Florian Haas wrote:
>>>>>
>>>>>> On Mon, Mar 19, 2012 at 8:14 PM, Mathias Nestler
>>>>>> <mathias.nestler at barzahlen.de <mailto:mathias.nestler at barzahlen.de>>
>>>>>> wrote:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I am trying to set up an active/passive (two-node) Linux-HA cluster
>>>>>>> with corosync and pacemaker to keep a PostgreSQL database up and
>>>>>>> running. It works via DRBD and a service IP. If node1 fails, node2
>>>>>>> should take over, and likewise if PG runs on node2 and that node
>>>>>>> fails. Everything works fine except the STONITH part.
>>>>>>>
>>>>>>> Between the nodes is a dedicated HA connection (10.10.10.X), so I
>>>>>>> have the following interface configuration:
>>>>>>>
>>>>>>> eth0            eth1           host
>>>>>>> 10.10.10.251    172.10.10.1    node1
>>>>>>> 10.10.10.252    172.10.10.2    node2
>>>>>>>
>>>>>>> STONITH is enabled, and I am testing with the SSH agent to kill nodes.
>>>>>>>
>>>>>>> crm configure property stonith-enabled=true
>>>>>>> crm configure property stonith-action=poweroff
>>>>>>> crm configure rsc_defaults resource-stickiness=100
>>>>>>> crm configure property no-quorum-policy=ignore
>>>>>>>
>>>>>>> crm configure primitive stonith_postgres stonith:external/ssh \
>>>>>>>              params hostlist="node1 node2"
>>>>>>> crm configure clone fencing_postgres stonith_postgres
>>>>>>
>>>>>> You're missing location constraints, and doing this with 2 primitives
>>>>>> rather than 1 clone is usually cleaner. The example below is for
>>>>>> external/libvirt rather than external/ssh, but you ought to be able to
>>>>>> apply the concept anyhow:
>>>>>>
>>>>>> http://www.hastexo.com/resources/hints-and-kinks/fencing-virtual-cluster-nodes
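>>>>>>
>>>>>> As a rough sketch, the two-primitive approach with external/libvirt
>>>>>> could look like this (the hypervisor URI and resource names here are
>>>>>> only placeholders, adjust them for your environment):
>>>>>>
>>>>>> crm configure primitive fence-node1 stonith:external/libvirt \
>>>>>>         params hostlist="node1" \
>>>>>>                hypervisor_uri="qemu+ssh://hypervisor/system"
>>>>>> crm configure primitive fence-node2 stonith:external/libvirt \
>>>>>>         params hostlist="node2" \
>>>>>>                hypervisor_uri="qemu+ssh://hypervisor/system"
>>>>>> crm configure location l-fence-node1 fence-node1 -inf: node1
>>>>>> crm configure location l-fence-node2 fence-node2 -inf: node2
>>>>>>
>>>>>> The -inf location constraints ensure each node never runs the stonith
>>>>>> resource that would fence itself.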
>>>>>>
>>>>>
>>>>> As I understand it, the cluster decides which node has to be fenced.
>>>>> Besides this, I already tried the following configuration:
>>>>>
>>>>> crm configure primitive stonith1_postgres stonith:ssh \
>>>>>         params hostlist="node1" \
>>>>>         op monitor interval="25" timeout="10"
>>>>> crm configure primitive stonith2_postgres stonith:ssh \
>>>>>         params hostlist="node2" \
>>>>>         op monitor interval="25" timeout="10"
>>>>> crm configure location stonith1_not_on_node1 stonith1_postgres \
>>>>>         -inf: node1
>>>>> crm configure location stonith2_not_on_node2 stonith2_postgres \
>>>>>         -inf: node2
>>>>>
>>>>> The result is the same :/
>>>>
>>>> Neither ssh nor external/ssh is a supported fencing option. Both
>>>> include a sleep before reboot, which makes the window in which
>>>> both nodes can fence each other larger than is usually the case
>>>> with production-quality STONITH plugins.
>>>
>>> I use this SSH STONITH only for testing. At the moment I am building
>>> the cluster in a virtual environment. Besides this, what is the
>>> difference between ssh and external/ssh?
>>
>> the first one is a binary implementation, the second one is a simple
>> shell script ... that's it ;-)
>>
>>> My problem is that each node tries to kill the other. But I only want
>>> to kill the node running the postgres resource if the connection
>>> between the nodes breaks.
>>
>> That is the expected behavior when you introduce a split brain in a
>> two-node cluster. Each node forms its own cluster partition and tries
>> to STONITH the other, "dead" node.
>>
>> If you are using a virtualization environment managed by libvirt, you
>> can follow the link Florian posted. If you are running in a VMware or
>> VirtualBox testing environment, using sbd for fencing might be a good
>> option ... as shared storage can be provided easily.
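>>
>> A rough sketch of such an sbd setup (the device path is just an
>> example, use whatever shared disk both nodes can see):
>>
>> # initialize the sbd header on the shared disk, once, from one node
>> sbd -d /dev/sdb1 create
>> # then configure a single sbd stonith resource in the cluster
>> crm configure primitive fencing_sbd stonith:external/sbd \
>>         params sbd_device="/dev/sdb1"
>>
>> The sbd daemon also has to be running on both nodes, watching the
>> same device, before the stonith resource is of any use.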
>>
>> You could then also add a weak colocation of the single sbd stonith
>> agent instance with your postgres instance; in combination with the
>> correct start timeout, that gets you the behavior you want.
> 
> /me wonders why the node running postgres is a better candidate to
> be fenced. Colocating a stonith resource with some other resource
> doesn't make much sense.

It is not needed, no ... the idea was only to give the node running
postgres a good chance to be the first to fence the other node via sbd
in case of a split brain ... I can't think of another (useful?) case
where that would make sense at the moment.
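
Such a weak colocation could look roughly like this (the resource
names are examples; "pg_group" stands in for whatever group or
primitive actually holds the postgres resources):

crm configure colocation fence_with_pg 100: fencing_sbd pg_group

A finite score like 100 only prefers, rather than forces, running the
stonith resource next to postgres, so fencing stays available even if
postgres is stopped.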

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Thanks,
> 
> Dejan
> 
>> Regards,
>> Andreas
>>
>> -- 
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>>
>>>>
>>>> As for the configuration, I'd rather use the first one, just not
>>>> cloned. That also helps prevent mutual fencing.
>>>>
>>>
>>> I cloned it because I also want the STONITH feature if postgres lives
>>> on the other node. How can I achieve that?
>>>
>>>> See also:
>>>>
>>>> http://www.clusterlabs.org/doc/crm_fencing.html
>>>> http://ourobengr.com/ha
>>>>
>>>
>>> Thank you very much
>>>
>>> Best
>>> Mathias
>>>
>>>> Thanks,
>>>>
>>>> Dejan
>>>>
>>>>>> Hope this helps.
>>>>>> Cheers,
>>>>>> Florian
>>>>>>
>>>>>
>>>>> Best
>>>>> Mathias
>>>>>
>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>>>> <mailto:Pacemaker at oss.clusterlabs.org>
>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
>>>>
>>>>
>>>
>>>
>>>
>>
>>
> 
> 
> 
> 
> 

