[Pacemaker] How to set up STONITH in a 2-node active/passive Linux HA Pacemaker cluster?

Dejan Muhamedagic dejanmm at fastmail.fm
Wed Mar 21 09:53:39 EDT 2012


On Tue, Mar 20, 2012 at 06:22:34PM +0100, Andreas Kurz wrote:
> On 03/20/2012 04:14 PM, Mathias Nestler wrote:
> > Hi Dejan,
> > 
> > On 20.03.2012, at 15:25, Dejan Muhamedagic wrote:
> > 
> >> Hi,
> >>
> >> On Tue, Mar 20, 2012 at 08:52:39AM +0100, Mathias Nestler wrote:
> >>> On 19.03.2012, at 20:26, Florian Haas wrote:
> >>>
> >>>> On Mon, Mar 19, 2012 at 8:14 PM, Mathias Nestler
> >>>> <mathias.nestler at barzahlen.de> wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> I am trying to set up an active/passive (2-node) Linux-HA cluster
> >>>>> with Corosync and Pacemaker to keep a PostgreSQL database up and
> >>>>> running. It works via DRBD and a service IP. If node1 fails, node2
> >>>>> should take over; the same if PG runs on node2 and it fails.
> >>>>> Everything works fine except the STONITH part.
> >>>>>
> >>>>> Between the nodes there is a dedicated HA connection (10.10.10.x),
> >>>>> so I have the following interface configuration:
> >>>>>
> >>>>> eth0            eth1           host
> >>>>> 10.10.10.251    172.10.10.1    node1
> >>>>> 10.10.10.252    172.10.10.2    node2
> >>>>>
> >>>>> STONITH is enabled and I am testing with an ssh agent to kill nodes.
> >>>>>
> >>>>> crm configure property stonith-enabled=true
> >>>>> crm configure property stonith-action=poweroff
> >>>>> crm configure rsc_defaults resource-stickiness=100
> >>>>> crm configure property no-quorum-policy=ignore
> >>>>>
> >>>>> crm configure primitive stonith_postgres stonith:external/ssh \
> >>>>>              params hostlist="node1 node2"
> >>>>> crm configure clone fencing_postgres stonith_postgres
> >>>>
> >>>> You're missing location constraints, and doing this with 2 primitives
> >>>> rather than 1 clone is usually cleaner. The example below is for
> >>>> external/libvirt rather than external/ssh, but you ought to be able to
> >>>> apply the concept anyhow:
> >>>>
> >>>> http://www.hastexo.com/resources/hints-and-kinks/fencing-virtual-cluster-nodes
> >>>>
> >>>
> >>> As I understand it, the cluster decides which node has to be
> >>> STONITH'd. Besides this, I have already tried the following
> >>> configuration:
> >>>
> >>> crm configure primitive stonith1_postgres stonith:ssh \
> >>>     params hostlist="node1" \
> >>>     op monitor interval="25" timeout="10"
> >>> crm configure primitive stonith2_postgres stonith:ssh \
> >>>     params hostlist="node2" \
> >>>     op monitor interval="25" timeout="10"
> >>> crm configure location stonith1_not_on_node1 stonith1_postgres \
> >>>     -inf: node1
> >>> crm configure location stonith2_not_on_node2 stonith2_postgres \
> >>>     -inf: node2
> >>>
> >>> The result is the same :/
> >>
> >> Neither ssh nor external/ssh is a supported fencing option. Both
> >> include a sleep before reboot, which makes the window in which
> >> both nodes can fence each other larger than is usually the case
> >> with production-quality stonith plugins.
> > 
> > I use this ssh STONITH only for testing; at the moment I am creating the
> > cluster in a virtual environment. Besides this, what is the difference
> > between ssh and external/ssh?
> 
> The first one is a binary implementation; the second one is a simple
> shell script ... that's it ;-)
> 
> > My problem is that each node tries to kill the other. But I only want
> > to kill the node running the postgres resource if the connection between
> > the nodes breaks.
> 
> That is the expected behavior if you introduce a split brain in a
> two-node cluster. Each node forms its own cluster partition and tries to
> STONITH the other, "dead" node.
> 
> If you are using a virtualization environment managed by libvirt, you can
> follow the link Florian posted. If you are running in a VMware or
> VirtualBox testing environment, using sbd for fencing might be a good
> option, as shared storage can be provided easily.
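For illustration, a minimal sbd-based setup along these lines might look as follows; the shared-disk device path is a placeholder (not from this thread) and the resource name stonith_sbd is assumed:

```shell
# Initialize sbd metadata on a small shared disk visible to both nodes
# (the device path below is a placeholder; adjust to your environment).
sbd -d /dev/disk/by-id/shared-disk create

# With the sbd daemon configured to watch that device on both nodes,
# a single sbd stonith resource is enough for the whole cluster:
crm configure primitive stonith_sbd stonith:external/sbd \
    params sbd_device="/dev/disk/by-id/shared-disk"
```

Because sbd fences through the shared disk, one such resource can protect both nodes; no clone or per-node primitive is needed.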
> 
> You could then also add a weak colocation of the single sbd stonith
> agent instance with your postgres instance; in combination with the
> correct start timeout you can get the behavior you want.
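Sketched in crm syntax, such a weak colocation might look like this; the resource names stonith_sbd and postgres are assumed for illustration, and 100 is an arbitrary finite score:

```shell
# A finite score (100) expresses a preference, not a requirement: the
# cluster tries to run the stonith resource next to postgres, but may
# still place it elsewhere if needed (unlike an inf: colocation).
crm configure colocation fencing_with_postgres 100: stonith_sbd postgres
```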

/me wonders why the node running postgres would be a better
candidate to be fenced. Colocating a stonith resource with any
other resource doesn't make much sense.

Thanks,

Dejan

> Regards,
> Andreas
> 
> -- 
> Need help with Pacemaker?
> http://www.hastexo.com/now
> 
> > 
> >>
> >> As for the configuration, I'd rather use the first one, just not
> >> cloned. That also helps prevent mutual fencing.
> >>
> > 
> > I cloned it because I also want the STONITH feature when postgres lives on
> > the other node. How can I achieve that?
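One way to read Dejan's suggestion: keep the single external/ssh primitive from the first configuration, with both nodes in its hostlist, but drop the clone statement; the cluster can then start that one resource on whichever node survives:

```shell
# A single, un-cloned stonith resource covering both nodes
# (for testing only; external/ssh is not a supported production option).
crm configure primitive stonith_postgres stonith:external/ssh \
    params hostlist="node1 node2"
```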
> > 
> >> See also:
> >>
> >> http://www.clusterlabs.org/doc/crm_fencing.html
> >> http://ourobengr.com/ha
> >>
> > 
> > Thank you very much
> > 
> > Best
> > Mathias
> > 
> >> Thanks,
> >>
> >> Dejan
> >>
> >>>> Hope this helps.
> >>>> Cheers,
> >>>> Florian
> >>>>
> >>>
> >>> Best
> >>> Mathias
> >>>
> >>
> >>> _______________________________________________
> >>> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>
> >>> Project Home: http://www.clusterlabs.org
> >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>> Bugs: http://bugs.clusterlabs.org
> >>
> >>
> > 
> > 
> > 
> 
> 


