[Pacemaker] How to set up STONITH in a 2-node active/passive Linux HA Pacemaker cluster?

Mathias Nestler mathias.nestler at barzahlen.de
Mon Mar 19 15:14:28 EDT 2012


Hi everyone,

I am trying to set up an active/passive (2-node) Linux-HA cluster with corosync and pacemaker to keep a PostgreSQL database up and running. It works via DRBD and a service IP. If node1 fails, node2 should take over, and the same applies if PG runs on node2 and node2 fails. Everything works fine except for STONITH.

Between the nodes there is a dedicated HA connection (10.10.10.X), so I have the following interface configuration:

eth0            eth1            host
10.10.10.251    172.10.10.1     node1
10.10.10.252    172.10.10.2     node2

STONITH is enabled, and for testing I am using the external/ssh STONITH agent to kill nodes.

crm configure property stonith-enabled=true
crm configure property stonith-action=poweroff
crm configure rsc_defaults resource-stickiness=100
crm configure property no-quorum-policy=ignore

crm configure primitive stonith_postgres stonith:external/ssh \
               params hostlist="node1 node2"
crm configure clone fencing_postgres stonith_postgres

crm_mon -1 shows:

============
Last updated: Mon Mar 19 15:21:11 2012
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ node2 node1 ]

Full list of resources:

Master/Slave Set: ms_drbd_postgres
    Masters: [ node1 ]
    Slaves: [ node2 ]
Resource Group: postgres
    fs_postgres        (ocf::heartbeat:Filesystem):    Started node1
    virtual_ip_postgres        (ocf::heartbeat:IPaddr2):       Started node1
    postgresql (ocf::heartbeat:pgsql): Started node1
Clone Set: fencing_postgres
    Started: [ node2 node1 ]

The problem is: when I cut the connection between the eth0 interfaces, both nodes get killed. I think it is a quorum problem, because there are only 2 nodes. But I don't want to add a third node just to get a correct quorum calculation.
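
To make the quorum suspicion concrete, here is a minimal sketch of the arithmetic. It assumes the usual strict-majority rule (a partition has quorum only if it holds more than half of the expected votes). has_quorum is a hypothetical helper written for this illustration, not a Pacemaker or corosync tool. Since no-quorum-policy=ignore is set above, losing quorum alone may not explain the behaviour; this only shows why a split 2-node cluster can never have a majority on either side.

```shell
#!/bin/sh
# Illustrative only: has_quorum is a hypothetical helper, not part of Pacemaker.
# Assumption: quorum means strictly more than half of the expected votes.
has_quorum() {
    votes_present=$1
    votes_expected=$2
    if [ $((votes_present * 2)) -gt "$votes_expected" ]; then
        echo yes
    else
        echo no
    fi
}

has_quorum 2 2   # both nodes see each other      -> yes
has_quorum 1 2   # eth0 link cut, each node alone -> no
has_quorum 2 3   # for comparison: 2 of 3 nodes   -> yes
```

With 2 expected votes, each isolated node holds exactly half, so neither side has a majority after the link is cut.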

Does anyone have an idea how to solve this problem?


Best
Mathias


