[Pacemaker] Two node cluster and no hardware device for stonith.

Thu Jan 22 04:03:38 EST 2015

On 21.01.2015 11:18 Digimer wrote:
> On 21/01/15 08:13 AM, Andrea wrote:
>> > Hi All,
>> >
>> > I have a question about stonith.
>> > In my scenario, I have to create a 2-node cluster, but I don't have any
>> > hardware device for stonith: no APC, no IPMI, etc., none of the devices
>> > listed by "pcs stonith list".
>> > So, is there any way to handle this?
>> > This is my scenario:
>> > - 2-node cluster
>> > serverHA1
>> > serverHA2
>> >
>> > - Software
>> > CentOS 6.6
>> > pacemaker.x86_64  1.1.12-4.el6
>> > cman.x86_64       3.0.12.1-68.el6
>> > corosync.x86_64   1.4.7-1.el6
>> >
>> > -NO hardware device for stonith!
>> >
>> > - Cluster creation ([ALL] operation done on all nodes, [ONE] operation done
>> > on only one node)
>> > [ALL] service pcsd start
>> > [ALL] chkconfig pcsd on
>> > [ONE] pcs cluster auth serverHA1 serverHA2
>> > [ALL] echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/sysconfig/cman
>> > [ONE] pcs cluster setup --name MyCluHA serverHA1 serverHA2
>> > [ONE] pcs property set stonith-enabled=false
>> > [ONE] pcs property set no-quorum-policy=ignore
>> > [ONE] pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000
>> > host_list=192.168.56.1 --clone
>> >
>> >
>> > In my test, when I simulate a network failure, split brain occurs, and
>> > when the network comes back, one node kills the other node.
>> > -log on node 1:
>> > Jan 21 11:45:28 corosync [CMAN  ] memb: Sending KILL to node 2
>> >
>> > -log on node 2:
>> > Jan 21 11:45:28 corosync [CMAN  ] memb: got KILL for node 2
>> >
>> >
>> > Is there a way to restart pacemaker when the network comes back instead
>> > of killing it?
>> >
>> > Thanks
>> > Andrea
> You really need a fence device; there isn't a way around it. By
> definition, when a node needs to be fenced, it is in an unknown state
> and cannot be expected to behave predictably.
>
> If you're using real hardware, then you can use a switched PDU
> (network-connected power bar with individual outlet control) to do
> fencing. I use the APC AP7900 in all my clusters and it works perfectly.
> I know that some other brands work, too.
>
> If your machines are virtual machines, then you can do fencing by talking
> to the hypervisor. In this case, one node calls the host of the other
> node and asks it to terminate that node (fence_virsh and fence_xvm for
> KVM/Xen systems, fence_vmware for VMware, etc.).
>
> -- Digimer
If you want to save money and you can solder a bit, I can recommend rcd_serial.
The required device is described in cluster-glue/stonith/README.rcd_serial.
It is very simple, but it has worked reliably for us for more than four years!

Eberhard
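
For the virtual-machine case Digimer mentions above, a fence_virsh setup
might look roughly like the sketch below. It is only an illustration under
stated assumptions: the hypervisor name, login and password are hypothetical
placeholders, and it assumes both nodes run on one KVM host with libvirt
domain names equal to the node names. The parameter names (ipaddr, login,
passwd, port) are the ones used by the older fence-agents releases; check
them against "pcs stonith describe fence_virsh" on your system. The script
only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: every host name, login and password below is a
# hypothetical placeholder, not something taken from the thread.
HYPERVISOR=kvmhost.example.com   # assumed host running both VMs
VIRSH_LOGIN=root                 # assumed SSH login on the hypervisor
VIRSH_PASSWD=secret              # assumed password; an SSH key is safer

# Print the pcs commands instead of running them, so they can be reviewed.
fence_virsh_cmds() {
    for node in serverHA1 serverHA2; do
        # port= is the libvirt domain name; assumed equal to the node name.
        echo "pcs stonith create fence_$node fence_virsh" \
             "ipaddr=$HYPERVISOR login=$VIRSH_LOGIN passwd=$VIRSH_PASSWD" \
             "port=$node pcmk_host_list=$node"
    done
    # Fencing only takes effect once stonith is enabled again.
    echo "pcs property set stonith-enabled=true"
}

fence_virsh_cmds
```

After reviewing the printed commands, they can be run on one node, and
"pcs stonith fence serverHA2" issued from serverHA1 is one way to check
that the fence device really powers the other node off.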

