[ClusterLabs] Pacemaker failover failure

Wed Jul 1 14:25:34 UTC 2015

On 07/01/2015 08:57 AM, alex austin wrote:
> I have now configured stonith-enabled=true. What device should I use for
> fencing given the fact that it's a virtual machine but I don't have access
> to its configuration. Would fence_pcmk do? If so, what parameters should I
> configure for it to work properly?

No, fence_pcmk is not a fence agent for use inside pacemaker. It exists
so that RHEL 6's CMAN can redirect its fencing requests to pacemaker.

For a virtual machine, ideally you'd use fence_virtd running on the
physical host, but I'm guessing from your comment that you can't do
that. Does whoever provides your VM also provide an API for controlling
it (starting/stopping/rebooting)?

Regarding your original problem, it sounds like the surviving node
doesn't have quorum. What version of corosync are you using? If you're
using corosync 2, you need "two_node: 1" in corosync.conf, in addition
to configuring fencing in pacemaker.
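
To illustrate the "two_node: 1" setting, here is a minimal sketch of the
quorum section, assuming corosync 2.x with the votequorum provider (the
file path is the usual default and may differ on your system). It does
not apply to corosync 1.x with the pacemaker plugin:

```
# /etc/corosync/corosync.conf (corosync 2.x only; path is an assumption)
quorum {
    # votequorum is the quorum provider shipped with corosync 2
    provider: corosync_votequorum
    # let the surviving node of a two-node cluster keep quorum
    two_node: 1
}
```

Note that two_node also turns on wait_for_all by default, so both nodes
must be seen at least once at startup before the cluster becomes quorate.
"corosync -v" prints the installed version.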

> This is my new config:
> 
> 
> node dcwbpvmuas004.edc.nam.gm.com \
>         attributes standby=off
> node dcwbpvmuas005.edc.nam.gm.com \
>         attributes standby=off
> primitive ClusterIP IPaddr2 \
>         params ip=198.208.86.242 cidr_netmask=23 \
>         op monitor interval=1s timeout=20s \
>         op start interval=0 timeout=20s \
>         op stop interval=0 timeout=20s \
>         meta is-managed=true target-role=Started resource-stickiness=500
> primitive pcmk-fencing stonith:fence_pcmk \
>         params pcmk_host_list="dcwbpvmuas004.edc.nam.gm.com dcwbpvmuas005.edc.nam.gm.com" \
>         op monitor interval=10s \
>         meta target-role=Started
> primitive redis redis \
>         meta target-role=Master is-managed=true \
>         op monitor interval=1s role=Master timeout=5s on-fail=restart
> ms redis_clone redis \
>         meta notify=true is-managed=true ordered=false interleave=false globally-unique=false target-role=Master migration-threshold=1
> colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
> colocation ip-on-redis inf: ClusterIP redis_clone:Master
> colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master
> property cib-bootstrap-options: \
>         dc-version=1.1.11-97629de \
>         cluster-infrastructure="classic openais (with plugin)" \
>         expected-quorum-votes=2 \
>         stonith-enabled=true
> property redis_replication: \
>         redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com
> 
> On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander <
> alexander.nekrasov at emc.com> wrote:
> 
>> stonith-enabled=false
>>
>> this might be the issue. The way peer node death is resolved, the
>> surviving node must call STONITH on the peer. If it’s disabled it might not
>> be able to resolve the event
>>
>>
>>
>> Alex
>>
>>
>>
>> *From:* alex austin [mailto:alexixalex at gmail.com]
>> *Sent:* Wednesday, July 01, 2015 9:51 AM
>> *To:* Users at clusterlabs.org
>> *Subject:* Re: [ClusterLabs] Pacemaker failover failure
>>
>>
>>
>> So I noticed that if I kill redis on one node, it starts on the other, no
>> problem, but if I actually kill pacemaker itself on one node, the other
>> doesn't "sense" it so it doesn't fail over.
>>
>> On Wed, Jul 1, 2015 at 12:42 PM, alex austin <alexixalex at gmail.com> wrote:
>>
>> Hi all,
>>
>>
>>
>> I have configured a virtual ip and redis in master-slave with corosync
>> pacemaker. If redis fails, then the failover is successful, and redis gets
>> promoted on the other node. However if pacemaker itself fails on the active
>> node, the failover is not performed. Is there anything I missed in the
>> configuration?
>>
>>
>>
>> Here's my configuration (i have hashed the ip address out):
>>
>>
>>
>> node host1.com
>> node host2.com
>> primitive ClusterIP IPaddr2 \
>>         params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \
>>         op monitor interval=1s timeout=20s \
>>         op start interval=0 timeout=20s \
>>         op stop interval=0 timeout=20s \
>>         meta is-managed=true target-role=Started resource-stickiness=500
>> primitive redis redis \
>>         meta target-role=Master is-managed=true \
>>         op monitor interval=1s role=Master timeout=5s on-fail=restart
>> ms redis_clone redis \
>>         meta notify=true is-managed=true ordered=false interleave=false globally-unique=false target-role=Master migration-threshold=1
>> colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
>> colocation ip-on-redis inf: ClusterIP redis_clone:Master
>> property cib-bootstrap-options: \
>>         dc-version=1.1.11-97629de \
>>         cluster-infrastructure="classic openais (with plugin)" \
>>         expected-quorum-votes=2 \
>>         stonith-enabled=false
>> property redis_replication: \
>>         redis_REPL_INFO=host.com