[ClusterLabs] Pacemaker failover failure
alex austin
alexixalex at gmail.com
Wed Jul 1 14:39:00 UTC 2015
This is what crm_mon shows:
Last updated: Wed Jul 1 10:35:40 2015
Last change: Wed Jul 1 09:52:46 2015
Stack: classic openais (with plugin)
Current DC: host2 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
4 Resources configured
Online: [ host1 host2 ]
ClusterIP (ocf::heartbeat:IPaddr2): Started host2
Master/Slave Set: redis_clone [redis]
     Masters: [ host2 ]
     Slaves: [ host1 ]
pcmk-fencing (stonith:fence_pcmk): Started host2
On Wed, Jul 1, 2015 at 3:37 PM, alex austin <alexixalex at gmail.com> wrote:
> I am running version 1.4.7 of corosync
>
> On Wed, Jul 1, 2015 at 3:25 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>
>> On 07/01/2015 08:57 AM, alex austin wrote:
>> > I have now configured stonith-enabled=true. What device should I use
>> > for fencing, given that it's a virtual machine but I don't have access
>> > to its configuration? Would fence_pcmk do? If so, what parameters
>> > should I configure for it to work properly?
>>
>> No, fence_pcmk is not for use in pacemaker, but for use in RHEL6's
>> CMAN, to redirect CMAN's fencing requests to pacemaker.
>>
>> For a virtual machine, ideally you'd use fence_virtd running on the
>> physical host, but I'm guessing from your comment that you can't do
>> that. Does whoever provides your VM also provide an API for controlling
>> it (starting/stopping/rebooting)?
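>>
>> (If fence_virtd were an option, the cluster-side agent would be
>> fence_xvm; a minimal sketch, with placeholder resource name and host
>> list, might look like:
>>
>> # hypothetical resource; names are placeholders
>> primitive vm-fencing stonith:fence_xvm \
>>         params pcmk_host_list="host1 host2" \
>>         op monitor interval=60s
>>
>> assuming fence_virtd is listening on the physical host.)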
>>
>> Regarding your original problem, it sounds like the surviving node
>> doesn't have quorum. What version of corosync are you using? If you're
>> using corosync 2, you need "two_node: 1" in corosync.conf, in addition
>> to configuring fencing in pacemaker.
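>>
>> For reference, a minimal corosync 2 quorum section with that option
>> (a sketch, not taken from any corosync.conf posted here) would be:
>>
>> quorum {
>>     provider: corosync_votequorum
>>     two_node: 1    # let the surviving node keep quorum in a 2-node cluster
>> }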
>>
>> > This is my new config:
>> >
>> > node dcwbpvmuas004.edc.nam.gm.com \
>> >         attributes standby=off
>> > node dcwbpvmuas005.edc.nam.gm.com \
>> >         attributes standby=off
>> > primitive ClusterIP IPaddr2 \
>> >         params ip=198.208.86.242 cidr_netmask=23 \
>> >         op monitor interval=1s timeout=20s \
>> >         op start interval=0 timeout=20s \
>> >         op stop interval=0 timeout=20s \
>> >         meta is-managed=true target-role=Started resource-stickiness=500
>> > primitive pcmk-fencing stonith:fence_pcmk \
>> >         params pcmk_host_list="dcwbpvmuas004.edc.nam.gm.com dcwbpvmuas005.edc.nam.gm.com" \
>> >         op monitor interval=10s \
>> >         meta target-role=Started
>> > primitive redis redis \
>> >         meta target-role=Master is-managed=true \
>> >         op monitor interval=1s role=Master timeout=5s on-fail=restart
>> > ms redis_clone redis \
>> >         meta notify=true is-managed=true ordered=false interleave=false \
>> >         globally-unique=false target-role=Master migration-threshold=1
>> > colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
>> > colocation ip-on-redis inf: ClusterIP redis_clone:Master
>> > colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master
>> > property cib-bootstrap-options: \
>> >         dc-version=1.1.11-97629de \
>> >         cluster-infrastructure="classic openais (with plugin)" \
>> >         expected-quorum-votes=2 \
>> >         stonith-enabled=true
>> > property redis_replication: \
>> >         redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com
>> >
>> > On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander <alexander.nekrasov at emc.com> wrote:
>> >
>> >> stonith-enabled=false
>> >>
>> >> This might be the issue. The way peer node death is resolved, the
>> >> surviving node must call STONITH on the peer. If STONITH is disabled,
>> >> the survivor might not be able to resolve the event.
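>> >>
>> >> (With the crm shell used for the configuration quoted below, turning
>> >> it on is a one-liner sketch like:
>> >>
>> >> crm configure property stonith-enabled=true
>> >>
>> >> though a working fence device is also needed for it to act through.)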
>> >>
>> >> Alex
>> >>
>> >> From: alex austin [mailto:alexixalex at gmail.com]
>> >> Sent: Wednesday, July 01, 2015 9:51 AM
>> >> To: Users at clusterlabs.org
>> >> Subject: Re: [ClusterLabs] Pacemaker failover failure
>> >>
>> >> So I noticed that if I kill redis on one node, it starts on the other
>> >> with no problem; but if I actually kill pacemaker itself on one node,
>> >> the other doesn't "sense" it, so it doesn't fail over.
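>> >>
>> >> (Concretely, "kill pacemaker itself" here means force-killing a
>> >> pacemaker daemon on the active node, something along the lines of:
>> >>
>> >> killall -9 crmd    # illustrative; the exact daemon picked may differ
>> >> )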
>> >>
>> >> On Wed, Jul 1, 2015 at 12:42 PM, alex austin <alexixalex at gmail.com> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I have configured a virtual IP and redis in master-slave with corosync
>> >> and pacemaker. If redis fails, the failover is successful and redis
>> >> gets promoted on the other node. However, if pacemaker itself fails on
>> >> the active node, the failover is not performed. Is there anything I
>> >> missed in the configuration?
>> >>
>> >> Here's my configuration (I have hashed the IP address out):
>> >>
>> >> node host1.com
>> >> node host2.com
>> >> primitive ClusterIP IPaddr2 \
>> >>         params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \
>> >>         op monitor interval=1s timeout=20s \
>> >>         op start interval=0 timeout=20s \
>> >>         op stop interval=0 timeout=20s \
>> >>         meta is-managed=true target-role=Started resource-stickiness=500
>> >> primitive redis redis \
>> >>         meta target-role=Master is-managed=true \
>> >>         op monitor interval=1s role=Master timeout=5s on-fail=restart
>> >> ms redis_clone redis \
>> >>         meta notify=true is-managed=true ordered=false interleave=false \
>> >>         globally-unique=false target-role=Master migration-threshold=1
>> >> colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
>> >> colocation ip-on-redis inf: ClusterIP redis_clone:Master
>> >> property cib-bootstrap-options: \
>> >>         dc-version=1.1.11-97629de \
>> >>         cluster-infrastructure="classic openais (with plugin)" \
>> >>         expected-quorum-votes=2 \
>> >>         stonith-enabled=false
>> >> property redis_replication: \
>> >>         redis_REPL_INFO=host.com