[ClusterLabs] Pacemaker failover failure

alex austin alexixalex at gmail.com
Thu Jul 2 04:04:26 EDT 2015


Thank you!

However, what is proper fencing in this situation?

Kind Regards,

Alex

On Wed, Jul 1, 2015 at 11:30 PM, Ken Gaillot <kgaillot at redhat.com> wrote:

> On 07/01/2015 09:39 AM, alex austin wrote:
> > This is what crm_mon shows:
> >
> > Last updated: Wed Jul  1 10:35:40 2015
> > Last change: Wed Jul  1 09:52:46 2015
> > Stack: classic openais (with plugin)
> > Current DC: host2 - partition with quorum
> > Version: 1.1.11-97629de
> > 2 Nodes configured, 2 expected votes
> > 4 Resources configured
> >
> > Online: [ host1 host2 ]
> >
> > ClusterIP (ocf::heartbeat:IPaddr2): Started host2
> >  Master/Slave Set: redis_clone [redis]
> >      Masters: [ host2 ]
> >      Slaves: [ host1 ]
> > pcmk-fencing    (stonith:fence_pcmk):   Started host2
> >
> > On Wed, Jul 1, 2015 at 3:37 PM, alex austin <alexixalex at gmail.com> wrote:
> >
> >> I am running version 1.4.7 of corosync
>
> If you can't upgrade to corosync 2 (which has many improvements), you'll
> need to set the no-quorum-policy=ignore cluster option.
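>
> For example (a minimal sketch, assuming the crm shell is in use, as in
> the configs below):
>
>   crm configure property no-quorum-policy=ignore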
>
> Proper fencing is necessary to avoid a split-brain situation, which can
> corrupt your data.
>
> >> On Wed, Jul 1, 2015 at 3:25 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> >>
> >>> On 07/01/2015 08:57 AM, alex austin wrote:
> >>>> I have now configured stonith-enabled=true. What device should I use
> >>>> for fencing, given that it's a virtual machine but I don't have access
> >>>> to its configuration? Would fence_pcmk do? If so, what parameters
> >>>> should I configure for it to work properly?
> >>>
> >>> No, fence_pcmk is not for use in Pacemaker; it's for use in RHEL6's
> >>> CMAN, to redirect CMAN's fencing requests to Pacemaker.
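> >>>
> >>> (For context, a rough sketch of that intended use: in RHEL6's
> >>> /etc/cluster/cluster.conf, fence_pcmk is declared as the fence agent
> >>> so CMAN hands fencing off to Pacemaker; the node name here is a
> >>> placeholder:)
> >>>
> >>>   <clusternode name="node1" nodeid="1">
> >>>     <fence>
> >>>       <method name="pcmk-redirect">
> >>>         <device name="pcmk" port="node1"/>
> >>>       </method>
> >>>     </fence>
> >>>   </clusternode>
> >>>   <fencedevices>
> >>>     <fencedevice name="pcmk" agent="fence_pcmk"/>
> >>>   </fencedevices>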
> >>>
> >>> For a virtual machine, ideally you'd use fence_virtd running on the
> >>> physical host, but I'm guessing from your comment that you can't do
> >>> that. Does whoever provides your VM also provide an API for controlling
> >>> it (starting/stopping/rebooting)?
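> >>>
> >>> (If fence_virtd were available on the physical host, a minimal sketch
> >>> of the Pacemaker side might look like this; the key file path and host
> >>> names are assumptions:)
> >>>
> >>>   primitive vm-fencing stonith:fence_xvm \
> >>>           params key_file=/etc/cluster/fence_xvm.key \
> >>>                  pcmk_host_list="host1 host2" \
> >>>           op monitor interval=60s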
> >>>
> >>> Regarding your original problem, it sounds like the surviving node
> >>> doesn't have quorum. What version of corosync are you using? If you're
> >>> using corosync 2, you need "two_node: 1" in corosync.conf, in addition
> >>> to configuring fencing in pacemaker.
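> >>>
> >>> (For reference, a sketch of the relevant corosync 2 stanza in
> >>> corosync.conf:)
> >>>
> >>>   quorum {
> >>>       provider: corosync_votequorum
> >>>       two_node: 1
> >>>   }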
> >>>
> >>>> This is my new config:
> >>>>
> >>>> node dcwbpvmuas004.edc.nam.gm.com \
> >>>>         attributes standby=off
> >>>> node dcwbpvmuas005.edc.nam.gm.com \
> >>>>         attributes standby=off
> >>>> primitive ClusterIP IPaddr2 \
> >>>>         params ip=198.208.86.242 cidr_netmask=23 \
> >>>>         op monitor interval=1s timeout=20s \
> >>>>         op start interval=0 timeout=20s \
> >>>>         op stop interval=0 timeout=20s \
> >>>>         meta is-managed=true target-role=Started resource-stickiness=500
> >>>> primitive pcmk-fencing stonith:fence_pcmk \
> >>>>         params pcmk_host_list="dcwbpvmuas004.edc.nam.gm.com dcwbpvmuas005.edc.nam.gm.com" \
> >>>>         op monitor interval=10s \
> >>>>         meta target-role=Started
> >>>> primitive redis redis \
> >>>>         meta target-role=Master is-managed=true \
> >>>>         op monitor interval=1s role=Master timeout=5s on-fail=restart
> >>>> ms redis_clone redis \
> >>>>         meta notify=true is-managed=true ordered=false interleave=false globally-unique=false target-role=Master migration-threshold=1
> >>>> colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
> >>>> colocation ip-on-redis inf: ClusterIP redis_clone:Master
> >>>> colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master
> >>>> property cib-bootstrap-options: \
> >>>>         dc-version=1.1.11-97629de \
> >>>>         cluster-infrastructure="classic openais (with plugin)" \
> >>>>         expected-quorum-votes=2 \
> >>>>         stonith-enabled=true
> >>>> property redis_replication: \
> >>>>         redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com
> >>>>
> >>>> On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander <alexander.nekrasov at emc.com> wrote:
> >>>>
> >>>>> stonith-enabled=false
> >>>>>
> >>>>> This might be the issue. The way peer-node death is resolved, the
> >>>>> surviving node must call STONITH on the peer. If it's disabled, the
> >>>>> surviving node might not be able to resolve the event.
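> >>>>>
> >>>>> (A minimal sketch of turning it back on from the crm shell; an
> >>>>> actual fence device still has to be configured for it to do
> >>>>> anything useful:)
> >>>>>
> >>>>>   crm configure property stonith-enabled=true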
> >>>>>
> >>>>> Alex
> >>>>>
> >>>>> From: alex austin [mailto:alexixalex at gmail.com]
> >>>>> Sent: Wednesday, July 01, 2015 9:51 AM
> >>>>> To: Users at clusterlabs.org
> >>>>> Subject: Re: [ClusterLabs] Pacemaker failover failure
> >>>>>
> >>>>> So I noticed that if I kill redis on one node, it starts on the
> >>>>> other, no problem, but if I actually kill pacemaker itself on one
> >>>>> node, the other doesn't "sense" it, so it doesn't fail over.
> >>>>>
> >>>>> On Wed, Jul 1, 2015 at 12:42 PM, alex austin <alexixalex at gmail.com> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I have configured a virtual IP and redis in master/slave mode with
> >>>>> corosync and pacemaker. If redis fails, the failover is successful
> >>>>> and redis gets promoted on the other node. However, if pacemaker
> >>>>> itself fails on the active node, the failover is not performed. Is
> >>>>> there anything I missed in the configuration?
> >>>>>
> >>>>> Here's my configuration (I have hashed the IP address out):
> >>>>>
> >>>>> node host1.com
> >>>>> node host2.com
> >>>>> primitive ClusterIP IPaddr2 \
> >>>>>         params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \
> >>>>>         op monitor interval=1s timeout=20s \
> >>>>>         op start interval=0 timeout=20s \
> >>>>>         op stop interval=0 timeout=20s \
> >>>>>         meta is-managed=true target-role=Started resource-stickiness=500
> >>>>> primitive redis redis \
> >>>>>         meta target-role=Master is-managed=true \
> >>>>>         op monitor interval=1s role=Master timeout=5s on-fail=restart
> >>>>> ms redis_clone redis \
> >>>>>         meta notify=true is-managed=true ordered=false interleave=false globally-unique=false target-role=Master migration-threshold=1
> >>>>> colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
> >>>>> colocation ip-on-redis inf: ClusterIP redis_clone:Master
> >>>>> property cib-bootstrap-options: \
> >>>>>         dc-version=1.1.11-97629de \
> >>>>>         cluster-infrastructure="classic openais (with plugin)" \
> >>>>>         expected-quorum-votes=2 \
> >>>>>         stonith-enabled=false
> >>>>> property redis_replication: \
> >>>>>         redis_REPL_INFO=host.com
>
>