[ClusterLabs] Pacemaker failover failure

Digimer lists at alteeve.ca
Thu Jul 2 08:43:27 EDT 2015


Fencing is hardware dependent. Typically you'd use fence_ipmilan if your
nodes have IPMI, and/or switched PDUs like the APC AP7900 (which would
use the fence_apc_snmp agent).
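
As a rough sketch (the address and credentials here are placeholders,
not from your setup), an IPMI fence device in crm shell syntax could
look something like:

  primitive fence-ipmi-host1 stonith:fence_ipmilan \
          params pcmk_host_list="host1" ipaddr="10.0.0.1" \
                 login="admin" passwd="secret" lanplus=true \
          op monitor interval=60s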

If you aren't sure, tell us what hardware you have.

On 02/07/15 04:04 AM, alex austin wrote:
> Thank you!
> 
> However, what is proper fencing in this situation?
> 
> Kind Regards,
> 
> Alex
> 
> On Wed, Jul 1, 2015 at 11:30 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
> 
>     On 07/01/2015 09:39 AM, alex austin wrote:
>     > This is what crm_mon shows:
>     >
>     > Last updated: Wed Jul  1 10:35:40 2015
>     > Last change: Wed Jul  1 09:52:46 2015
>     > Stack: classic openais (with plugin)
>     > Current DC: host2 - partition with quorum
>     > Version: 1.1.11-97629de
>     > 2 Nodes configured, 2 expected votes
>     > 4 Resources configured
>     >
>     > Online: [ host1 host2 ]
>     >
>     > ClusterIP (ocf::heartbeat:IPaddr2): Started host2
>     >  Master/Slave Set: redis_clone [redis]
>     >      Masters: [ host2 ]
>     >      Slaves: [ host1 ]
>     > pcmk-fencing    (stonith:fence_pcmk):   Started host2
>     >
>     > On Wed, Jul 1, 2015 at 3:37 PM, alex austin <alexixalex at gmail.com> wrote:
>     >
>     >> I am running version 1.4.7 of corosync
> 
>     If you can't upgrade to corosync 2 (which has many improvements), you'll
>     need to set the no-quorum-policy=ignore cluster option.
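> 
>     For example, with the crm shell you're already using, that would be
>     roughly:
> 
>         crm configure property no-quorum-policy=ignore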
> 
>     Proper fencing is necessary to avoid a split-brain situation, which can
>     corrupt your data.
> 
>     >> On Wed, Jul 1, 2015 at 3:25 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>     >>
>     >>> On 07/01/2015 08:57 AM, alex austin wrote:
>     >>>> I have now configured stonith-enabled=true. What device should I use
>     >>>> for fencing, given that it's a virtual machine but I don't have access
>     >>>> to its configuration? Would fence_pcmk do? If so, what parameters
>     >>>> should I configure for it to work properly?
>     >>>
>     >>> No, fence_pcmk is not for use in pacemaker, but for use in RHEL6's
>     >>> CMAN, to redirect CMAN's fencing requests to pacemaker.
>     >>>
>     >>> For a virtual machine, ideally you'd use fence_virtd running on the
>     >>> physical host, but I'm guessing from your comment that you can't do
>     >>> that. Does whoever provides your VM also provide an API for
>     >>> controlling it (starting/stopping/rebooting)?
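>     >>>
>     >>> If fence_virtd/fence_xvm were an option, a rough sketch of the
>     >>> pacemaker side would be (the "port" is the VM's domain name on the
>     >>> physical host; the names here are placeholders):
>     >>>
>     >>>   primitive fence-host1 stonith:fence_xvm \
>     >>>           params port="host1-vm" pcmk_host_list="host1" \
>     >>>           op monitor interval=60s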
>     >>>
>     >>> Regarding your original problem, it sounds like the surviving node
>     >>> doesn't have quorum. What version of corosync are you using? If
>     >>> you're using corosync 2, you need "two_node: 1" in corosync.conf,
>     >>> in addition to configuring fencing in pacemaker.
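>     >>>
>     >>> For reference, a minimal corosync 2 quorum section with that option
>     >>> looks like:
>     >>>
>     >>>   quorum {
>     >>>           provider: corosync_votequorum
>     >>>           two_node: 1
>     >>>   }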
>     >>>
>     >>>> This is my new config:
>     >>>>
>     >>>> node dcwbpvmuas004.edc.nam.gm.com \
>     >>>>         attributes standby=off
>     >>>> node dcwbpvmuas005.edc.nam.gm.com \
>     >>>>         attributes standby=off
>     >>>> primitive ClusterIP IPaddr2 \
>     >>>>         params ip=198.208.86.242 cidr_netmask=23 \
>     >>>>         op monitor interval=1s timeout=20s \
>     >>>>         op start interval=0 timeout=20s \
>     >>>>         op stop interval=0 timeout=20s \
>     >>>>         meta is-managed=true target-role=Started resource-stickiness=500
>     >>>> primitive pcmk-fencing stonith:fence_pcmk \
>     >>>>         params pcmk_host_list="dcwbpvmuas004.edc.nam.gm.com dcwbpvmuas005.edc.nam.gm.com" \
>     >>>>         op monitor interval=10s \
>     >>>>         meta target-role=Started
>     >>>> primitive redis redis \
>     >>>>         meta target-role=Master is-managed=true \
>     >>>>         op monitor interval=1s role=Master timeout=5s on-fail=restart
>     >>>> ms redis_clone redis \
>     >>>>         meta notify=true is-managed=true ordered=false interleave=false \
>     >>>>                 globally-unique=false target-role=Master migration-threshold=1
>     >>>> colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
>     >>>> colocation ip-on-redis inf: ClusterIP redis_clone:Master
>     >>>> colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master
>     >>>> property cib-bootstrap-options: \
>     >>>>         dc-version=1.1.11-97629de \
>     >>>>         cluster-infrastructure="classic openais (with plugin)" \
>     >>>>         expected-quorum-votes=2 \
>     >>>>         stonith-enabled=true
>     >>>> property redis_replication: \
>     >>>>         redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com
>     >>>>
>     >>>> On Wed, Jul 1, 2015 at 2:53 PM, Nekrasov, Alexander <alexander.nekrasov at emc.com> wrote:
>     >>>>
>     >>>>> stonith-enabled=false
>     >>>>>
>     >>>>> This might be the issue. The way peer node death is resolved, the
>     >>>>> surviving node must call STONITH on the peer. If it's disabled, it
>     >>>>> might not be able to resolve the event.
>     >>>>>
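>     >>>>> A minimal sketch of re-enabling it with the crm shell (a working
>     >>>>> fence device is also needed for it to be useful):
>     >>>>>
>     >>>>>     crm configure property stonith-enabled=true
>     >>>>>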
>     >>>>> Alex
>     >>>>>
>     >>>>>
>     >>>>>
>     >>>>> *From:* alex austin [mailto:alexixalex at gmail.com]
>     >>>>> *Sent:* Wednesday, July 01, 2015 9:51 AM
>     >>>>> *To:* Users at clusterlabs.org
>     >>>>> *Subject:* Re: [ClusterLabs] Pacemaker failover failure
>     >>>>>
>     >>>>>
>     >>>>>
>     >>>>> So I noticed that if I kill redis on one node, it starts on the
>     >>>>> other, no problem. But if I actually kill pacemaker itself on one
>     >>>>> node, the other doesn't "sense" it, so it doesn't fail over.
>     >>>>>
>     >>>>> On Wed, Jul 1, 2015 at 12:42 PM, alex austin <alexixalex at gmail.com> wrote:
>     >>>>>
>     >>>>> Hi all,
>     >>>>>
>     >>>>> I have configured a virtual IP and redis in master-slave with
>     >>>>> corosync/pacemaker. If redis fails, then the failover is successful,
>     >>>>> and redis gets promoted on the other node. However, if pacemaker
>     >>>>> itself fails on the active node, the failover is not performed. Is
>     >>>>> there anything I missed in the configuration?
>     >>>>>
>     >>>>> Here's my configuration (I have hashed the IP address out):
>     >>>>>
>     >>>>> node host1.com
>     >>>>> node host2.com
>     >>>>> primitive ClusterIP IPaddr2 \
>     >>>>> params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \
>     >>>>> op monitor interval=1s timeout=20s \
>     >>>>> op start interval=0 timeout=20s \
>     >>>>> op stop interval=0 timeout=20s \
>     >>>>> meta is-managed=true target-role=Started resource-stickiness=500
>     >>>>> primitive redis redis \
>     >>>>> meta target-role=Master is-managed=true \
>     >>>>> op monitor interval=1s role=Master timeout=5s on-fail=restart
>     >>>>> ms redis_clone redis \
>     >>>>> meta notify=true is-managed=true ordered=false interleave=false \
>     >>>>> globally-unique=false target-role=Master migration-threshold=1
>     >>>>> colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
>     >>>>> colocation ip-on-redis inf: ClusterIP redis_clone:Master
>     >>>>> property cib-bootstrap-options: \
>     >>>>> dc-version=1.1.11-97629de \
>     >>>>> cluster-infrastructure="classic openais (with plugin)" \
>     >>>>> expected-quorum-votes=2 \
>     >>>>> stonith-enabled=false
>     >>>>> property redis_replication: \
>     >>>>> redis_REPL_INFO=host.com
> 
> 
> 
> 
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



