[ClusterLabs] How can I prevent multiple start of IPaddr2 in an environment using fence_mpath?
飯田 雄介
iidayuus at intellilink.co.jp
Wed Apr 18 01:17:32 EDT 2018
Hi, Ken
Thanks for your comment.
I agree that network fencing is a valid approach. However, it depends heavily on the network equipment, and since we do not have an SNMP-capable network switch in our environment, we cannot try it right away.
Thanks, Yusuke
> -----Original Message-----
> From: Users [mailto:users-bounces at clusterlabs.org] On Behalf Of Ken Gaillot
> Sent: Friday, April 06, 2018 11:12 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] How can I prevent multiple start of IPaddr2 in an
> environment using fence_mpath?
>
> On Fri, 2018-04-06 at 04:30 +0000, 飯田 雄介 wrote:
> > Hi, all
> > I am testing the environment using fence_mpath with the following
> > settings.
> >
> > =======
> > Stack: corosync
> > Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition with
> > quorum
> > Last updated: Fri Apr 6 13:16:20 2018
> > Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Online: [ x3650e x3650f ]
> >
> > Full list of resources:
> >
> > fenceMpath-x3650e (stonith:fence_mpath): Started x3650e
> > fenceMpath-x3650f (stonith:fence_mpath): Started x3650f
> > Resource Group: grpPostgreSQLDB
> > prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650e
> > prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650e
> > prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650e
> > prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started x3650e
> > Resource Group: grpPostgreSQLIP
> > prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started x3650e
> > Clone Set: clnDiskd1 [prmDiskd1]
> > Started: [ x3650e x3650f ]
> > Clone Set: clnDiskd2 [prmDiskd2]
> > Started: [ x3650e x3650f ]
> > Clone Set: clnPing [prmPing]
> > Started: [ x3650e x3650f ]
> > =======
> >
> > When split-brain occurs in this environment, x3650f executes fencing
> > and the resources are started on x3650f.
> >
> > === view of x3650e ====
> > Stack: corosync
> > Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition
> > WITHOUT quorum
> > Last updated: Fri Apr 6 13:16:36 2018
> > Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Node x3650f: UNCLEAN (offline)
> > Online: [ x3650e ]
> >
> > Full list of resources:
> >
> > fenceMpath-x3650e (stonith:fence_mpath): Started x3650e
> > fenceMpath-x3650f (stonith:fence_mpath): Started [ x3650e x3650f ]
> > Resource Group: grpPostgreSQLDB
> > prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650e
> > prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650e
> > prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650e
> > prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started x3650e
> > Resource Group: grpPostgreSQLIP
> > prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started x3650e
> > Clone Set: clnDiskd1 [prmDiskd1]
> > prmDiskd1 (ocf::pacemaker:diskd): Started x3650f
> > (UNCLEAN)
> > Started: [ x3650e ]
> > Clone Set: clnDiskd2 [prmDiskd2]
> > prmDiskd2 (ocf::pacemaker:diskd): Started x3650f
> > (UNCLEAN)
> > Started: [ x3650e ]
> > Clone Set: clnPing [prmPing]
> > prmPing (ocf::pacemaker:ping): Started x3650f (UNCLEAN)
> > Started: [ x3650e ]
> >
> > === view of x3650f ====
> > Stack: corosync
> > Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition
> > WITHOUT quorum
> > Last updated: Fri Apr 6 13:16:36 2018
> > Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Online: [ x3650f ]
> > OFFLINE: [ x3650e ]
> >
> > Full list of resources:
> >
> > fenceMpath-x3650e (stonith:fence_mpath): Started x3650f
> > fenceMpath-x3650f (stonith:fence_mpath): Started x3650f
> > Resource Group: grpPostgreSQLDB
> > prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650f
> > prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650f
> > prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650f
> > prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started x3650f
> > Resource Group: grpPostgreSQLIP
> > prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started x3650f
> > Clone Set: clnDiskd1 [prmDiskd1]
> > Started: [ x3650f ]
> > Stopped: [ x3650e ]
> > Clone Set: clnDiskd2 [prmDiskd2]
> > Started: [ x3650f ]
> > Stopped: [ x3650e ]
> > Clone Set: clnPing [prmPing]
> > Started: [ x3650f ]
> > Stopped: [ x3650e ]
> > =======
> >
> > However, IPaddr2 on x3650e will not stop until a pgsql monitor error
> > occurs. Until then, IPaddr2 is temporarily active on both nodes at the
> > same time.
> >
> > === view of after pgsql monitor error ===
> > Stack: corosync
> > Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition
> > WITHOUT quorum
> > Last updated: Fri Apr 6 13:16:56 2018
> > Last change: Thu Mar 1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Node x3650f: UNCLEAN (offline)
> > Online: [ x3650e ]
> >
> > Full list of resources:
> >
> > fenceMpath-x3650e (stonith:fence_mpath): Started x3650e
> > fenceMpath-x3650f (stonith:fence_mpath): Started [ x3650e x3650f ]
> > Resource Group: grpPostgreSQLDB
> > prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650e
> > prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650e
> > prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650e
> > prmApPostgreSQLDB (ocf::heartbeat:pgsql): Stopped
> > Resource Group: grpPostgreSQLIP
> > prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Stopped
> > Clone Set: clnDiskd1 [prmDiskd1]
> > prmDiskd1 (ocf::pacemaker:diskd): Started x3650f
> > (UNCLEAN)
> > Started: [ x3650e ]
> > Clone Set: clnDiskd2 [prmDiskd2]
> > prmDiskd2 (ocf::pacemaker:diskd): Started x3650f
> > (UNCLEAN)
> > Started: [ x3650e ]
> > Clone Set: clnPing [prmPing]
> > prmPing (ocf::pacemaker:ping): Started x3650f (UNCLEAN)
> > Started: [ x3650e ]
> >
> > Node Attributes:
> > * Node x3650e:
> > + default_ping_set : 100
> > + diskcheck_status : normal
> > + diskcheck_status_internal : normal
> >
> > Migration Summary:
> > * Node x3650e:
> > prmApPostgreSQLDB: migration-threshold=1 fail-count=1 last-failure='Fri Apr 6 13:16:39 2018'
> >
> > Failed Actions:
> > * prmApPostgreSQLDB_monitor_10000 on x3650e 'not running' (7):
> > call=60, status=complete, exitreason='Configuration file
> > /dbfp/pgdata/data/postgresql.conf doesn't exist',
> > last-rc-change='Fri Apr 6 13:16:39 2018', queued=0ms, exec=0ms
> > ======
> >
> > We regard this behavior as a problem.
> > Is there a way to avoid this behavior?
> >
> > Regards, Yusuke
>
> Hi Yusuke,
>
> One possibility would be to implement network fabric fencing as well, e.g.
> fence_snmp with an SNMP-capable network switch. You can make a fencing topology
> level with both the storage and network devices.
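For illustration, such a topology might be configured with pcs roughly as follows. This is only a sketch: the `fenceFabric-x3650e` device name, the `fence_ifmib` agent choice, and the switch address, community, and port values are hypothetical placeholders, not part of the original cluster.

```shell
# Hypothetical sketch: add a network-fabric fence device (SNMP-capable
# switch via fence_ifmib) alongside the existing fence_mpath device, and
# put both in one fencing topology level so that fencing a node cuts
# both its storage and its network access. All parameters are placeholders.

# Create the network-fabric fence device for node x3650e:
pcs stonith create fenceFabric-x3650e fence_ifmib \
    ip=switch.example.com community=private port=Gi0/1 \
    pcmk_host_list=x3650e

# One topology level containing both devices: both must succeed
# for the fencing of x3650e to be considered complete.
pcs stonith level add 1 x3650e fenceMpath-x3650e,fenceFabric-x3650e

# The same pattern would be repeated for x3650f with its own switch port.
```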
>
> The main drawback is that unfencing isn't automatic. After a fenced node is
> ready to rejoin, you have to clear the block at the switch yourself.
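That manual step could look roughly like the following, assuming fence_ifmib and an SNMP-managed switch; the switch address, community string, and interface name are placeholders.

```shell
# Hypothetical sketch: manually re-enable (unfence) the switch port of a
# fenced node once it is healthy again. All parameters are placeholders.
fence_ifmib --ip=switch.example.com --community=private \
    --plug=Gi0/1 --action=on

# Check the port state before letting the node rejoin the cluster:
fence_ifmib --ip=switch.example.com --community=private \
    --plug=Gi0/1 --action=status
```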
> --
> Ken Gaillot <kgaillot at redhat.com>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org