[ClusterLabs] How can I prevent multiple starts of IPaddr2 in an environment using fence_mpath?

Andrei Borzenkov arvidjaar at gmail.com
Fri Apr 6 01:03:45 EDT 2018


06.04.2018 07:30, 飯田 雄介 writes:
> Hi all,
> I am testing an environment that uses fence_mpath with the following settings.
> 
> =======
>   Stack: corosync
>   Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition with quorum
>   Last updated: Fri Apr  6 13:16:20 2018
>   Last change: Thu Mar  1 18:38:02 2018 by root via cibadmin on x3650e
> 
>   2 nodes configured
>   13 resources configured
> 
>   Online: [ x3650e x3650f ]
> 
>   Full list of resources:
> 
>    fenceMpath-x3650e    (stonith:fence_mpath):  Started x3650e
>    fenceMpath-x3650f    (stonith:fence_mpath):  Started x3650f
>    Resource Group: grpPostgreSQLDB
>        prmFsPostgreSQLDB1       (ocf::heartbeat:Filesystem):    Started x3650e
>        prmFsPostgreSQLDB2       (ocf::heartbeat:Filesystem):    Started x3650e
>        prmFsPostgreSQLDB3       (ocf::heartbeat:Filesystem):    Started x3650e
>        prmApPostgreSQLDB        (ocf::heartbeat:pgsql): Started x3650e
>    Resource Group: grpPostgreSQLIP
>        prmIpPostgreSQLDB        (ocf::heartbeat:IPaddr2):       Started x3650e
>    Clone Set: clnDiskd1 [prmDiskd1]
>        Started: [ x3650e x3650f ]
>    Clone Set: clnDiskd2 [prmDiskd2]
>        Started: [ x3650e x3650f ]
>    Clone Set: clnPing [prmPing]
>        Started: [ x3650e x3650f ]
> =======
> 
> When a split-brain occurs in this environment, x3650f executes fencing and the resources are started on x3650f.
> 
> === view of x3650e ====
>   Stack: corosync
>   Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
>   Last updated: Fri Apr  6 13:16:36 2018
>   Last change: Thu Mar  1 18:38:02 2018 by root via cibadmin on x3650e
> 
>   2 nodes configured
>   13 resources configured
> 
>   Node x3650f: UNCLEAN (offline)
>   Online: [ x3650e ]
> 
>   Full list of resources:
> 
>    fenceMpath-x3650e    (stonith:fence_mpath):  Started x3650e
>    fenceMpath-x3650f    (stonith:fence_mpath):  Started[ x3650e x3650f ]
>    Resource Group: grpPostgreSQLDB
>        prmFsPostgreSQLDB1       (ocf::heartbeat:Filesystem):    Started x3650e
>        prmFsPostgreSQLDB2       (ocf::heartbeat:Filesystem):    Started x3650e
>        prmFsPostgreSQLDB3       (ocf::heartbeat:Filesystem):    Started x3650e
>        prmApPostgreSQLDB        (ocf::heartbeat:pgsql): Started x3650e
>    Resource Group: grpPostgreSQLIP
>        prmIpPostgreSQLDB        (ocf::heartbeat:IPaddr2):       Started x3650e
>    Clone Set: clnDiskd1 [prmDiskd1]
>        prmDiskd1        (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
>        Started: [ x3650e ]
>    Clone Set: clnDiskd2 [prmDiskd2]
>        prmDiskd2        (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
>        Started: [ x3650e ]
>    Clone Set: clnPing [prmPing]
>        prmPing  (ocf::pacemaker:ping):  Started x3650f (UNCLEAN)
>        Started: [ x3650e ]
> 
> === view of x3650f ====
>   Stack: corosync
>   Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
>   Last updated: Fri Apr  6 13:16:36 2018
>   Last change: Thu Mar  1 18:38:02 2018 by root via cibadmin on x3650e
> 
>   2 nodes configured
>   13 resources configured
> 
>   Online: [ x3650f ]
>   OFFLINE: [ x3650e ]
> 
>   Full list of resources:
> 
>    fenceMpath-x3650e    (stonith:fence_mpath):  Started x3650f
>    fenceMpath-x3650f    (stonith:fence_mpath):  Started x3650f
>    Resource Group: grpPostgreSQLDB
>        prmFsPostgreSQLDB1       (ocf::heartbeat:Filesystem):    Started x3650f
>        prmFsPostgreSQLDB2       (ocf::heartbeat:Filesystem):    Started x3650f
>        prmFsPostgreSQLDB3       (ocf::heartbeat:Filesystem):    Started x3650f
>        prmApPostgreSQLDB        (ocf::heartbeat:pgsql): Started x3650f
>    Resource Group: grpPostgreSQLIP
>        prmIpPostgreSQLDB        (ocf::heartbeat:IPaddr2):       Started x3650f
>    Clone Set: clnDiskd1 [prmDiskd1]
>        Started: [ x3650f ]
>        Stopped: [ x3650e ]
>    Clone Set: clnDiskd2 [prmDiskd2]
>        Started: [ x3650f ]
>        Stopped: [ x3650e ]
>    Clone Set: clnPing [prmPing]
>        Started: [ x3650f ]
>        Stopped: [ x3650e ]
> =======
> 
> However, the IPaddr2 resource on x3650e does not stop until a pgsql monitor error occurs.
> During this interval, IPaddr2 is temporarily running on both nodes at the same time.
> 
> === view of after pgsql monitor error ===
>   Stack: corosync
>   Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
>   Last updated: Fri Apr  6 13:16:56 2018
>   Last change: Thu Mar  1 18:38:02 2018 by root via cibadmin on x3650e
> 
>   2 nodes configured
>   13 resources configured
> 
>   Node x3650f: UNCLEAN (offline)
>   Online: [ x3650e ]
> 
>   Full list of resources:
> 
>    fenceMpath-x3650e    (stonith:fence_mpath):  Started x3650e
>    fenceMpath-x3650f    (stonith:fence_mpath):  Started[ x3650e x3650f ]
>    Resource Group: grpPostgreSQLDB
>        prmFsPostgreSQLDB1       (ocf::heartbeat:Filesystem):    Started x3650e
>        prmFsPostgreSQLDB2       (ocf::heartbeat:Filesystem):    Started x3650e
>        prmFsPostgreSQLDB3       (ocf::heartbeat:Filesystem):    Started x3650e
>        prmApPostgreSQLDB        (ocf::heartbeat:pgsql): Stopped
>    Resource Group: grpPostgreSQLIP
>        prmIpPostgreSQLDB        (ocf::heartbeat:IPaddr2):       Stopped
>    Clone Set: clnDiskd1 [prmDiskd1]
>        prmDiskd1        (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
>        Started: [ x3650e ]
>    Clone Set: clnDiskd2 [prmDiskd2]
>        prmDiskd2        (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
>        Started: [ x3650e ]
>    Clone Set: clnPing [prmPing]
>        prmPing  (ocf::pacemaker:ping):  Started x3650f (UNCLEAN)
>        Started: [ x3650e ]
> 
>   Node Attributes:
>   * Node x3650e:
>       + default_ping_set                        : 100
>       + diskcheck_status                        : normal
>       + diskcheck_status_internal               : normal
> 
>   Migration Summary:
>   * Node x3650e:
>      prmApPostgreSQLDB: migration-threshold=1 fail-count=1 last-failure='Fri Apr  6 13:16:39 2018'
> 
>   Failed Actions:
>   * prmApPostgreSQLDB_monitor_10000 on x3650e 'not running' (7): call=60, status=complete, exitreason='Configuration file /dbfp/pgdata/data/postgresql.conf doesn't exist',
>       last-rc-change='Fri Apr  6 13:16:39 2018', queued=0ms, exec=0ms
> ======
> 
> We regard this behavior as a problem.
> Is there a way to avoid it?
> 


Use a node-level stonith agent instead of storage resource fencing? :)

Seriously, storage fencing only ensures that the other node(s) cannot
access the same storage resources and thus cannot damage data through
uncontrolled concurrent access. Otherwise the node whose resources were
fenced off continues to run "normally".

See also https://access.redhat.com/articles/3078811 for some statements
regarding the use of storage fencing.
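
For example, with IPMI-capable servers a node-level agent could be set
up roughly like this (an untested sketch; the addresses, credentials and
resource names below are placeholders, not taken from your configuration):

  # Hypothetical node-level fencing via IPMI; all values are placeholders.
  pcs stonith create fence-ipmi-x3650e fence_ipmilan \
      ipaddr=192.0.2.11 login=admin passwd=secret lanplus=1 \
      pcmk_host_list=x3650e
  pcs stonith create fence-ipmi-x3650f fence_ipmilan \
      ipaddr=192.0.2.12 login=admin passwd=secret lanplus=1 \
      pcmk_host_list=x3650f
  # Keep each fence device off the node it is meant to kill.
  pcs constraint location fence-ipmi-x3650e avoids x3650e
  pcs constraint location fence-ipmi-x3650f avoids x3650f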

The only workaround for a two-node cluster I can think of is to
artificially delay stonith agent completion so that it takes longer than
the monitor timeout. That way the surviving node will not begin failing
over resources until they have (hopefully) been stopped on the other
node. You can probably do this with the power_timeout property.
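
Something like the following might be a starting point (untested; the
60-second value is only an example and would need to exceed the pgsql
monitor interval plus its timeout on the fenced node):

  # Make fencing report completion only after the other node has had
  # time to notice the failure and stop its resources (example value).
  pcs stonith update fenceMpath-x3650e power_timeout=60
  pcs stonith update fenceMpath-x3650f power_timeout=60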

For three or more nodes, setting no-quorum-policy=stop may work,
although it does not solve the problem of intentional fencing of a
healthy node (e.g. due to a resource stop failure).
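
With pcs that would be, for example:

  pcs property set no-quorum-policy=stop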



