[Pacemaker] 2-node cluster doesn't move resources away from a failed node

David Guyot david.guyot at europecamions-interactive.com
Thu Jul 5 10:15:13 EDT 2012


Oops, I omitted the cluster config:

root@Vindemiatrix:/home/david# crm configure show | cat
node Malastare \
    attributes standby="off"
node Vindemiatrix \
    attributes standby="off"
primitive OVHvIP ocf:pacemaker:OVHvIP
primitive ProFTPd ocf:heartbeat:proftpd \
    params conffile="/etc/proftpd/proftpd.conf" \
    op monitor interval="60s"
primitive VirtualIP ocf:heartbeat:IPaddr2 \
    params ip="178.33.109.180" nic="eth0" cidr_netmask="32"
primitive drbd_backupvi ocf:linbit:drbd \
    params drbd_resource="backupvi" \
    op monitor interval="15s"
primitive drbd_pgsql ocf:linbit:drbd \
    params drbd_resource="postgresql" \
    op monitor interval="15s"
primitive drbd_svn ocf:linbit:drbd \
    params drbd_resource="svn" \
    op monitor interval="15s"
primitive drbd_www ocf:linbit:drbd \
    params drbd_resource="www" \
    op monitor interval="15s"
primitive fs_backupvi ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/backupvi" directory="/var/backupvi" fstype="ext3"
primitive fs_pgsql ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/postgresql" directory="/var/lib/postgresql" fstype="ext3" \
    meta target-role="Started"
primitive fs_svn ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/svn" directory="/var/lib/svn" fstype="ext3" \
    meta target-role="Started"
primitive fs_www ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/www" directory="/var/www" fstype="ext3"
primitive soapi-fencing-malastare stonith:external/ovh \
    params reversedns="ns208812.ovh.net"
primitive soapi-fencing-vindemiatrix stonith:external/ovh \
    params reversedns="ns235795.ovh.net"
ms ms_drbd_backupvi drbd_backupvi \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms ms_drbd_pgsql drbd_pgsql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
ms ms_drbd_svn drbd_svn \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
ms ms_drbd_www drbd_www \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location stonith-malastare soapi-fencing-malastare -inf: Malastare
location stonith-vindemiatrix soapi-fencing-vindemiatrix -inf: Vindemiatrix
colocation FS_on_same_host inf: ms_drbd_backupvi:Master ms_drbd_svn:Master ms_drbd_www:Master ms_drbd_pgsql:Master
colocation IPAddr2_with_OVHvIP inf: OVHvIP VirtualIP
colocation IP_and_www inf: OVHvIP ms_drbd_www
colocation ProFTPd_www inf: ProFTPd fs_www
colocation backupvi inf: fs_backupvi ms_drbd_backupvi:Master
colocation pgsql_coloc inf: fs_pgsql ms_drbd_pgsql:Master
colocation svn_coloc inf: fs_svn ms_drbd_svn:Master
colocation www_coloc inf: fs_www ms_drbd_www:Master
order IPAddr2_OVHvIP inf: OVHvIP:start VirtualIP:start
order backupvi_order inf: ms_drbd_backupvi:promote fs_backupvi:start
order pgsql_order inf: ms_drbd_pgsql:promote fs_pgsql:start
order svn_order inf: ms_drbd_svn:promote fs_svn:start
order www_order inf: ms_drbd_www:promote fs_www:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    default-resource-stickiness="50" \
    no-quorum-policy="ignore"

Thank you in advance for your help.

Kind regards.
 
On 05/07/2012 16:12, David Guyot wrote:
> Hello, everybody.
>
> As the title suggests, I'm configuring a 2-node cluster, but I've run
> into a strange issue: when I put a node in standby mode with "crm node
> standby", its resources are correctly moved to the second node and stay
> there even after the first node comes back on-line, which I assume is
> the intended behaviour (intended by the designers of such systems) to
> avoid running resources on a potentially unstable node. However, when I
> simulate a failure of the node running the resources with
> "/etc/init.d/corosync stop", the other node correctly fences the failed
> node by electrically resetting it, but it does not then mount the
> resources on itself; instead, it waits for the failed node to come back
> on-line and then re-negotiates resource placement, which inevitably ends
> with the failed node restarting the resources. I suppose this is a
> consequence of the resource stickiness still recorded by the surviving
> node: because that node still assumes the resources are running on the
> failed node, it concludes that they prefer to stay there, even though
> the node has failed.
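>
> For reference, the commands involved look roughly like this (the node
> name is only an example, and "crm node online" is the usual counterpart
> for bringing a standby node back):
>
> # put a node into standby; its resources migrate to the peer
> crm node standby Vindemiatrix
> # bring it back on-line afterwards; the resources stay where they are
> crm node online Vindemiatrix
>
> # simulate a hard failure instead (run on the node currently holding
> # the resources)
> /etc/init.d/corosync stop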
>
> When the first node, Vindemiatrix, has shut down Corosync, the second,
> Malastare, reports this:
>
> root@Malastare:/home/david# crm_mon --one-shot -VrA
> ============
> Last updated: Thu Jul  5 15:27:01 2012
> Last change: Thu Jul  5 15:26:37 2012 via cibadmin on Malastare
> Stack: openais
> Current DC: Malastare - partition WITHOUT quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 17 Resources configured.
> ============
>
> Node Vindemiatrix: UNCLEAN (offline)
> Online: [ Malastare ]
>
> Full list of resources:
>
>  soapi-fencing-malastare    (stonith:external/ovh):    Started Vindemiatrix
>  soapi-fencing-vindemiatrix    (stonith:external/ovh):    Started Malastare
>  Master/Slave Set: ms_drbd_svn [drbd_svn]
>      Masters: [ Vindemiatrix ]
>      Slaves: [ Malastare ]
>  Master/Slave Set: ms_drbd_pgsql [drbd_pgsql]
>      Masters: [ Vindemiatrix ]
>      Slaves: [ Malastare ]
>  Master/Slave Set: ms_drbd_backupvi [drbd_backupvi]
>      Masters: [ Vindemiatrix ]
>      Slaves: [ Malastare ]
>  Master/Slave Set: ms_drbd_www [drbd_www]
>      Masters: [ Vindemiatrix ]
>      Slaves: [ Malastare ]
>  fs_www    (ocf::heartbeat:Filesystem):    Started Vindemiatrix
>  fs_pgsql    (ocf::heartbeat:Filesystem):    Started Vindemiatrix
>  fs_svn    (ocf::heartbeat:Filesystem):    Started Vindemiatrix
>  fs_backupvi    (ocf::heartbeat:Filesystem):    Started Vindemiatrix
>  VirtualIP    (ocf::heartbeat:IPaddr2):    Started Vindemiatrix
>  OVHvIP    (ocf::pacemaker:OVHvIP):    Started Vindemiatrix
>  ProFTPd    (ocf::heartbeat:proftpd):    Started Vindemiatrix
>
> Node Attributes:
> * Node Malastare:
>     + master-drbd_backupvi:0              : 10000    
>     + master-drbd_pgsql:0                 : 10000    
>     + master-drbd_svn:0                   : 10000    
>     + master-drbd_www:0                   : 10000    
>
> As you can see, the node failure is detected. This state produces the
> attached log file.
>
> Note that both ocf::pacemaker:OVHvIP and stonith:external/ovh are custom
> resources which use my server provider's SOAP API to provide the
> intended services. The STONITH agent does nothing but return exit
> status 0 when the start, stop, on or off actions are requested; for the
> hostlist and gethosts actions it returns the two node names, and for
> the reset action it effectively resets the faulting node through the
> provider API. Because this API provides no reliable means of knowing
> the exact moment of the reset, the STONITH agent pings the faulting
> node every 5 seconds until the ping fails, then forks a process which
> pings the faulting node every 5 seconds until it answers again. Since
> the external VPN has not yet been installed by the provider, I am
> forced to emulate it with OpenVPN (which seems unable to re-establish a
> connection lost minutes ago, leading to a split-brain situation), so
> the STONITH agent then restarts OpenVPN to re-establish the connection,
> and finally restarts Corosync and Pacemaker.
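>
> For completeness, the agent roughly follows the usual external/ plugin
> skeleton sketched below (a simplified sketch, not the exact script; the
> real code calls the OVH SOAP API and the ping/OpenVPN recovery logic
> where the comments indicate, and "reversedns" is the parameter declared
> in the primitives above):
>
> #!/bin/sh
> # external STONITH plugin: the action is passed as $1, the target host as $2
> case "$1" in
>     gethosts|hostlist)
>         echo "Malastare Vindemiatrix"
>         exit 0
>         ;;
>     reset)
>         # ask the provider's SOAP API to reset $2 (identified via $reversedns),
>         # ping it every 5 seconds until it stops answering, then fork the
>         # process which waits for it to answer again and restarts OpenVPN,
>         # Corosync and Pacemaker
>         exit 0
>         ;;
>     on|off|start|stop|status)
>         exit 0
>         ;;
>     getconfignames)
>         echo "reversedns"
>         exit 0
>         ;;
>     *)
>         exit 1
>         ;;
> esac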
>
> Aside from the VPN issue, whose performance and stability problems I am
> fully aware of, I thought that Pacemaker would start the resources on
> the remaining node as soon as the STONITH agent returned exit status 0,
> but it doesn't. Instead, it seems that the STONITH reset action takes
> too long to report a successful reset; that delay exceeds some internal
> timeout, which in turn leads Pacemaker to conclude that the STONITH
> agent failed. Pacemaker then keeps retrying the reset (which only makes
> the API return an error, because a new reset request less than 5
> minutes after the previous one is forbidden) and stops the resources
> without restarting them on the remaining node. I searched the Internet
> for this timeout, but the only related thing I found is this page,
> http://lists.linux-ha.org/pipermail/linux-ha/2010-March/039761.html, a
> Linux-HA mailing list archive, which mentions a stonith-timeout
> property; I went through the Pacemaker documentation without finding
> any occurrence of it, and I got an error when I tried to query its
> value:
>
> root@Vindemiatrix:/home/david# crm_attribute --name stonith-timeout --query
> scope=crm_config  name=stonith-timeout value=(null)
> Error performing operation: The object/attribute does not exist
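>
> If this is indeed the right knob, I suppose setting it would look
> something like the following (the 120s value is only a guess on my
> part):
>
> crm configure property stonith-timeout=120s
> crm_attribute --name stonith-timeout --query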
>
> So what did I miss? Should I use this property even though it appears
> nowhere in the documentation? Should I rewrite my STONITH agent to
> return exit status 0 as soon as the API has accepted the reset request
> (contrary to what Linux-HA, http://linux-ha.org/wiki/STONITH, states is
> necessary)? Or is there something else I missed?
>
> Thank you for reading this whole mail, and thank you in advance for your help.
>
> Kind regards.
