[Pacemaker] 2-node cluster doesn't move resources away from a failed node

David Guyot david.guyot at europecamions-interactive.com
Mon Jul 9 03:32:37 EDT 2012


Thank you for your help!

I found the problem; it came from a bug in my STONITH agent, which
caused it to become a zombie. I corrected this bug and the cluster now
fails over as expected.
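
In case it helps anyone else: the likely culprit was the background
ping/recovery child that the agent forks (see the quoted description
below) never being reaped. A minimal sketch of a detaching pattern that
avoids such zombies; the PEER value and the recovery commands are
placeholders, not the actual agent code:

    # double fork: the intermediate subshell exits immediately, so the
    # detached child is reparented to init and reaped there instead of
    # lingering as a zombie of the STONITH agent
    PEER="Vindemiatrix"   # placeholder for the fenced peer's hostname
    (
        setsid sh -c "
            until ping -c 1 -W 2 $PEER >/dev/null 2>&1; do sleep 5; done
            /etc/init.d/openvpn restart
            /etc/init.d/corosync start
            /etc/init.d/pacemaker start
        " </dev/null >/dev/null 2>&1 &
    )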

Kind regards.

On 08/07/2012 00:12, Andreas Kurz wrote:
> On 07/05/2012 04:12 PM, David Guyot wrote:
>> Hello, everybody.
>>
>> As the title suggests, I'm configuring a 2-node cluster, but I've hit a
>> strange issue: when I put a node into standby mode with "crm node
>> standby", its resources are correctly moved to the second node and stay
>> there even after the first node is back on-line, which I assume is the
>> intended behaviour (intended by the designers of such systems) to avoid
>> running resources on a potentially unstable node. However, when I
>> simulate failure of the node running the resources with
>> "/etc/init.d/corosync stop", the other node correctly fences the failed
>> node by electrically resetting it, but it does not then mount the
>> resources on itself; rather, it waits for the failed node to come back
>> on-line and then re-negotiates resource placement, which inevitably ends
>> with the failed node restarting the resources. I suppose this is a
>> consequence of the resource stickiness still recorded by the intact
>> node: because that node still believes the resources are running on the
>> failed node, it assumes they prefer to stay there, even though it has
>> failed.
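
(One way to check this theory rather than guessing: the allocation
scores can be dumped from the live CIB. A rough check, assuming this
crm_simulate build still accepts the ptest-style switches for the live
CIB and for showing scores:

    crm_simulate -L -s
)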
>>
>> When the first node, Vindemiatrix, has shut down Corosync, the second,
>> Malastare, reports this:
>>
>> root@Malastare:/home/david# crm_mon --one-shot -VrA
>> ============
>> Last updated: Thu Jul  5 15:27:01 2012
>> Last change: Thu Jul  5 15:26:37 2012 via cibadmin on Malastare
>> Stack: openais
>> Current DC: Malastare - partition WITHOUT quorum
>> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> 2 Nodes configured, 2 expected votes
>> 17 Resources configured.
>> ============
>>
>> Node Vindemiatrix: UNCLEAN (offline)
> Pacemaker thinks fencing was not successful and will not recover the
> resources until STONITH succeeds ... or until the node returns and it
> is possible to probe resource states
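
If it helps to confirm which of the two cases applies, the stonith
messages on the surviving node show whether the reset was acknowledged
or timed out. A rough check (the log path is an assumption; it depends
on how corosync and syslog are set up):

    grep -i stonith /var/log/syslog | tail -n 50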
>
>> Online: [ Malastare ]
>>
>> Full list of resources:
>>
>>  soapi-fencing-malastare    (stonith:external/ovh):    Started Vindemiatrix
>>  soapi-fencing-vindemiatrix    (stonith:external/ovh):    Started Malastare
>>  Master/Slave Set: ms_drbd_svn [drbd_svn]
>>      Masters: [ Vindemiatrix ]
>>      Slaves: [ Malastare ]
>>  Master/Slave Set: ms_drbd_pgsql [drbd_pgsql]
>>      Masters: [ Vindemiatrix ]
>>      Slaves: [ Malastare ]
>>  Master/Slave Set: ms_drbd_backupvi [drbd_backupvi]
>>      Masters: [ Vindemiatrix ]
>>      Slaves: [ Malastare ]
>>  Master/Slave Set: ms_drbd_www [drbd_www]
>>      Masters: [ Vindemiatrix ]
>>      Slaves: [ Malastare ]
>>  fs_www    (ocf::heartbeat:Filesystem):    Started Vindemiatrix
>>  fs_pgsql    (ocf::heartbeat:Filesystem):    Started Vindemiatrix
>>  fs_svn    (ocf::heartbeat:Filesystem):    Started Vindemiatrix
>>  fs_backupvi    (ocf::heartbeat:Filesystem):    Started Vindemiatrix
>>  VirtualIP    (ocf::heartbeat:IPaddr2):    Started Vindemiatrix
>>  OVHvIP    (ocf::pacemaker:OVHvIP):    Started Vindemiatrix
>>  ProFTPd    (ocf::heartbeat:proftpd):    Started Vindemiatrix
>>
>> Node Attributes:
>> * Node Malastare:
>>     + master-drbd_backupvi:0              : 10000    
>>     + master-drbd_pgsql:0                 : 10000    
>>     + master-drbd_svn:0                   : 10000    
>>     + master-drbd_www:0                   : 10000    
>>
>> As you can see, the node failure is detected. This state produces the
>> attached log file.
>>
>> Note that both ocf::pacemaker:OVHvIP and stonith:external/ovh are custom
>> resources which use my server provider's SOAP API to provide the
>> intended services. The STONITH agent does nothing but return exit status
>> 0 for the start, stop, on and off actions; it returns the two node names
>> for the hostlist and gethosts actions and, for the reset action, it
>> actually resets the faulting node through the provider API. As this API
>> does not give a reliable way of knowing the exact moment of the reset,
>> the STONITH agent pings the faulting node every 5 seconds until the ping
>> fails, then forks a process which pings the faulting node every 5
>> seconds until it answers again. Because the provider has not yet
>> installed the external VPN, I'm forced to emulate it with OpenVPN (which
>> seems unable to re-establish a connection lost minutes earlier, leading
>> to a split-brain situation), so the forked process then restarts OpenVPN
>> to re-establish the connection, and then restarts Corosync and
>> Pacemaker.
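
For reference, the agent's overall shape is roughly the following; the
soapi_reset helper and the node names are placeholders, not the real
code. The important detail (see below) is that the reset action has to
report success within stonith-timeout instead of waiting for the node
to come back:

    #!/bin/sh
    case "$1" in
        hostlist|gethosts)
            echo "Malastare Vindemiatrix"
            ;;
        start|stop|on|off|status)
            exit 0
            ;;
        reset)
            # power-cycle the target node ($2) through the provider's
            # SOAP API; soapi_reset is a hypothetical wrapper around
            # that call
            soapi_reset "$2" || exit 1
            # confirm the node actually went down, then return; waiting
            # for it to come back belongs in a detached child, not here
            while ping -c 1 -W 2 "$2" >/dev/null 2>&1; do
                sleep 5
            done
            exit 0
            ;;
        *)
            exit 1
            ;;
    esac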
>>
>> Aside from the VPN issue, whose performance and stability problems I'm
>> fully aware of, I thought that Pacemaker would start the resources on
>> the remaining node as soon as the STONITH agent returns exit status 0,
>> but it doesn't. Instead, it seems that the STONITH reset action takes
>> too long to report a successful reset; the delay exceeds some internal
>> timeout, which in turn leads Pacemaker to assume that the STONITH agent
>> failed. It then keeps trying to reset the node (which only makes the API
>> return an error, because a reset request less than 5 minutes after the
>> previous one is forbidden) and never restarts the resources on the
>> remaining node. I searched the Internet for this parameter, but the only
>> related thing I found is this page,
>> http://lists.linux-ha.org/pipermail/linux-ha/2010-March/039761.html, a
>> Linux-HA mailing list archive, which mentions a stonith-timeout
>> property; I've gone through the Pacemaker documentation without finding
>> any occurrence of it, and I got an error when I tried to query its
>> value:
> man stonithd
>
>> root@Vindemiatrix:/home/david# crm_attribute --name stonith-timeout --query
>> scope=crm_config  name=stonith-timeout value=(null)
>> Error performing operation: The object/attribute does not exist
> stonith-timeout defaults to 60s ... use "crm configure property
> stonith-timeout=XY" to increase it cluster-wide ... or you can add an
> individual value as a resource attribute to your stonith resources.
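
For anyone finding this in the archive later, the cluster-wide form
looks like the first command below (120s is only an example value; pick
one that covers the provider API round trip). The second command is how
I understand the per-resource variant, setting stonith-timeout as a
meta attribute on the fencing resource; the crmsh syntax is assumed:

    crm configure property stonith-timeout=120s
    crm resource meta soapi-fencing-vindemiatrix set stonith-timeout 120s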
>
> Regards,
> Andreas
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

