[ClusterLabs] Master/slave failover does not work as expected

Michael Powell Michael.Powell at harmonicinc.com
Mon Aug 12 09:12:38 EDT 2019

I've been working on porting a product originally based upon Pacemaker 1.0/Heartbeat to CentOS 7.6, using Pacemaker 2.0 and Corosync.  I've encountered an issue with "failing over" a master resource from one node to another.

Our cluster consists of two nodes, each of which is a host system running CentOS 7.6 and supporting a single resource, a proprietary application.  That application runs on both nodes concurrently, one as the master, the other as slave.  They communicate state information via an interprocessor link so that if the master instance fails, the slave instance will become master within seconds and will resume operations from the point of failure of the other application/node.  To test this, we run a couple of different tests, one of which is simply to issue a kill -9 <pid> to the master instance of the application.  In my testing, I find that rather than the app on the other node becoming master, the app is restarted on the first node, and it becomes master again.

Here's some information for reference -
[root at mgraid-16201289RN00023-0 admin]# crm_mon -1
Stack: corosync
Current DC: mgraid-16201289RN00023-0 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Mon Aug 12 05:11:56 2019
Last change: Fri Aug  9 07:45:22 2019 by root via crm_attribute on mgraid-16201289RN00023-0

2 nodes configured
4 resources configured

Online: [ mgraid-16201289RN00023-0 mgraid-16201289RN00023-1 ]

Active resources:

Clone Set: mgraid-stonith-clone [mgraid-stonith]
     Started: [ mgraid-16201289RN00023-0 mgraid-16201289RN00023-1 ]
Master/Slave Set: ms-SS16201289RN00023 [SS16201289RN00023]
     Masters: [ mgraid-16201289RN00023-0 ]
     Slaves: [ mgraid-16201289RN00023-1 ]

I've attached a corosync log file which shows the behavior I described.  Our code uses an OCF compliant resource agent named ss, which invokes a utility named ssadm to communicate with the application as needed.  The ss implementation executes functions named after the $__OCF_ACTION variable when it is invoked, e.g. ss_monitor, ss_demote, etc.
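For context, the dispatch pattern in ss looks roughly like this (a simplified sketch, with stub functions standing in for the real ssadm calls -- the actual agent is considerably more involved):

```shell
#!/bin/sh
# Simplified sketch of the ss agent's dispatch pattern: the function
# named after $__OCF_ACTION is invoked, and its return code becomes
# the agent's exit status.  The ss_* bodies here are stubs; the real
# agent invokes the ssadm utility to talk to the application.

OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

ss_monitor() { echo "ss_monitor"; return "$OCF_SUCCESS"; }
ss_promote() { echo "ss_promote"; return "$OCF_SUCCESS"; }
# After the kill -9, the application is already gone, so demote
# reports "not running" -- exactly the case described above.
ss_demote()  { echo "ss_demote";  return "$OCF_NOT_RUNNING"; }

dispatch() {
    action=$1
    case "$action" in
        monitor|promote|demote) "ss_$action" ;;
        *) return "$OCF_ERR_UNIMPLEMENTED" ;;
    esac
}

dispatch "${__OCF_ACTION:-monitor}"
```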

At 07:44:49, the ss agent discovers that the master instance has failed on node mgraid...-0, as a result of a failed ssadm request in response to an ss_monitor() operation.  It issues a crm_master -Q -D command with the intent of demoting the master and promoting the slave, on the other node, to master.  The ss_demote() function finds that the application is no longer running and returns OCF_NOT_RUNNING (7).  In the older product, this was sufficient to promote the other instance to master, but in the current product, that does not happen.  Instead, the failed application is restarted, as expected, and is promoted to master, but this takes tens of seconds.

As far as I understand it, the intent of the crm_master command in our script is to lower the score of the failed resource so that the instance on the other node will become master.  I've also tried modifying our script to use crm_master -Q -v -INFINITY instead, to no avail (which is what the attached log shows).
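For clarity, these are the two variants I've tried in the agent's failure path (illustrative only -- they need a live cluster, and crm_master picks up the resource instance from the agent's environment):

```shell
# Variant 1: delete this node's master-score attribute entirely
crm_master -Q -D

# Variant 2: what the attached log shows -- pin the score to -INFINITY
crm_master -Q -v -INFINITY
```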

In summary, then, I need to find out how to accomplish the goal of failing over mastership from one node to the other as quickly as possible.  My reading of the Pacemaker documentation turned up no mention of crm_master, but the Pacemaker 2.0 Configuration Explained document does seem to imply that this technique of lowering the score should work.  Can anyone offer any guidance?
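(In case it helps anyone looking into this: the allocation and promotion scores the policy engine is actually using can be dumped from the live CIB on a cluster node, e.g.:

```shell
# Show allocation and promotion scores from the live cluster
crm_simulate -sL
```

I can post that output as well if it would be useful.)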

  Michael Powell


    Michael Powell
    Sr. Staff Engineer

    15220 NW Greenbrier Pkwy
        Suite 290
    Beaverton, OR   97006
    T 503-372-7327    M 503-789-3019   H 503-625-5332

-------------- next part --------------
A non-text attachment was scrubbed...
Name: corosync-0.log
Type: application/octet-stream
Size: 378482 bytes
Desc: corosync-0.log
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20190812/2aceefe0/attachment-0001.obj>
