[ClusterLabs] Resource not starting correctly

JCA 1.41421 at gmail.com
Mon Apr 15 16:15:12 EDT 2019


I have a simple two-node cluster, node one and node two, with a single
resource, ClusterMyApp. The nodes are CentOS 7 VMs. The resource is created
by executing the following line on node one:

   # pcs resource create ClusterMyApp ocf:myapp:myapp-script op monitor interval=30s

This invokes myapp-script, which I installed under
/usr/lib/ocf/resource.d/myapp/myapp-script on both one and two - i.e. it
is exactly the same script on both nodes.
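
For what it's worth, the agent can also be exercised on its own with
ocf-tester from the resource-agents package (the invocation below is only
an illustration):

   # ocf-tester -n ClusterMyApp /usr/lib/ocf/resource.d/myapp/myapp-script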

On executing the pcs resource create command above on node one, I get the
following log entries on node one itself:

Apr 15 13:40:12 one crmd[13670]:  notice: Result of probe operation for
ClusterMyApp on one: 7 (not running)
Apr 15 13:40:12 one crmd[13670]:  notice: Result of start operation for
ClusterMyApp on one: 0 (ok)

This is in line with what I expect from myapp-script when invoked with the
'start' action (which is what the command above triggers). myapp-script
first checks whether my app is running and, if it is not, launches it. The
rest of the log entries relate to my app and indicate that it started
without any problems.
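
The 'is it running' check is done by myapp_monitor (called from the start
function shown further down). A simplified sketch of that pattern - not the
actual function, and with a placeholder pidfile path - would be:

myapp_monitor() {
  # Simplified sketch only: report OCF_SUCCESS if the app's process is
  # alive, OCF_NOT_RUNNING otherwise. The pidfile path is a placeholder.
  local pidfile=/var/run/myapp.pid

  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    return $OCF_SUCCESS
  fi

  return $OCF_NOT_RUNNING
}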

On node two, when the pcs resource create command is executed on one, the
following log entries are generated:

Apr 15 13:40:12 two crmd[4293]:  notice: State transition S_IDLE ->
S_POLICY_ENGINE
Apr 15 13:40:12 two pengine[4292]:  notice:  * Start      ClusterMyApp (one )
Apr 15 13:40:12 two pengine[4292]:  notice: Calculated transition 16,
saving inputs in /var/lib/pacemaker/pengine/pe-input-66.bz2
Apr 15 13:40:12 two crmd[4293]:  notice: Initiating monitor operation
ClusterMyApp_monitor_0 locally on two
Apr 15 13:40:12 two crmd[4293]:  notice: Initiating monitor operation
ClusterMyApp_monitor_0 on one
Apr 15 13:40:12 two crmd[4293]:  notice: Result of probe operation for
ClusterMyApp on two: 7 (not running)
Apr 15 13:40:12 two crmd[4293]:  notice: Initiating start operation
ClusterMyApp_start_0 on one
Apr 15 13:40:12 two crmd[4293]:  notice: Initiating monitor operation
ClusterMyApp_monitor_30000 on one
Apr 15 13:40:12 two crmd[4293]: warning: Action 4
(ClusterMyApp_monitor_30000) on one failed (target: 0 vs. rc: 7): Error
Apr 15 13:40:12 two crmd[4293]:  notice: Transition aborted by operation
ClusterMyApp_monitor_30000 'create' on one: Event failed
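
As far as I can tell, the transition saved above can be replayed for
inspection with crm_simulate, e.g.:

   # crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-66.bz2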

After all of the above, pcs status returns the following when invoked on
either node:

Cluster name: MyCluster
Stack: corosync
Current DC: two (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with
quorum
Last updated: Mon Apr 15 13:45:14 2019
Last change: Mon Apr 15 13:40:11 2019 by root via cibadmin on one

2 nodes configured
1 resource configured

Online: [ one two ]

Full list of resources:

 ClusterMyApp (ocf::myapp:myapp-script): Started one

Failed Actions:
* ClusterMyApp_monitor_30000 on one 'not running' (7): call=37,
status=complete, exitreason='',
    last-rc-change='Mon Apr 15 13:40:12 2019', queued=0ms, exec=105ms


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
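
Presumably, once the underlying problem is fixed, the failed action can be
cleared with something like:

   # pcs resource cleanup ClusterMyApp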

The start function in myapp-script is as follows:

myapp_start() {
  myapp_conf_check
  local diagnostic=$?

  if [ $diagnostic -ne $OCF_SUCCESS ]; then
    return $diagnostic
  fi

  myapp_monitor

  local state=$?

  case $state in
    $OCF_SUCCESS)
      return $OCF_SUCCESS
      ;;

    $OCF_NOT_RUNNING)
      myapp_launch > /dev/null 2>&1
      if [ $? -eq 0 ]; then
        return $OCF_SUCCESS
      fi

      return $OCF_ERR_GENERIC
      ;;

    *)
      return $state
      ;;
  esac
}
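
To take Pacemaker out of the picture, I believe the agent can also be run
by hand and its exit codes checked directly, along these lines (setting
OCF_ROOT so the script can find the OCF shell functions, assuming it
sources them in the usual way):

   # export OCF_ROOT=/usr/lib/ocf
   # /usr/lib/ocf/resource.d/myapp/myapp-script start; echo $?
   # /usr/lib/ocf/resource.d/myapp/myapp-script monitor; echo $?

My understanding is that 'pcs resource debug-start ClusterMyApp --full'
does much the same thing from the pcs side.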

I know for a fact that, on one, myapp_launch gets invoked and that its exit
value is 0. The function therefore returns OCF_SUCCESS, as it should.
However, if I understand things correctly, the log entries on two seem to
say that the recurring monitor operation (ClusterMyApp_monitor_30000) on
one returned OCF_NOT_RUNNING (rc 7), even though the start operation on one
reported 0 (ok).

What's going on here? It's obviously something to do with myapp-script -
but what?