[ClusterLabs] Monitor being called repeatedly for Master/Slave resource despite monitor failure

Samarth Jain samarthj2992 at gmail.com
Fri Feb 16 09:33:49 EST 2018


I have configured a wildfly resource in master/slave mode on a 6-VM cluster
with stonith disabled and no-quorum-policy set to ignore.

We are observing that when either the master or a slave instance fails,
pacemaker keeps calling stateful_monitor for wildfly repeatedly, even though
we return the appropriate failure codes from monitor for both the
master (rc=OCF_FAILED_MASTER) and the slave (rc=OCF_NOT_RUNNING).
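
For readers following along, here is a minimal sketch of the kind of monitor
logic described above. It is not the actual agent from this cluster. The
numeric values are the standard OCF return codes, where the failed-master code
is named OCF_FAILED_MASTER (9). The helpers wildfly_is_master and
wildfly_is_healthy, and the WILDFLY_ROLE and WILDFLY_HEALTHY variables they
read, are hypothetical stand-ins for whatever role and health checks the real
agent performs.

```shell
#!/bin/sh
# Standard OCF return codes used by a master/slave monitor action.
OCF_SUCCESS=0          # running as slave, healthy
OCF_NOT_RUNNING=7      # cleanly stopped / not running
OCF_RUNNING_MASTER=8   # running as master, healthy
OCF_FAILED_MASTER=9    # running as master, but failed

# Hypothetical helpers (assumptions for illustration only): a real agent
# would query WildFly here instead of reading environment variables.
wildfly_is_master()  { [ "${WILDFLY_ROLE:-slave}" = "master" ]; }
wildfly_is_healthy() { [ "${WILDFLY_HEALTHY:-1}" = "1" ]; }

stateful_monitor() {
    if wildfly_is_master; then
        # Master instance: report healthy-master or failed-master.
        if wildfly_is_healthy; then
            return $OCF_RUNNING_MASTER
        fi
        return $OCF_FAILED_MASTER
    fi
    # Slave instance: report success or not-running.
    if wildfly_is_healthy; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

With this sketch, a failed master makes stateful_monitor return 9 and a failed
slave makes it return 7, which matches the return codes described in this post.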

This continues until failure-timeout is reached, after which the resource
is demoted and stopped in the case of a master monitor failure, or simply
stopped in the case of a slave monitor failure.

# pacemakerd --version
Pacemaker 1.1.16
Written by Andrew Beekhof

# corosync -v
Corosync Cluster Engine, version '2.4.2'
Copyright (c) 2006-2009 Red Hat, Inc.

Below is my configuration:

node 1: VM-0
node 2: VM-1
node 3: VM-2
node 4: VM-3
node 5: VM-4
node 6: VM-5
primitive stateful_wildfly ocf:pacemaker:wildfly \
        op start timeout=200s interval=0 \
        op promote timeout=300s interval=0 \
        op monitor interval=90s role=Master timeout=90s \
        op monitor interval=80s role=Slave timeout=100s \
        meta resource-stickiness=100 migration-threshold=3
ms wildfly_MS stateful_wildfly \
location stateful_wildfly_rule_2 wildfly_MS \
        rule -inf: #uname eq VM-2
location stateful_wildfly_rule_3 wildfly_MS \
        rule -inf: #uname eq VM-3
location stateful_wildfly_rule_4 wildfly_MS \
        rule -inf: #uname eq VM-4
location stateful_wildfly_rule_5 wildfly_MS \
        rule -inf: #uname eq VM-5
property cib-bootstrap-options: \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        cluster-recheck-interval=30s \
        start-failure-is-fatal=false \
        stop-all-resources=false \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df51a \
        cluster-infrastructure=corosync \

Could you please help us understand this behavior and how to fix it?

Samarth J
