[ClusterLabs] Monitor being called repeatedly for Master/Slave resource despite monitor returning failure

Mon Feb 19 06:18:54 EST 2018

 Hi,

I have configured wildfly resource in master slave mode on a 6 VM cluster
with stonith disabled and and no quorum policy set to ignore.

We are observing that on either of master or slave resource failure,
pacemaker keeps on calling stateful_monitor for wildfly repeatedly, despite
us returning appropriate failure return codes on monitor failure for both
master (failure rc=OCF_MASTER_FAILED) and slave (failure
rc=OCF_NOT_RUNNING).

This continues till failure-timeout is reached after which the resource
gets demoted and stopped in case of master monitor failure, and stopped in
case of slave monitor failure.

Could you please help me understand:
Why don't pacemaker demotes or stops resource immediately after first
failure, and keeps calling monitor ?

# pacemakerd --version
Pacemaker 1.1.16
Written by Andrew Beekhof

# corosync -v
Corosync Cluster Engine, version '2.4.2'
Copyright (c) 2006-2009 Red Hat, Inc.

Below is my configuration:

node 1: VM-0
node 2: VM-1
node 3: VM-2
node 4: VM-3
node 5: VM-4
node 6: VM-5
primitive stateful_wildfly ocf:pacemaker:wildfly \
        op start timeout=200s interval=0 \
        op promote timeout=300s interval=0 \
        op monitor interval=90s role=Master timeout=90s \
        op monitor interval=80s role=Slave timeout=100s \
        meta resource-stickiness=100 migration-threshold=3
failure-timeout=240s
ms wildfly_MS stateful_wildfly \
location stateful_wildfly_rule_2 wildfly_MS \
        rule -inf: #uname eq VM-2
location stateful_wildfly_rule_3 wildfly_MS \
        rule -inf: #uname eq VM-3
location stateful_wildfly_rule_4 wildfly_MS \
        rule -inf: #uname eq VM-4
location stateful_wildfly_rule_5 wildfly_MS \
        rule -inf: #uname eq VM-5
property cib-bootstrap-options: \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        cluster-recheck-interval=30s \
        start-failure-is-fatal=false \
        stop-all-resources=false \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df51a \
        cluster-infrastructure=corosync \
        cluster-name=hacluster-0

Could you please help us in understanding this behavior and how to fix this?

Regards,
Pankaj
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.clusterlabs.org/pipermail/users/attachments/20180219/8c8cafd7/attachment.html>