[ClusterLabs] Re: Monitor being called repeatedly for Master/Slave resource despite monitor returning failure
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Feb 19 06:38:17 EST 2018
>>> Pankaj <pankaj386 at gmail.com> wrote on 19.02.2018 at 12:18 in message
<CAO5O8CyjSjth+PW1wBqiMAcH5Sgn=77x3gmj=01WCH4ebHYHJQ at mail.gmail.com>:
> Hi,
>
>
> I have configured a wildfly resource in master/slave mode on a 6-VM cluster
> with stonith disabled and no-quorum-policy set to ignore.
>
> We are observing that when either the master or the slave fails, pacemaker
> keeps calling the wildfly monitor (stateful_monitor) repeatedly, despite the
> agent returning the appropriate failure codes from monitor: OCF_MASTER_FAILED
> for a failed master and OCF_NOT_RUNNING for a failed slave.
>
> This continues until failure-timeout is reached, after which the resource
> gets demoted and stopped in the case of a master monitor failure, or just
> stopped in the case of a slave monitor failure.
>
> Could you please help me understand:
> Why doesn't pacemaker demote or stop the resource immediately after the
> first failure, instead of repeatedly calling monitor?
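[For reference, a minimal sketch of how a stateful OCF monitor maps observed
state to the return codes pacemaker acts on. The state probe is faked via a
function argument here; a real agent such as ocf:pacemaker:wildfly would
inspect the process itself.]

```shell
#!/bin/sh
# Minimal sketch of a stateful OCF monitor's return-code mapping.
# The state is passed in as $1 for illustration only; a real agent
# would probe the wildfly process instead.
OCF_SUCCESS=0        # running cleanly as slave
OCF_NOT_RUNNING=7    # not running at all
OCF_RUNNING_MASTER=8 # running cleanly as master
OCF_FAILED_MASTER=9  # master is broken (a.k.a. OCF_MASTER_FAILED)

monitor() {
    case "$1" in
        slave-ok)      return $OCF_SUCCESS ;;
        master-ok)     return $OCF_RUNNING_MASTER ;;
        master-failed) return $OCF_FAILED_MASTER ;;
        *)             return $OCF_NOT_RUNNING ;;
    esac
}

monitor master-failed; echo "master-failed -> rc=$?"
monitor stopped;       echo "stopped       -> rc=$?"
```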
>
> # pacemakerd --version
> Pacemaker 1.1.16
> Written by Andrew Beekhof
>
> # corosync -v
> Corosync Cluster Engine, version '2.4.2'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> Below is my configuration:
>
> node 1: VM-0
> node 2: VM-1
> node 3: VM-2
> node 4: VM-3
> node 5: VM-4
> node 6: VM-5
> primitive stateful_wildfly ocf:pacemaker:wildfly \
> op start timeout=200s interval=0 \
> op promote timeout=300s interval=0 \
> op monitor interval=90s role=Master timeout=90s \
> op monitor interval=80s role=Slave timeout=100s \
> meta resource-stickiness=100 migration-threshold=3 failure-timeout=240s
> ms wildfly_MS stateful_wildfly
> location stateful_wildfly_rule_2 wildfly_MS \
> rule -inf: #uname eq VM-2
> location stateful_wildfly_rule_3 wildfly_MS \
> rule -inf: #uname eq VM-3
> location stateful_wildfly_rule_4 wildfly_MS \
> rule -inf: #uname eq VM-4
> location stateful_wildfly_rule_5 wildfly_MS \
> rule -inf: #uname eq VM-5
> property cib-bootstrap-options: \
> stonith-enabled=false \
> no-quorum-policy=ignore \
> cluster-recheck-interval=30s \
> start-failure-is-fatal=false \
> stop-all-resources=false \
> have-watchdog=false \
> dc-version=1.1.16-94ff4df51a \
> cluster-infrastructure=corosync \
> cluster-name=hacluster-0
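[A note on the meta attributes above, as I understand the policy engine's
bookkeeping: each failed monitor increments a per-node failcount; once the
count reaches migration-threshold the resource is banned from that node, and
failure-timeout lets old failures expire (re-evaluated at most every
cluster-recheck-interval). A toy sketch of that counting, not actual
pacemaker code:]

```shell
#!/bin/sh
# Toy model of pacemaker's per-node failcount bookkeeping (illustrative
# only): each failed monitor increments the count; reaching
# migration-threshold bans the resource from the node; failure-timeout
# would later reset the count.
MIGRATION_THRESHOLD=3   # matches migration-threshold in the config above
failcount=0
for failure in 1 2 3; do
    failcount=$((failcount + 1))
    if [ "$failcount" -ge "$MIGRATION_THRESHOLD" ]; then
        echo "failcount=$failcount: resource banned from this node"
    else
        echo "failcount=$failcount: recover in place (stop/start or demote)"
    fi
done
```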
>
> Could you please help us in understanding this behavior and how to fix this?
What does the cluster log say?
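To get at that, the failcounts and operation history are usually the relevant
bits. Something like the following should show them (flags as I remember them
for the 1.1 series; please verify against your man pages):

```shell
# Illustrative pacemaker 1.1-era commands; check flags against your man pages.
crm_mon -1 -f -o                               # one-shot status with failcounts and operation history
crm_failcount -G -r stateful_wildfly -N VM-0   # query the failcount for the resource on node VM-0
crm_resource -C -r stateful_wildfly            # clean up recorded failures once the cause is fixed
```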
>
> Regards,
> Pankaj