[ClusterLabs] Antw: Monitor being called repeatedly for Master/Slave resource despite monitor returning failure

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Feb 19 06:38:17 EST 2018


 

>>> Pankaj <pankaj386 at gmail.com> wrote on 19.02.2018 at 12:18 in message
<CAO5O8CyjSjth+PW1wBqiMAcH5Sgn=77x3gmj=01WCH4ebHYHJQ at mail.gmail.com>:
> Hi,
> 
> 
> I have configured a wildfly resource in master/slave mode on a 6-VM cluster
> with stonith disabled and no-quorum-policy set to ignore.
> 
> We are observing that on a failure of either the master or the slave instance,
> Pacemaker keeps calling stateful_monitor for wildfly repeatedly, despite us
> returning the appropriate failure return codes from the monitor for both the
> master (failure rc=OCF_MASTER_FAILED) and the slave (failure
> rc=OCF_NOT_RUNNING).
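
For reference, a minimal sketch of the return codes a stateful agent's monitor
is usually expected to produce (using the standard constants sourced from
ocf-shellfuncs; note the stock constant is named OCF_FAILED_MASTER, value 9.
The wildfly_is_running/wildfly_is_master helpers below are hypothetical
stand-ins for whatever checks your agent really does):

    # source the standard OCF return-code definitions
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

    wildfly_monitor() {
        if ! wildfly_is_running; then       # hypothetical service/JVM check
            if wildfly_is_master; then      # hypothetical promoted-state check
                return $OCF_FAILED_MASTER   # 9: promoted instance has failed
            fi
            return $OCF_NOT_RUNNING         # 7: cleanly not running
        fi
        if wildfly_is_master; then
            return $OCF_RUNNING_MASTER      # 8: healthy master
        fi
        return $OCF_SUCCESS                 # 0: healthy slave
    }

The default on-fail for a monitor is "restart", so a single failure result
reaching the cluster should normally be enough to trigger recovery; the logs
will show which rc the cluster actually received.
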
> 
> This continues until the failure-timeout is reached, after which the resource
> gets demoted and stopped in the case of a master monitor failure, and stopped
> in the case of a slave monitor failure.
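
What the cluster has actually recorded for those failures (fail count, last
failed operation and its rc) can be checked with the usual tools, e.g.:

    # one-shot status including inactive resources, fail counts and failed ops
    crm_mon -1 -r -f

    # clear the recorded failures again once you are done analysing
    crm_resource --cleanup --resource stateful_wildfly

Also note that with failure-timeout=240s the recorded failures expire on their
own, and that expiry is typically only acted on at the next recheck
(cluster-recheck-interval=30s here).
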
> 
> Could you please help me understand:
> Why doesn't Pacemaker demote or stop the resource immediately after the first
> failure, instead of repeatedly calling the monitor?
> 
> # pacemakerd --version
> Pacemaker 1.1.16
> Written by Andrew Beekhof
> 
> # corosync -v
> Corosync Cluster Engine, version '2.4.2'
> Copyright (c) 2006-2009 Red Hat, Inc.
> 
> Below is my configuration:
> 
> node 1: VM-0
> node 2: VM-1
> node 3: VM-2
> node 4: VM-3
> node 5: VM-4
> node 6: VM-5
> primitive stateful_wildfly ocf:pacemaker:wildfly \
>         op start timeout=200s interval=0 \
>         op promote timeout=300s interval=0 \
>         op monitor interval=90s role=Master timeout=90s \
>         op monitor interval=80s role=Slave timeout=100s \
>         meta resource-stickiness=100 migration-threshold=3 failure-timeout=240s
> ms wildfly_MS stateful_wildfly \
> location stateful_wildfly_rule_2 wildfly_MS \
>         rule -inf: #uname eq VM-2
> location stateful_wildfly_rule_3 wildfly_MS \
>         rule -inf: #uname eq VM-3
> location stateful_wildfly_rule_4 wildfly_MS \
>         rule -inf: #uname eq VM-4
> location stateful_wildfly_rule_5 wildfly_MS \
>         rule -inf: #uname eq VM-5
> property cib-bootstrap-options: \
>         stonith-enabled=false \
>         no-quorum-policy=ignore \
>         cluster-recheck-interval=30s \
>         start-failure-is-fatal=false \
>         stop-all-resources=false \
>         have-watchdog=false \
>         dc-version=1.1.16-94ff4df51a \
>         cluster-infrastructure=corosync \
>         cluster-name=hacluster-0
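
(Side note: you can verify which meta attribute values the cluster actually
uses, since failure-timeout and migration-threshold determine when failures
are expired or the resource is banned from a node. A sketch with crm_resource:)

    crm_resource --resource stateful_wildfly --meta --get-parameter failure-timeout
    crm_resource --resource stateful_wildfly --meta --get-parameter migration-threshold
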
> 
> Could you please help us understand this behavior and how to fix it?

What does the cluster log say?
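
In particular, the pengine/crmd decisions around the failed monitor would be
interesting. Something along these lines should pull them out (the unit name
and log path vary by distribution, adjust as needed):

    # on systemd-based setups
    journalctl -u pacemaker | grep -Ei 'wildfly|pengine|crmd'

    # or, if a log file is configured
    grep -Ei 'wildfly|pengine|crmd' /var/log/cluster/corosync.log
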

> 
> Regards,
> Pankaj



