[ClusterLabs] Continuous master monitor failure of a resource in case some other resource is being promoted

Andrei Borzenkov arvidjaar at gmail.com
Mon Feb 25 14:36:38 EST 2019


On 25.02.2019 11:50, Samarth Jain wrote:
> Hi,
> 
> 
> We have a bunch of resources running in a master/slave configuration, with
> one master and one slave instance running at any given time.
> 
> What we observe is that if resource Stateful_Test_1 is in the middle of a
> promote which takes a significant amount of time to complete (close to 150
> seconds in our scenario, e.g. starting a web server), and during this time
> resource Stateful_Test_2's master instance fails, then the failure of the
> Stateful_Test_2 master is never acted upon by pengine, and the recurring
> monitor keeps failing without any action being taken by the DC.
> 
> We see the following logs for the failure of Stateful_Test_2 on the DC,
> which was VM-3 at that time:
> 
> Feb 25 11:28:13 [6013] VM-3       crmd:   notice: abort_transition_graph:
>     Transition aborted by operation Stateful_Test_2_monitor_17000 'create'
> on VM-1: Old event | magic=0:9;329:8:8:4a2b407e-ad15-43d0-8248-e70f9f22436b
> cib=0.191.5 source=process_graph_event:498 complete=false
> 
> As per our current testing, the Stateful_Test_2 resource has failed 590
> times and still continues to fail, without the failure being processed by
> Pacemaker. We have to intervene manually and restart the resource to
> recover it.
> 

I can reproduce it with pacemaker 2.0.0 + git (openSUSE Tumbleweed) as well.

> Could you please help me understand:
> 1. Why doesn't pacemaker process the failure of the Stateful_Test_2
> resource immediately after the first failure?

I vaguely remember sequential execution being mentioned in this context
before, but I cannot find the details.

> 2. Why does the monitor failure of Stateful_Test_2 continue even after the
> promote of Stateful_Test_1 has completed? Shouldn't it handle
> Stateful_Test_2's failure and take the necessary action? It feels as if
> that particular failure 'event' has been 'dropped' and pengine is not even
> aware of Stateful_Test_2's failure.
> 

Yes. Although crm_mon shows the resource as master on this node, in
reality the resource is left in the failed state forever and the monitor
result is simply ignored.
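
To see what the cluster has recorded, you can check the fail counts, e.g.
(standard tools, shown only as an illustration):

# crm_mon -1 --failcounts
# crm resource failcount Stateful_Test_2 show VM-1

The first prints a one-shot cluster status including per-resource fail
counts; the second queries the fail count recorded for Stateful_Test_2 on
VM-1 (crmsh syntax).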

> It's pretty straightforward to reproduce this issue.
> I have attached the two dummy resource agents which we used to simulate our
> scenario, along with the commands used to configure the resources and ban
> them on other VMs in the cluster.
> 
> Note: We have intentionally kept the monitor intervals low, contrary to
> the usual suggestions, since we want faster failure detection; these
> resources are critical to our component.
> 
> Once the resources are configured, you need to perform the following two
> steps to reproduce the problem:
> 1. crm resource restart Stateful_Test_1
> 2. In another session, on whichever VM Stateful_Test_2 is running as
> master, delete the marker file which is checked by Stateful_Test_2's
> master monitor.
> In our case it was VM-1, so I deleted the marker from there:
> [root@VM-1 ~]
> # rm -f /root/stateful_Test_2_Marker
> 
> Now if you check the logs in /root, you will see that Stateful_Test_2
> prints failure logs back to back. Here is a sample from our current
> Stateful_Test_2.log file:
> # cat /root/Stateful_Test_2.log
> Mon Feb 25 11:00:29 IST 2019 Inside promote for Stateful_Test_2!
> Mon Feb 25 11:00:34 IST 2019 Promote for Stateful_Test_2 completed!
> Mon Feb 25 11:28:13 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 11:28:30 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 11:28:47 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 11:29:04 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 11:29:21 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 11:29:38 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 11:29:55 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 11:30:12 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 11:30:29 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 11:30:46 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> .
> .
> .
> Mon Feb 25 14:15:08 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 14:15:25 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 14:15:42 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> Mon Feb 25 14:15:59 IST 2019 Master monitor failed for Stateful_Test_2.
> Returning 9
> 
> # pacemakerd --version
> Pacemaker 1.1.18
> Written by Andrew Beekhof
> 
> # corosync -v
> Corosync Cluster Engine, version '2.4.2'
> Copyright (c) 2006-2009 Red Hat, Inc.
> 
> Below is my cluster configuration:
> 
> node 1: VM-0
> node 2: VM-1
> node 3: VM-2
> node 4: VM-3
> node 5: VM-4
> node 6: VM-5
> primitive Stateful_Test_1 ocf:pacemaker:Stateful_Test_1 \
>         op start timeout=200s interval=0 \
>         op promote timeout=300s interval=0 \
>         op monitor interval=15s role=Master timeout=30s \
>         op monitor interval=20s role=Slave timeout=30s \
>         op stop on-fail=restart interval=0 \
>         meta resource-stickiness=100 migration-threshold=1 failure-timeout=15s
> primitive Stateful_Test_2 ocf:pacemaker:Stateful_Test_2 \
>         op start timeout=200s interval=0 \
>         op promote timeout=300s interval=0 \
>         op monitor interval=17s role=Master timeout=30s \
>         op monitor interval=25s role=Slave timeout=30s \
>         op stop on-fail=restart interval=0 \
>         meta resource-stickiness=100 migration-threshold=1 failure-timeout=15s
> ms StatefulTest1_MS Stateful_Test_1 \
>         meta resource-stickiness=100 notify=true master-max=1 interleave=true target-role=Started
> ms StatefulTest2_MS Stateful_Test_2 \
>         meta resource-stickiness=100 notify=true master-max=1 interleave=true target-role=Started
> location Stateful_Test_1_rule_2 StatefulTest1_MS \
>         rule -inf: #uname eq VM-2
> location Stateful_Test_1_rule_3 StatefulTest1_MS \
>         rule -inf: #uname eq VM-3
> location Stateful_Test_1_rule_4 StatefulTest1_MS \
>         rule -inf: #uname eq VM-4
> location Stateful_Test_1_rule_5 StatefulTest1_MS \
>         rule -inf: #uname eq VM-5
> location Stateful_Test_2_rule_2 StatefulTest2_MS \
>         rule -inf: #uname eq VM-2
> location Stateful_Test_2_rule_3 StatefulTest2_MS \
>         rule -inf: #uname eq VM-3
> location Stateful_Test_2_rule_4 StatefulTest2_MS \
>         rule -inf: #uname eq VM-4
> location Stateful_Test_2_rule_5 StatefulTest2_MS \
>         rule -inf: #uname eq VM-5
> property cib-bootstrap-options: \
>         stonith-enabled=false \
>         no-quorum-policy=ignore \
>         cluster-recheck-interval=30s \
>         start-failure-is-fatal=false \
>         stop-all-resources=false \
>         have-watchdog=false \
>         dc-version=1.1.16-94ff4df51a \
>         cluster-infrastructure=corosync \
>         cluster-name=hacluster-0
> 
> Since the resource failure is never processed, this is a serious problem
> for us, as it requires manual intervention to restart that resource.
> 
> Could you please help us understand this behavior and how to fix it?
> 

Your problem is triggered by a failure-timeout that is too low. The failure
of the master is cleared before pacemaker gets around to processing it (or
so I interpret it). You should set failure-timeout to be longer than your
actions may take. That will at least give you a workaround.
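
As a sketch only (the 400s value is arbitrary, just comfortably above the
~150 second promote plus a monitor interval), the meta attributes would
become something like:

        meta resource-stickiness=100 migration-threshold=1 failure-timeout=400s

or, if your crmsh supports it, something like:

# crm resource meta Stateful_Test_2 set failure-timeout 400s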

Note that in your configuration the resource cannot be recovered anyway:
migration-threshold is 1, so pacemaker cannot (try to) restart the master on
the same node, while your location constraints prohibit running it anywhere
else.
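
If you want pacemaker to be able to retry the master on the same node,
raising migration-threshold would be the direction, e.g. (values purely
illustrative):

        meta resource-stickiness=100 migration-threshold=3 failure-timeout=400s

Alternatively, relax some of the -inf location constraints so that another
node is allowed to take over.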


