[ClusterLabs] Pacemaker startup retries

Ken Gaillot kgaillot at redhat.com
Thu Aug 30 11:44:10 EDT 2018


On Thu, 2018-08-30 at 17:24 +0200, Cesar Hernandez wrote:
> Hi
> 
> I have a two-node corosync+pacemaker cluster. When I start only one
> node, it fences the other node. That's fine as the default behaviour,
> since "startup-fencing" defaults to true.
> But the other node is rebooted 3 times, and only then does the
> remaining node start resources and stop fencing the other node.
> 
> How can I change these 3 times to, for example, 1 reboot, or to more,
> such as 5? I use a custom fencing script, so I'm sure these retries are
> done by pacemaker and not by the script, and I also see the reboot
> operations in the logs:
> 
> Aug 30 17:22:08 [12978] xxxx1       crmd:   notice: te_fence_node:	Executing reboot fencing operation (81) on xxxx2 (timeout=180000)
> Aug 30 17:22:31 [12978] xxxx1       crmd:   notice: te_fence_node:	Executing reboot fencing operation (87) on xxxx2 (timeout=180000)
> Aug 30 17:22:48 [12978] xxxx1       crmd:   notice: te_fence_node:	Executing reboot fencing operation (89) on xxxx2 (timeout=180000)

Do you mean you have a custom fencing agent configured? If so, check
the return value of each attempt. Pacemaker should request fencing only
once as long as it succeeds (returns 0), but if the agent fails
(returns nonzero or times out), it will retry, even if the reboot
worked in reality.
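To make the exit-status point concrete, here is a minimal Python sketch of how a custom agent might wrap its reboot step. The function name run_fence_action and the command in the example comment are hypothetical. This is not the fence-agents or cluster-glue plugin interface; it covers only the part that decides the exit status. It returns 0 only when the reboot command itself succeeds. It returns 1 on a nonzero status, a timeout, or a missing command, which is the situation where Pacemaker would try again.

```python
import subprocess


def run_fence_action(cmd, timeout_s):
    """Run a (hypothetical) reboot command and map its outcome to an
    agent exit status: 0 only on confirmed success, 1 otherwise."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; the outcome is
        # unconfirmed, so report failure and let the cluster retry.
        return 1
    except OSError:
        # Command missing or not executable.
        return 1
    # Any nonzero status from the reboot command counts as failure.
    return 0 if result.returncode == 0 else 1


# Example use inside an agent (hypothetical command and target):
#   sys.exit(run_fence_action(["/usr/local/sbin/reboot-peer", "xxxx2"], 120))
```

One reasonable choice is to keep the agent's own timeout below stonith-timeout (180s in the configuration quoted below), so the agent reports the failure itself rather than being cut off. Pacemaker does not require this.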

If instead you mean you have a script that can request fencing (e.g.
via stonith_admin), then check the logs before each attempt to see if
the request was initiated by the cluster (which should show a policy
engine transition for it) or your script.
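For the log check, here is a rough Python sketch that counts the crmd te_fence_node lines (the format quoted above) for each target node and action. Those lines are logged by crmd while it executes a transition. A request made directly with stonith_admin should therefore not be counted here. The regular expression was written against the three lines in this thread. It is an assumption for other Pacemaker versions or log formats.

```python
import re
from collections import Counter

# Fitted to crmd lines such as:
# "... crmd:   notice: te_fence_node:\tExecuting reboot fencing operation (81) on xxxx2 (timeout=180000)"
FENCE_RE = re.compile(
    r"te_fence_node:\s+Executing (?P<action>\w+) fencing operation "
    r"\(\d+\) on (?P<target>\S+)"
)


def count_fence_requests(lines):
    """Count transition-initiated fencing requests per (target, action)."""
    counts = Counter()
    for line in lines:
        m = FENCE_RE.search(line)
        if m:
            counts[(m.group("target"), m.group("action"))] += 1
    return counts
```

To use it, pass an open log file to count_fence_requests. The log path depends on how corosync and pacemaker logging are configured on your nodes. Three reboot entries for xxxx2 would match the three attempts above.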

FYI, corosync 2 has a "two_node" setting that also enables "wait_for_all"
by default. With that, you don't need to ignore quorum in pacemaker, and
the cluster won't start until both nodes have seen each other at least
once.
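Here is a toy Python model of that quorum behaviour, assuming two_node: 1 with wait_for_all left at its default. It is not corosync code. It ignores votes, timers and the other votequorum options, and the node names are placeholders. It shows two things that matter here. A node that starts alone is not quorate. Once both nodes have been seen together, either node alone stays quorate.

```python
from dataclasses import dataclass, field


@dataclass
class TwoNodeQuorumModel:
    """Simplified model (not corosync code) of votequorum with
    two_node: 1, which turns on wait_for_all by default."""
    all_nodes: frozenset = frozenset({"node1", "node2"})
    members: set = field(default_factory=set)
    # Starts False each time corosync starts on this node.
    seen_all: bool = False

    def update_membership(self, members):
        self.members = set(members)
        if self.members >= self.all_nodes:
            # Both nodes have been seen together at least once.
            self.seen_all = True

    @property
    def quorate(self):
        # two_node lets a single node hold quorum, but only after
        # the wait_for_all condition has been met.
        return self.seen_all and len(self.members) >= 1
```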

> Software versions:
> 
> corosync-1.4.8
> crmsh-2.1.5
> libqb-0.17.2
> Pacemaker-1.1.14
> resource-agents-3.9.6
> Reusable-Cluster-Components-glue--glue-1.0.12
> 
> Some parameters:
> 
> property cib-bootstrap-options: \
> 	have-watchdog=false \
> 	dc-version=1.1.14-70404b0e5e \
> 	cluster-infrastructure="classic openais (with plugin)" \
> 	expected-quorum-votes=2 \
> 	stonith-enabled=true \
> 	no-quorum-policy=ignore \
> 	default-resource-stickiness=200 \
> 	stonith-timeout=180s \
> 	last-lrm-refresh=1534489943
> 
> 
> Thanks
> 
> César Hernández Bañó
-- 
Ken Gaillot <kgaillot at redhat.com>