[ClusterLabs] stonith disabled, but pacemaker tries to reboot
kgaillot at redhat.com
Thu Jul 20 12:02:34 EDT 2017
On 07/20/2017 03:46 AM, Daniel.L wrote:
> Hi Pacemaker Users,
> We have a 2 node pacemaker cluster (v1.1.14).
> Stonith at this moment is disabled:
> $ pcs property --all | grep stonith
> stonith-action: reboot
> stonith-enabled: false
> stonith-timeout: 60s
> stonith-watchdog-timeout: (null)
> $ pcs property --all | grep fenc
> startup-fencing: true
> But when there is a network outage - it looks like pacemaker tries to
> restart the other node:
> fence_pcmk: Requesting Pacemaker fence *node1* (reset)
> stonith-ng: notice: Client stonith_admin.cman.xxx.xxxxxxxx
> wants to fence (reboot) '*node1*' with device '(any)'
> stonith-ng: notice: Initiating remote operation reboot for
> *node1*: xxxxxxxxxxxxxxxxxxxxxxxx(0)
> stonith-ng: notice: Couldn't find anyone to fence (reboot)
> *node1* with any device
> stonith-ng: error: Operation reboot of *node1* by <no-one> for
> stonith_admin.cman.xxxx at xxxxxxxxxxx: No such device
> crmd: notice: Peer *node1* was not terminated (reboot) by
> <anyone> for *node2*: No such device (ref=xxxxxxxxxxxxxxxxxxxxxxxx0) by
> client stonith_admin.cman.xxxx
stonith-enabled=false stops *Pacemaker* from requesting fencing, but it
doesn't stop external software from requesting fencing.
One hint in the logs is that the client starts with "stonith_admin",
which is the command-line tool that external apps can use to request
fencing.
Another hint is "fence_pcmk", which is not a Pacemaker fence agent, but
software that provides an interface to Pacemaker's fencing that CMAN can
understand. So, something asked CMAN to fence the node, and CMAN asked
Pacemaker to do it.
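To illustrate reading those hints, here is a minimal Python sketch. It is not part of Pacemaker: the function name and the regular expression are my own, modeled only on the stonith-ng line quoted above (with the line wrap and the '*' emphasis markers removed), and the exact log wording may differ between versions. Treating ".cman." in the client name as a sign of the CMAN path is a heuristic, not a documented rule.

```python
import re

# Pattern modeled on the stonith-ng line quoted in this thread; the exact
# wording may differ between Pacemaker versions (assumption).
FENCE_REQUEST = re.compile(
    r"Client (?P<client>\S+) wants to fence \((?P<action>\w+)\) "
    r"'(?P<target>[^']+)' with device '(?P<device>[^']+)'"
)


def describe_fence_request(line):
    """Return a dict describing who asked for fencing, or None if no match."""
    match = FENCE_REQUEST.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # A client name beginning with "stonith_admin" means the request came in
    # through the command-line tool, i.e. from outside Pacemaker itself.
    info["external"] = info["client"].startswith("stonith_admin")
    # ".cman." appears in the client name in the logs above; using it as a
    # marker for the fence_pcmk/CMAN path is a heuristic (assumption).
    info["via_cman"] = ".cman." in info["client"]
    return info


if __name__ == "__main__":
    sample = ("stonith-ng: notice: Client stonith_admin.cman.xxx.xxxxxxxx "
              "wants to fence (reboot) 'node1' with device '(any)'")
    print(describe_fence_request(sample))
```

Running something like this over the stonith-ng lines in your log should show at a glance whether each request was external and whether it likely came through CMAN.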
You'll have to figure out what requested it, and see whether there's a
way to disable fence requests in that app. DLM (used by clvmd and some
cluster filesystems) is a prime suspect, and I believe there's no way to
disable fencing inside it.
Of course, disabling fencing is a bad idea anyway :-)
> I've been looking into it for quite a while already, but to be honest - still
> don't understand this behavior...
> I would expect pacemaker not to try to reboot the other node if stonith is
> disabled.
> Can anyone help me understand this behavior (and hopefully help to
> avoid those reboot attempts)?
> Many thanks in advance!
> best regards