[ClusterLabs] How to cancel a fencing request?

Mon Apr 2 04:54:55 EDT 2018

On Sun, 1 Apr 2018 09:01:15 +0300
Andrei Borzenkov <arvidjaar at gmail.com> wrote:

> 31.03.2018 23:29, Jehan-Guillaume de Rorthais wrote:
> > Hi all,
> > 
> > I experienced a problem in a two-node cluster. It has one fencing agent (FA)
> > per node, with location constraints so that each FA avoids the node it is
> > supposed to fence.
> 
> If you mean stonith resource - as far as I know, location does not
> affect stonith operations and only changes where the monitoring action is
> performed.

Sure.

> You can create two stonith resources and declare that each
> can fence only single node, but that is not location constraint, it is
> resource configuration. Showing your configuration would be helpful to
> avoid guessing.

True, I should have done that. A conf is worth a thousand words :)

  crm conf<<EOC

  primitive fence_vm_srv1 stonith:fence_virsh                   \
    params pcmk_host_check="static-list" pcmk_host_list="srv1"  \
           ipaddr="192.168.2.1" login="<user>"                  \
           identity_file="/root/.ssh/id_rsa"                    \
           port="srv1-d8" action="off"                          \
    op monitor interval=10s

  location fence_vm_srv1-avoids-srv1 fence_vm_srv1 -inf: srv1

  primitive fence_vm_srv2 stonith:fence_virsh                   \
    params pcmk_host_check="static-list" pcmk_host_list="srv2"  \
           ipaddr="192.168.2.1" login="<user>"                  \
           identity_file="/root/.ssh/id_rsa"                    \
           port="srv2-d8" action="off"                          \
    op monitor interval=10s

  location fence_vm_srv2-avoids-srv2 fence_vm_srv2 -inf: srv2

  EOC

> > During some tests, a ms resource raised an error during the stop action on
> > both nodes. So both nodes were supposed to be fenced.
> 
> In two-node cluster you can set pcmk_delay_max so that both nodes do not
> attempt fencing simultaneously.

I'm not sure I understand the doc correctly with regard to this property. Does
pcmk_delay_max delay the request itself or the execution of the request?

In other words, is it:

  delay -> fence query -> fencing action

or 

  fence query -> delay -> fence action

?

The first definition would solve this issue, but not the second. As I
understand it, as soon as the fence query has been sent, the node status is
"UNCLEAN (online)".
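
For reference, here is a minimal sketch of how that property could be set on the
two resources above. The 10s value is an arbitrary assumption, not a
recommendation, and this does not settle which of the two orderings applies:

  # assumption: 10s is an illustrative maximum for the random delay
  crm resource param fence_vm_srv1 set pcmk_delay_max 10s
  crm resource param fence_vm_srv2 set pcmk_delay_max 10s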

> > The first node did, but no FA was then able to fence the second one. So the
> > node stayed DC and was reported as "UNCLEAN (online)".
> > 
> > We were able to fix the original resource problem, but not to avoid the
> > useless second node fencing.
> > 
> > My questions are:
> > 
> > 1. is it possible to cancel the fencing request?
> > 2. is it possible to reset the node status to "online"?
> 
> Not that I'm aware of.

Argh!

++