[ClusterLabs] Which node initiates fencing?

Thu Jun 25 10:57:24 EDT 2015

On 06/24/2015 06:39 PM, Jonathan Vargas wrote:
> Thanks Ken.
> 
> It's weird, because we ran tests and that did not happen.
> 
> There is a node (named Z) without any stonith/sbd resources assigned, but
> it was the node that sent the fencing request to a crashed node (X).
> 
> But this error appeared in its logs: "No route to host".
> 
> It's obvious to us that if SBD isn't running on Z, and there is no network
> access to that crashed node (X), then based on your answer, node Y, which
> really had access to X via SBD, should have initiated the fencing request.
> But this did not happen.
> 
> In addition, I wonder if I could tell the cluster to avoid sending fencing
> requests from specific nodes or, conversely, tell the cluster which nodes
> are authorized to send fencing requests.
> 
> Any idea?

Yes, that's exactly what you have to do.

By default, a cluster will be "opt-out" -- any resource can run on any
node unless you tell it otherwise. (You can change that to "opt-in", but
for simplicity I'll assume you're using the default.)

The node that "runs" the fencing resource will monitor it, so if only
certain nodes can monitor the device, you need location constraints. How
you configure that depends on what tools you are using (pcs, crm or
low-level), but it's simple: you just say "this resource has this score
on this node". A score of -INFINITY means "never run this resource on
this node".
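
As a minimal sketch in the low-level (CIB XML) form, a constraint that
keeps a fencing resource off node Z could look like the following. The
resource id "fence-sbd" and the constraint id are made-up names for
illustration; substitute your own. pcs and crm have their own syntax for
expressing the same constraint.

```xml
<!-- Hypothetical ids: "fence-sbd" stands in for your fencing resource. -->
<constraints>
  <!-- A score of -INFINITY means: never run fence-sbd on node Z. -->
  <rsc_location id="loc-fence-sbd-never-on-Z" rsc="fence-sbd"
                node="Z" score="-INFINITY"/>
</constraints>
```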

For fencing resources, the cluster also needs to know which hosts the
device can fence. By default the cluster will ask the fence agent by
running its "list" command. If that's not sufficient, you can configure
a static list of hosts that the device can fence. For details see:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_special_treatment_of_stonith_resources
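
For illustration, a static host list in CIB XML might look roughly like
this. The resource id, the nvpair ids and the agent type are assumptions
(external/sbd is only a guess at what an SBD setup would use), and the
host names are the X, Y and Z from this thread. pcmk_host_check and
pcmk_host_list are the options described at the link above.

```xml
<!-- Hypothetical ids and agent type; adjust to your actual fencing resource. -->
<primitive id="fence-sbd" class="stonith" type="external/sbd">
  <instance_attributes id="fence-sbd-params">
    <!-- Use the static list below instead of asking the agent's "list" command. -->
    <nvpair id="fence-sbd-host-check" name="pcmk_host_check" value="static-list"/>
    <!-- Hosts this device is able to fence. -->
    <nvpair id="fence-sbd-host-list" name="pcmk_host_list" value="X Y Z"/>
  </instance_attributes>
</primitive>
```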

> On Jun 24, 2015 1:56 PM, "Ken Gaillot" <kgaillot at redhat.com> wrote:
> 
>> On 06/24/2015 12:20 PM, Jonathan Vargas wrote:
>>> Hi there,
>>>
>>> We have a 3-node cluster for OCFS2.
>>>
>>> When one of the nodes fails, it should be fenced. I noticed that sometimes
>>> one of them sends the fencing message to the failing node, and sometimes
>>> it's another.
>>>
>>> How does the cluster decide which of the remaining active nodes will be
>>> the one to tell the failed node to fence itself?
>>>
>>> Thanks.
>>
>> Fencing resources are assigned to a node like any other resource, even
>> though they don't really "run" anywhere. Assuming you've configured a
>> recurring monitor operation on the resource, that node will monitor the
>> device to ensure it's available.
>>
>> Because that node has "verified" (monitored) access to the device, the
>> cluster will prefer that node to execute the fencing if possible. So
>> it's just whatever node happened to be assigned the fencing resource.
>>
>> If for any reason the node with verified access can't do it, the cluster
>> will fall back to any other capable node.
>>
>>> *Jonathan Vargas Rodríguez*
>>> Founder and Solution Engineer
>>> Alkaid <https://alkaid.cr/> | Open Source Software
>>>
>>> * mail *  jonathan.vargas at alkaid.cr
>>>  telf   +506 4001 6259 Ext. 01
>>>  mobi   +506 4001 6259 Ext. 51