<div dir="ltr"><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:107%;font-size:11pt;font-family:Calibri,sans-serif">I’m seeing unexpected behavior when using “unfencing” – I don’t

think I’m understanding it correctly.  I

configured a resource that “requires unfencing” and have a custom fencing agent

which “provides unfencing”. I perform a simple test where I set up the

cluster and then run “pcs stonith fence node2”, and I see that node2 is successfully

fenced by sending an “off” action to my fencing agent.  But, immediately after this, I see an “on”

action sent to my fencing agent.  My fence

agent doesn’t implement the “reboot” action, so perhaps it’s trying to reboot by

running an “off” action followed by an “on” action.

Prior to adding “provides unfencing” to the fencing agent, I didn’t see

the “on” action. It seems unsafe to say “node2, you can’t run” and then immediately “you

can run”.</p>


<p class="MsoNormal" style="margin:0in 0in 8pt;line-height:107%;font-size:11pt;font-family:Calibri,sans-serif">I don’t think I’m understanding this aspect of

fencing/stonith.  I thought that the

fence agent acted as a proxy for a node: when the node was fenced, it was isolated

from shared storage by some means (power, fabric, etc).  It seems like it shouldn’t become unfenced

until connectivity between the nodes is repaired.  Yet the node is turned “off” (isolated) and

then “on” (unisolated) immediately.  This (kind of)

makes sense for a fencing agent that uses power to isolate, since when it’s

turned back on, pacemaker will not start any resources on that node until it

sees the other nodes (due to the wait_for_all setting).  However, for other types of fencing

agents, it doesn’t make sense.  Does the “off”

action not mean isolate from shared storage? And the “on” action not mean

unisolate?  What is the correct way to

understand fencing/stonith?</p>


<p class="MsoNormal" style="margin:0in 0in 8pt;line-height:107%;font-size:11pt;font-family:Calibri,sans-serif">The behavior I wanted to see was, when pacemaker lost

connectivity to a node, it would run the off action for that node.  If this succeeded, it could continue running

resources.  Later, when pacemaker saw the

node again it would run the “on” action on the fence agent (knowing that it was

no longer split-brained).  Node2 would

try to do the same thing, but once it was fenced, it would no longer attempt

to fence node1.  It also wouldn’t attempt

to start any resources.  I thought that adding

“requires unfencing” to the resource would make this happen.  Is there a way to get this behavior?</p>


<p class="MsoNormal" style="margin:0in 0in 8pt;line-height:107%;font-size:11pt;font-family:Calibri,sans-serif"><span style="font-size:11pt">Thanks! </span><br></p><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:107%;font-size:11pt;font-family:Calibri,sans-serif">btw, here's the cluster configuration:</p><p class="MsoNormal" style="margin:0in 0in 8pt;line-height:107%;font-size:11pt;font-family:Calibri,sans-serif"></p><ul><li>pcs cluster auth node1 node2<br></li><li>pcs cluster setup --name ataCluster node1 node2<br></li><li>pcs cluster start --all<br></li><li>pcs property set stonith-enabled=true<br></li><li>pcs resource defaults migration-threshold=1<br></li><li>pcs resource create Jaws ocf:atavium:myResource op stop

on-fail=fence meta requires=unfencing<br></li><li>pcs stonith create myStonith fence_custom op monitor

interval=0 meta provides=unfencing<br></li><li>pcs property set symmetric-cluster=true<br></li></ul><p></p>


</div>