[ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout

Ken Gaillot kgaillot at redhat.com
Tue Sep 20 10:22:30 EDT 2016


On 09/20/2016 06:42 AM, Digimer wrote:
> On 20/09/16 06:59 AM, Stefan Bauer wrote:
>> Hi,
>>
>> I run a 2-node cluster and want to be safe in split-brain scenarios. For
>> this I set up external/ipmi to stonith the other node.
> 
> Please use 'fence_ipmilan'. I believe the older external/ipmi agents are
> deprecated (someone correct me if I am wrong on this).

It's just an alternative. The "external/" agents come with the
cluster-glue package, which isn't provided by some distributions (such
as RHEL and its derivatives), so it's "deprecated" on those only.
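
If you want to see which agents your node actually ships and what
parameters they take, something like this works (pcs shown here; on
crmsh-based systems the equivalent is "crm ra info
stonith:fence_ipmilan"):

    # list the IPMI-capable fence agents installed on this node
    pcs stonith list | grep -i ipmi

    # show the parameters fence_ipmilan accepts (option names differ
    # between fence-agents versions)
    pcs stonith describe fence_ipmilan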

>> Some possible issues jumped to my mind and I would like to find the best
>> practice solution:
>>
>> - I have a primitive for each node to stonith. Many documents and guides
>> recommend never letting them run on the host they should fence. I would
>> set up clone resources to avoid dealing with location constraints that
>> would also influence scoring. Does that make sense?
> 
> Since v1.1.10 of pacemaker, you don't have to worry about this.
> Pacemaker is smart enough to know where to run a fence call from in
> order to terminate a target.

Right, fence devices can run anywhere now, and in fact they don't even
have to be "running" for pacemaker to use them -- as long as they are
configured and not intentionally disabled, pacemaker will use them.

There is still a slight advantage to not running a fence device on a
node it can fence. "Running" a fence device in pacemaker really means
running its recurring monitor. Since the node that runs the monitor has
"verified" access to the device, pacemaker will prefer that node when
executing the device. However, pacemaker will not use a node to fence
itself, except as a last resort if no other node is available. So,
running a fence device on a node it can fence means that preference is
lost.

That's a very minor detail, not worth worrying about. It's more a matter
of personal preference.

In this particular case, a more relevant concern is that you need
different configurations for the different targets (the IPMI address is
different).

One approach is to define two different fence devices, each with one
IPMI address. In that case, it makes sense to use location constraints
to ensure each device prefers the node that is not its target.
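
A rough sketch of that approach with pcs (the node names, IPMI
addresses and credentials here are made up, and fence_ipmilan option
names such as ipaddr/login/passwd vary between versions -- check
"pcs stonith describe fence_ipmilan" on your system):

    pcs stonith create fence_node1 fence_ipmilan \
        pcmk_host_list="node1" ipaddr="10.0.0.11" \
        login="admin" passwd="secret" lanplus=1 \
        op monitor interval=60s

    pcs stonith create fence_node2 fence_ipmilan \
        pcmk_host_list="node2" ipaddr="10.0.0.12" \
        login="admin" passwd="secret" lanplus=1 \
        op monitor interval=60s

    # keep each device's monitor off the node it is meant to fence
    pcs constraint location fence_node1 avoids node1
    pcs constraint location fence_node2 avoids node2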

Another approach (if the fence agent supports it) is to use
pcmk_host_map to provide a different "port" (IPMI address) depending on
which host is being fenced. In this case, you need only one fence device
to be able to fence both hosts. You don't need a clone. (Remember, the
node "running" the device merely refers to its monitor, so the cluster
can still use the fence device, even if that node crashes.)
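
That could look roughly like this (sketch only -- it assumes the agent
treats the mapped "port" value as the IPMI address to contact, which
not every IPMI agent does, so verify this with your agent first):

    pcs stonith create fence_ipmi fence_ipmilan \
        pcmk_host_map="node1:10.0.0.11;node2:10.0.0.12" \
        login="admin" passwd="secret" lanplus=1 \
        op monitor interval=60s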

>> - The monitor operation on the stonith primitive is dangerous. I read
>> that if monitor operations fail for the stonith device, a stonith action
>> is triggered. I think it's not clever to give the cluster the option to
>> fence a node just because it has an issue monitoring a fence device.
>> That should not be a reason to shut down a node. What is your opinion on
>> this? Can I just set the primitive's monitor operation to disabled?
> 
> Monitoring is how you will detect that, for example, the IPMI cable
> failed or was unplugged. I do not believe the node will get fenced when a
> fence agent monitor fails... at least not by default.

I am not aware of any situation in which a failing fence monitor
triggers a fence. Monitoring is good -- it verifies that the fence
device is still working.

One concern particular to on-board IPMI devices is that they typically
share the same power supply as their host. So if the machine loses
power, the cluster can't contact the IPMI to fence it -- which means it
will be unable to recover any resources from the lost node. (It can't
assume the node lost power -- it's possible just network connectivity
between the two nodes was lost.)

The only way around that is to have a second fence device (such as an
intelligent power switch). If the cluster can't reach the IPMI, it will
try the second device.
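
That is what fencing topology (fencing levels) is for. A sketch,
reusing the per-node IPMI devices from above and a hypothetical
fence_pdu_node1/fence_pdu_node2 pair for an intelligent power switch:

    # level 1: try the on-board IPMI first
    pcs stonith level add 1 node1 fence_node1
    pcs stonith level add 1 node2 fence_node2

    # level 2: if IPMI is unreachable, fall back to the power switch
    pcs stonith level add 2 node1 fence_pdu_node1
    pcs stonith level add 2 node2 fence_pdu_node2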



