[ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout
Stefan Bauer
stefan.bauer at cubewerk.de
Wed Sep 21 08:51:23 CEST 2016
Hi Ken,
let me sum it up:
Pacemaker in recent versions is smart enough to run (trigger, execute) the fence operation on a node that is not the target.
If I have an external stonith device that can fence multiple nodes, a single primitive is enough in Pacemaker.
If external/ipmi can only address a single node, I need multiple primitives, one for each node.
In that case it's recommended to always let each primitive run on the opposite node (the one it does not target), right?
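In crm shell syntax I picture it roughly like this (untested sketch; node names, IPMI addresses and credentials below are just placeholders, and the two location constraints are the optional part that keeps each device off the node it targets):

    primitive st-node1 stonith:external/ipmi \
        params hostname=node1 ipaddr=10.0.0.11 userid=admin passwd=secret interface=lanplus \
        op monitor interval=60s
    primitive st-node2 stonith:external/ipmi \
        params hostname=node2 ipaddr=10.0.0.12 userid=admin passwd=secret interface=lanplus \
        op monitor interval=60s
    location l-st-node1 st-node1 -inf: node1
    location l-st-node2 st-node2 -inf: node2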
Thank you.
Stefan
-----Original Message-----
> From: Ken Gaillot <kgaillot at redhat.com>
> Sent: Tue, 20 September 2016 16:49
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout
>
> On 09/20/2016 06:42 AM, Digimer wrote:
> > On 20/09/16 06:59 AM, Stefan Bauer wrote:
> >> Hi,
> >>
> >> I run a 2-node cluster and want to be safe in split-brain scenarios. For
> >> this I set up external/ipmi to stonith the other node.
> >
> > Please use 'fence_ipmilan'. I believe that the older external/ipmi are
> > deprecated (someone correct me if I am wrong on this).
>
> It's just an alternative. The "external/" agents come with the
> cluster-glue package, which isn't provided by some distributions (such
> as RHEL and its derivatives), so it's "deprecated" on those only.
>
> >> Some possible issues came to my mind and I would like to find the best
> >> practice solution:
> >>
> >> - I have a primitive for each node to stonith. Many documents and guides
> >> recommend never letting them run on the host they should fence. I would
> >> set up clone resources to avoid dealing with location constraints that
> >> would also influence scoring. Does that make sense?
> >
> > Since v1.1.10 of pacemaker, you don't have to worry about this.
> > Pacemaker is smart enough to know where to run a fence call from in
> > order to terminate a target.
>
> Right, fence devices can run anywhere now, and in fact they don't even
> have to be "running" for pacemaker to use them -- as long as they are
> configured and not intentionally disabled, pacemaker will use them.
>
> There is still a slight advantage to not running a fence device on a
> node it can fence. "Running" a fence device in pacemaker really means
> running the recurring monitor for it. Since the node that runs the
> monitor has "verified" access to the device, pacemaker will prefer to
> use it to execute that device. However, pacemaker will not use a node to
> fence itself, except as a last resort if no other node is available. So,
> running a fence device on a node it can fence means that the preference
> is lost.
>
> That's a very minor detail, not worth worrying about. It's more a matter
> of personal preference.
>
> In this particular case, a more relevant concern is that you need
> different configurations for the different targets (the IPMI address is
> different).
>
> One approach is to define two different fence devices, each with one
> IPMI address. In that case, it makes sense to use the location
> constraints to ensure the device prefers the node that's not its target.
>
> Another approach (if the fence agent supports it) is to use
> pcmk_host_map to provide a different "port" (IPMI address) depending on
> which host is being fenced. In this case, you need only one fence device
> to be able to fence both hosts. You don't need a clone. (Remember, the
> node "running" the device merely refers to its monitor, so the cluster
> can still use the fence device, even if that node crashes.)
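If I ever go the single-device route, I imagine it would look roughly like this (again only a sketch: whether the mapped "port" really reaches the agent as its management address depends on the agent, as you note, and all names, addresses and credentials are made up):

    primitive st-ipmi stonith:fence_ipmilan \
        params pcmk_host_map="node1:10.0.0.11;node2:10.0.0.12" \
               login=admin passwd=secret lanplus=1 \
        op monitor interval=60s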
>
> >> - The monitor operation on the stonith primitive is dangerous. I read
> >> that if monitor operations fail for the stonith device, the stonith action
> >> is triggered. I think it's not clever to give the cluster the option to
> >> fence a node just because it has an issue monitoring a fence device.
> >> That should not be a reason to shut down a node. What is your opinion on
> >> this? Can I just set the primitive monitor operation to disabled?
> >
> > Monitoring is how you will detect that, for example, the IPMI cable
> > failed or was unplugged. I do not believe the node will get fenced when the
> > fence agent monitor fails... at least not by default.
>
> I am not aware of any situation in which a failing fence monitor
> triggers a fence. Monitoring is good -- it verifies that the fence
> device is still working.
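Good to know - then instead of disabling the monitor I will keep it and, at most, relax its interval, e.g. by changing the op line in the sketch above to something like (values picked arbitrarily):

    op monitor interval=1800s timeout=120s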
>
> One concern particular to on-board IPMI devices is that they typically
> share the same power supply as their host. So if the machine loses
> power, the cluster can't contact the IPMI to fence it -- which means it
> will be unable to recover any resources from the lost node. (It can't
> assume the node lost power -- it's possible just network connectivity
> between the two nodes was lost.)
>
> The only way around that is to have a second fence device (such as an
> intelligent power switch). If the cluster can't reach the IPMI, it will
> try the second device.
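That makes sense. If I later add a switched PDU, I understand the IPMI device would be tried first and the power switch second, which in crm shell should be expressible as fencing levels along these lines (sketch; st-pdu is a hypothetical second device to go with the primitives above):

    fencing_topology \
        node1: st-node1 st-pdu \
        node2: st-node2 st-pdu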
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>