[ClusterLabs] stonith in dual HMC environment

Dejan Muhamedagic dejanmm at fastmail.fm
Tue Mar 28 12:47:35 EDT 2017


On Tue, Mar 28, 2017 at 04:20:12PM +0300, Alexander Markov wrote:
> Hello, Dejan,
> 
> >Why? I don't have a test system right now, but for instance this
> >should work:
> >
> >$ stonith -t ibmhmc ipaddr=10.1.2.9 -lS
> >$ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename}
> 
> Ah, I see. Everything (including stonith methods, fencing and failover)
> works just fine under normal circumstances. Sorry if I wasn't clear about
> that. The problem occurs only when one datacenter (i.e. one IBM
> machine and one HMC) is lost due to a power outage.
> 
> For example:
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.8 -lS | wc -l
> info: ibmhmc device OK.
> 39
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.9 -lS | wc -l
> info: ibmhmc device OK.
> 39
> 
> As I said, each stonith device can see and manage all the cluster nodes.

That's great :)

> >If so, then your configuration does not appear to be correct. If
> >both are capable of managing all nodes then you should tell
> >pacemaker about it.
> 
> Thanks for the hint. But if a stonith device returns the node list, isn't it
> obvious to the cluster that it can manage those nodes?

Did you try that? Just drop the location constraints and see if
it works. Pacemaker should actually keep a list of the stonith
resources capable of managing each node.
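
One way to check what pacemaker has registered is to ask the fencer
directly. This is only a rough sketch: list_fence_devices is a made-up
helper name, and it assumes stonith_admin is installed on the node where
you run it.

```shell
#!/bin/sh
# Hypothetical helper (not part of pacemaker): for every node name given
# as an argument, print the fencing devices that pacemaker believes are
# able to fence that node.
list_fence_devices() {
    for node in "$@"; do
        echo "== $node =="
        # stonith_admin --list NODE lists the devices that can
        # terminate the specified host.
        stonith_admin --list "$node"
    done
}
```

Call it with your own node names, e.g. "list_fence_devices nodeA nodeB"
(placeholders). If both HMC resources show up for every node after the
location constraints are gone, pacemaker knows that either one can be
used. Assuming crmsh is in use, "crm configure show" prints the
constraints currently in place.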

> Could you please be more
> precise about what you are referring to? I have now changed the configuration
> to two fencing levels (one per HMC), but I still don't think I get the idea.
> 
> >The surviving node, running the stonith resource for the dead node, tries to
> >contact the ipmi device (which is also dead). How does the cluster know that
> >the lost node is really dead and that it's not just a network issue?
> >
> >It cannot.
> 
> How do people then actually solve the problem of a two-node metro cluster?

That depends, but if you have a communication channel for stonith
devices which is _independent_ of the cluster communication then
you should be OK. Of course, a fencing device which goes down
together with its node is of no use, but that doesn't seem to be
the case here.
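
To see which HMCs are still reachable from the surviving site, each one
can be probed on its own with the same stonith invocation used earlier
in this thread. A minimal sketch: check_hmcs is a made-up helper name,
and it assumes that the stonith command exits non-zero when the device
status check fails.

```shell
#!/bin/sh
# Hypothetical helper: probe every HMC address given as an argument with
# the ibmhmc plugin and report whether it answered.
# Assumption: "stonith ... -lS" exits non-zero if the device is not OK.
check_hmcs() {
    rc=0
    for hmc in "$@"; do
        if stonith -t ibmhmc ipaddr="$hmc" -lS >/dev/null 2>&1; then
            echo "$hmc: reachable"
        else
            echo "$hmc: NOT reachable"
            rc=1
        fi
    done
    # Non-zero if at least one HMC did not answer.
    return "$rc"
}
```

For the two HMCs mentioned in this thread that would be
"check_hmcs 10.1.2.8 10.1.2.9". As long as at least one of them answers
from the surviving node, there is still a usable fencing path.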

> I mean, I know one option: stonith-enabled=false, but it doesn't seem right
> to me.

Certainly not.

Thanks,

Dejan

> 
> Thank you.
> 
> Regards,
> Alexander Markov