[ClusterLabs] Stonith

Kristoffer Grönlund kgronlund at suse.com
Thu Mar 30 04:15:23 EDT 2017


Alexander Markov <proforg at tic-tac.ru> writes:

> Hello guys,
>
> it looks like I'm missing something obvious, but I just don't understand
> what happened.
>
> I've got a number of stonith-enabled clusters running on my big POWER boxes.
> My stonith devices are two HMCs (Hardware Management Consoles), separate
> IBM servers that can reboot individual LPARs (logical partitions) within
> the POWER boxes, one in each datacenter.
>
> So my definition for stonith devices was pretty straightforward:
>
> primitive st_dc2_hmc stonith:ibmhmc \
> params ipaddr=10.1.2.9
> primitive st_dc1_hmc stonith:ibmhmc \
> params ipaddr=10.1.2.8
> clone cl_st_dc2_hmc st_dc2_hmc
> clone cl_st_dc1_hmc st_dc1_hmc
>
> Everything was ok when we tested failover. But today, during a power outage,

Did you test failover through Pacemaker itself?

Otherwise, the logs for the attempted stonith should reveal more about
how Pacemaker tried to call the stonith device, and what went wrong.

However: Am I understanding it correctly that you have one node in each
data center, and a stonith device in each data center? That doesn't
sound like a setup that can recover from data center failure: If the
data center is lost, the stonith device for the node in that data center
would also be lost and thus not able to fence.
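To make this failure mode concrete, here is a minimal Python sketch that checks which nodes still have a reachable fence device after one site is lost. It assumes, since the original post does not say so explicitly, that each HMC can only reset LPARs in its own datacenter. The node names `node_dc1`/`node_dc2`, the site labels, and the helper `unfenceable_after_loss` are hypothetical; only `st_dc1_hmc`/`st_dc2_hmc` come from the configuration quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FenceDevice:
    name: str
    datacenter: str      # site where the fence device itself is located
    targets: frozenset   # nodes (LPARs) this device is able to reset


def unfenceable_after_loss(devices, node_dc, lost_dc):
    """Return nodes in lost_dc that no surviving fence device can reset."""
    lost_nodes = {node for node, dc in node_dc.items() if dc == lost_dc}
    # A device located in the lost datacenter is gone along with its nodes.
    surviving = [d for d in devices if d.datacenter != lost_dc]
    covered = set()
    for device in surviving:
        covered |= device.targets
    return lost_nodes - covered


# Hypothetical topology: one node per DC, each HMC manages only its own DC.
node_dc = {"node_dc1": "dc1", "node_dc2": "dc2"}
devices = [
    FenceDevice("st_dc1_hmc", "dc1", frozenset({"node_dc1"})),
    FenceDevice("st_dc2_hmc", "dc2", frozenset({"node_dc2"})),
]

print(unfenceable_after_loss(devices, node_dc, "dc1"))
```

Any node the helper returns can never be confirmed as fenced, so the surviving node has no safe way to take over its resources; that is consistent with the OFFLINE UNCLEAN state described in the original report.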

In such a hardware configuration, only a poison pill solution like SBD
could work, I think.
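The poison-pill idea can be illustrated with a deliberately simplified Python model. It is not SBD's real on-disk format, daemon, or timing logic. It only assumes that every node watches its own message slot on shared storage and that a watchdog resets any node that loses access to that storage. It further assumes that the shared storage itself survives the loss of either datacenter, which the setup would have to provide. `SharedDisk`, `Node`, `fence`, and the timing values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDisk:
    # One message slot per node; stands in for SBD-style shared storage.
    slots: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str
    powered: bool = True        # False once the node is down or has reset itself
    disk_reachable: bool = True
    watchdog_timeout: int = 5   # seconds without a disk read before the watchdog fires
    unread_for: int = 0         # seconds since the last successful disk read

    def tick(self, disk: SharedDisk) -> None:
        """Simulate one second of the node's own monitoring loop."""
        if not self.powered:
            return
        if self.disk_reachable:
            self.unread_for = 0
            if disk.slots.get(self.name) == "reset":
                self.powered = False  # poison pill read: the node resets itself
        else:
            self.unread_for += 1
            if self.unread_for >= self.watchdog_timeout:
                self.powered = False  # storage lost: the watchdog resets the node


def fence(victim: Node, disk: SharedDisk, msgwait: int) -> bool:
    """Write a poison pill, wait msgwait seconds, then declare the victim fenced.

    The fencer only writes to the shared disk; victim.tick() merely simulates
    time passing on the victim's side, whether or not the victim still exists.
    """
    if msgwait < victim.watchdog_timeout:
        raise ValueError("msgwait must be at least the watchdog timeout")
    disk.slots[victim.name] = "reset"
    for _ in range(msgwait):
        victim.tick(disk)
    return True  # declared fenced without contacting the victim's hardware


# Hypothetical victim states; the survivor only knows it lost contact with node_dc1.
scenarios = {
    "datacenter lost": Node("node_dc1", powered=False, disk_reachable=False),
    "isolated but running": Node("node_dc1", disk_reachable=False),
    "still connected to storage": Node("node_dc1"),
}
for label, victim in scenarios.items():
    declared = fence(victim, SharedDisk(), msgwait=10)
    print(f"{label}: declared={declared}, still powered={victim.powered}")
```

The design point is that `fence()` never needs the victim's own hardware. In this model, once `msgwait` is at least the watchdog timeout, the victim has either read the pill or been reset by its watchdog, so the survivor can assume it is gone even when its whole datacenter has disappeared. Real deployments leave a safety margin between the two timeouts.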

Cheers,
Kristoffer

> we lost one DC completely. Shortly after that, the cluster literally hung
> itself trying to reboot the nonexistent node. No failover occurred. The
> nonexistent node was marked OFFLINE UNCLEAN, and its resources were
> marked "Started UNCLEAN" on that node.
>
> UNCLEAN seems to flag a problem with the stonith configuration. So my
> question is: how can I avoid such behaviour?
>
> Thank you!
>
> -- 
> Regards,
> Alexander

-- 
// Kristoffer Grönlund
// kgronlund at suse.com
