[ClusterLabs] stonith in dual HMC environment

Digimer lists at alteeve.ca
Tue Mar 21 03:09:19 EDT 2017


On 20/03/17 12:22 PM, Alexander Markov wrote:
> Hello guys,
> 
> it looks like I'm missing something obvious, but I just don't get what
> happened.
> 
> I've got a number of stonith-enabled clusters within my big POWER boxes.
> My stonith devices are two HMCs (hardware management consoles) - separate
> servers from IBM that can reboot individual LPARs (logical partitions)
> within POWER boxes - one per datacenter.
> 
> So my definition for stonith devices was pretty straightforward:
> 
> primitive st_dc2_hmc stonith:ibmhmc \
>     params ipaddr=10.1.2.9
> primitive st_dc1_hmc stonith:ibmhmc \
>     params ipaddr=10.1.2.8
> clone cl_st_dc2_hmc st_dc2_hmc
> clone cl_st_dc1_hmc st_dc1_hmc
> 
> Everything was OK when we tested failover. But today, after a power outage,
> we lost one DC completely. Shortly after that, the cluster simply hung
> while trying to reboot the nonexistent node. No failover occurred. The
> nonexistent node was marked OFFLINE UNCLEAN and resources were
> marked "Started UNCLEAN" on the nonexistent node.
> 
> UNCLEAN seems to flag a problem with the stonith configuration. So my
> question is: how can I avoid such behaviour?
> 
> Thank you!

Please share your config along with the logs from the nodes that were
affected.
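
If it helps, something along these lines should gather both. This is only
a rough sketch: it assumes crmsh and the Pacemaker crm_report tool are
installed on the surviving node (on some distributions the report tool is
hb_report instead), and the start time and output path are placeholders to
adjust to your outage.

```shell
# Assumed values; adjust to your outage window and a writable path.
SINCE="2017-03-20 00:00"        # assumption: some time before the power outage
DEST="/tmp/stonith-report"      # hypothetical output name

# Dump the full cluster configuration (crmsh syntax, as in your post).
if command -v crm >/dev/null 2>&1; then
    crm configure show > "${DEST}-config.txt"
else
    echo "crm not found; skipping config dump" >&2
fi

# Collect logs and cluster state from that time onward.
if command -v crm_report >/dev/null 2>&1; then
    crm_report -f "$SINCE" "$DEST"
else
    echo "crm_report not found; skipping log collection" >&2
fi
```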

cheers,

digimer

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould



