[ClusterLabs] Stonith
Alexander Markov
proforg at tic-tac.ru
Mon Mar 20 08:44:42 EDT 2017
Hello guys,
it looks like I'm missing something obvious, but I just don't understand
what happened.
I've got a number of stonith-enabled clusters running inside my big POWER
boxes. My stonith devices are two HMCs (Hardware Management Consoles), one
per datacenter. These are separate IBM servers that can reboot individual
LPARs (logical partitions) within the POWER boxes.
So my stonith device definitions were pretty straightforward:
primitive st_dc2_hmc stonith:ibmhmc \
        params ipaddr=10.1.2.9
primitive st_dc1_hmc stonith:ibmhmc \
        params ipaddr=10.1.2.8
clone cl_st_dc2_hmc st_dc2_hmc
clone cl_st_dc1_hmc st_dc1_hmc
Everything was OK when we tested failover. But today, during a power
outage, we lost one DC completely. Shortly after that, the cluster
literally hung while trying to reboot the nonexistent node. No failover
occurred. The nonexistent node was marked OFFLINE UNCLEAN, and its
resources were marked "Started UNCLEAN" on that node.
UNCLEAN seems to indicate a problem with the stonith configuration. So my
question is: how can I avoid this behaviour?
Thank you!
--
Regards,
Alexander