Devin,<br>Thanks for your support.<br><br>As far as I have tested, it's not a problem with the shutdown order. On a regular shutdown everything works fine until I pull the power cable. After losing the ILO communication, the status of the online node changes to "online UNCLEAN". The other node, which is turned off and without any power, shows "offline UNCLEAN". In that situation you can't manage the resources anymore.<br>I don't think that is how a cluster system should behave: if I power off the complete second rack, the resources are lost.<br><br>Thanks<br>Hannes<br><br><br>Devin wrote:<br><br>> You mean with corosync will work fine, because I am using heartbeat instead.<br><br>I suspect that it's a similar situation with heartbeat.  The problem is<br>pacemaker losing communication before the node cleanly disconnects.<br><br>The behavior I saw on my own clusters is that because the init script<br>values were bad, the node's network interfaces would be brought down<br>before the node had cleanly left the cluster.  Since the second node<br>didn't see a clean disconnect and couldn't contact the first node, it<br>would stonith the first node sometime after the first node's network<br>was down but before it was halted (which is pretty rude and can be<br>hard on filesystem integrity).<br><br>> The resource wouldn't be started by the other node, because it can't fence the missing node without power on ILO.<br><br>The point that I was trying to make is that your nodes shouldn't be<br>trying to fence each other unless a node is _unexpectedly_ unreachable.<br>During maintenance, which you presumably do with a controlled shutdown,<br>there should be no fencing at all because the node-going-into-maintenance<br>should first disconnect cleanly.  
(Because of the bad sequencing where<br>corosync/pacemaker was shut down after the networks went down, a clean<br>disconnect wouldn't happen, and then the node would get fenced.)<br><br>For a clean shutdown, the cluster should move all resources _before_<br>the node disconnects, thus not requiring fencing in order to run them<br>on the other node.<br><br>Fencing should be an action of last choice, not the normal mode of operation.<br><br>In the case of a true hardware fault, and assuming that you're using<br>redundant power supplies fed by independent power sources, you wouldn't<br>see this behavior either unless you were dealing with multiple failures<br>(which is problematic in various ways).<br><br>So whether you're using heartbeat or corosync, I'd look at your startup<br>and shutdown sequence and ensure that during controlled operations no<br>fencing is being triggered.<br><br>(You can still test your fencing by pulling your non-ILO network<br>cables instead of pulling the power cord.)<br><br>If you're still concerned about the choice of stonith device and have<br>only one power supply, you can look at something like an APC switched PDU,<br>but I suspect that you're further ahead (for all of cost, complexity,<br>and redundancy) in using dual power supplies and the ILO.<br><br>Devin<br>-- <br>If it's sinful, it's more fun.<br>