<div dir="ltr">Ok, then. I learned something new. Thanks.<br><br>d.p.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Mar 28, 2013 at 6:28 PM, Andrew Beekhof <span dir="ltr"><<a href="mailto:andrew@beekhof.net" target="_blank">andrew@beekhof.net</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Fri, Mar 29, 2013 at 7:42 AM, David Pendell <<a href="mailto:lostogre@gmail.com">lostogre@gmail.com</a>> wrote:<br>


> I have a two-node CentOS 6.4 based cluster, using pacemaker 1.1.8 with a<br>

> cman backend running primarily libvirt controlled kvm VMs. For the VMs, I am<br>

> using clvm volumes for the virtual hard drives and a single gfs2 volume for<br>

> shared storage of the config files for the VMs and other shared data. For<br>

> fencing, I use IPMI and an APC master switch to provide redundant fencing.<br>

> There are location constraints that do not allow the fencing resources to run<br>

> on their own node. I am *not* using sbd or any other software based fencing<br>

> device.<br>

><br>

> I had a very bizarre situation this morning -- I had one of the nodes<br>

> powered off. Then the other self-fenced. I thought that was impossible.<br>

<br>

</div>No. Not when a node is by itself.<br>

<div><div class="h5"><br>

><br>

> Excerpts from the logs:<br>

><br>

> Mar 28 13:10:01 virtualhost2 stonith-ng[4223]:   notice: remote_op_done:<br>

> Operation reboot of <a href="http://virtualhost2.delta-co.gov" target="_blank">virtualhost2.delta-co.gov</a> by<br>

> <a href="http://virtualhost1.delta-co.gov" target="_blank">virtualhost1.delta-co.gov</a> for crmd.4430@virtualhost1.delta-co.gov.fc5638ad:<br>

> Timer expired<br>

><br>

> [...]<br>

> Virtualhost1 was offline, so I expect that line.<br>

> [...]<br>

><br>

> Mar 28 13:13:30 virtualhost2 pengine[4226]:   notice: unpack_rsc_op:<br>

> Preventing p_ns2 from re-starting on <a href="http://virtualhost2.delta-co.gov" target="_blank">virtualhost2.delta-co.gov</a>: operation<br>

> monitor failed 'not installed' (rc=5)<br>

><br>

> [...]<br>

> If I had a brief interruption of my gfs2 volume, would that show up? And<br>

> would it be the cause of a fencing operation?<br>

> [...]<br>

><br>

> Mar 28 13:13:30 virtualhost2 pengine[4226]:  warning: pe_fence_node: Node<br>

> <a href="http://virtualhost2.delta-co.gov" target="_blank">virtualhost2.delta-co.gov</a> will be fenced to recover from resource failure(s)<br>

> Mar 28 13:13:30 virtualhost2 pengine[4226]:  warning: stage6: Scheduling<br>

> Node <a href="http://virtualhost2.delta-co.gov" target="_blank">virtualhost2.delta-co.gov</a> for STONITH<br>

><br>

> [...]<br>

> Why is it still trying to fence, if all of the fencing resources are<br>

> offline?<br>

> [...]<br>

><br>

> Mar 28 13:13:30 virtualhost2 crmd[4227]:   notice: te_fence_node: Executing<br>

> reboot fencing operation (43) on <a href="http://virtualhost2.delta-co.gov" target="_blank">virtualhost2.delta-co.gov</a> (timeout=60000)<br>

><br>

> Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice: handle_request:<br>

> Client crmd.4227.9fdec3bd wants to fence (reboot)<br>

> '<a href="http://virtualhost2.delta-co.gov" target="_blank">virtualhost2.delta-co.gov</a>' with device '(any)'<br>

><br>

> [...]<br>

> What does that mean? crmd.4227.9fdec3bd  I figure 4227 is a process number,<br>

> but I don't know what the next number is.<br>

> [...]<br>

><br>

> Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:    error:<br>

> check_alternate_host: No alternate host available to handle complex self<br>

> fencing request<br>

><br>

> [...]<br>

> Where did that come from?<br>

<br>

</div></div>It was scheduled by the policy engine (because a resource failed to<br>

stop by the looks of it) and, as per the logs above, initiated by the<br>

crmd.<br>

<div class="im"><br>

> [...]<br>

><br>

> Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice:<br>

> check_alternate_host: Peer[1] <a href="http://virtualhost1.delta-co.gov" target="_blank">virtualhost1.delta-co.gov</a><br>

> Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice:<br>

> check_alternate_host: Peer[2] <a href="http://virtualhost2.delta-co.gov" target="_blank">virtualhost2.delta-co.gov</a><br>

> Mar 28 13:13:30 virtualhost2 stonith-ng[4223]:   notice:<br>

> initiate_remote_stonith_op: Initiating remote operation reboot for<br>

> <a href="http://virtualhost2.delta-co.gov" target="_blank">virtualhost2.delta-co.gov</a>: 648ca743-6cda-4c81-9250-21c9109a51b9 (0)<br>

><br>

> [...]<br>

> The next logs are the reboot logs.<br>

><br>

</div>> _______________________________________________<br>

> Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

><br>

> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

> Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>

><br>

<br>


</blockquote></div><br></div>