<div dir="ltr">Hmm, I'll still submit the bug report, but it seems like crmd is dumping core while attempting to fence a node. If I use fence_node to fence a real cluster node, that also causes crmd to dump core. But apart from that, I don't really see why pacemaker is trying to fence anything.</div>
<div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jul 10, 2013 at 12:42 PM, Lindsay Todd <span dir="ltr"><<a href="mailto:rltodd.ml1@gmail.com" target="_blank">rltodd.ml1@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks! But there is still a problem.<div><br></div><div>I am now working from the master branch and building RPMs (well, I also have to rebuild from the SRPM to change the build number, since the RPMs built directly are always 1.1.10-1). The patch is in the git log, and indeed things are better ... But I still see VMs shutting down spuriously. What is much improved is that they do get restarted, and basically I end up in the state I want to be in. I can almost live with this, and I was going to start changing my cluster config to be asymmetric when I noticed that, in the midst of the spurious transitions, crmd is dumping core.</div>
<div><br></div><div>So I'll append another crm_report to bug 5164, as well as a gdb traceback.</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jul 5, 2013 at 5:06 PM, David Vossel <span dir="ltr"><<a href="mailto:dvossel@redhat.com" target="_blank">dvossel@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>----- Original Message -----<br>
> From: "David Vossel" <<a href="mailto:dvossel@redhat.com" target="_blank">dvossel@redhat.com</a>><br>
> To: "The Pacemaker cluster resource manager" <<a href="mailto:pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a>><br>
</div><div>> Sent: Wednesday, July 3, 2013 4:20:37 PM<br>
> Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes<br>
><br>
> ----- Original Message -----<br>
> > From: "Lindsay Todd" <<a href="mailto:rltodd.ml1@gmail.com" target="_blank">rltodd.ml1@gmail.com</a>><br>
> > To: "The Pacemaker cluster resource manager"<br>
</div><div>> > <<a href="mailto:pacemaker@oss.clusterlabs.org" target="_blank">pacemaker@oss.clusterlabs.org</a>><br>
> > Sent: Wednesday, July 3, 2013 2:12:05 PM<br>
> > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes<br>
> ><br>
> > Well, I'm not getting failures right now simply with attributes, but I can<br>
> > induce a failure by stopping the vm-db02 (it puts db02 into an unclean<br>
> > state, and attempts to migrate the unrelated vm-compute-test). I've<br>
> > collected the commands from my latest interactions, a crm_report, and a gdb<br>
> > traceback from the core file that crmd dumped, into bug 5164.<br>
><br>
><br>
> Thanks, hopefully I can start investigating this Friday<br>
><br>
> -- Vossel<br>
<br>
</div>Yeah, this is a bad one. Adding the node attributes using crm_attribute for the remote-node did some unexpected things to the crmd component. Somehow the remote-node was getting entered into the cluster node cache... which made it look like we had both a cluster-node and remote-node named the same thing... not good.<br>
<br>
I think I got that part worked out. Try this patch.<br>
<br>
<a href="https://github.com/ClusterLabs/pacemaker/commit/67dfff76d632f1796c9ded8fd367aa49258c8c32" target="_blank">https://github.com/ClusterLabs/pacemaker/commit/67dfff76d632f1796c9ded8fd367aa49258c8c32</a><br>
<br>
Rather than trying to patch RCs, it might be worth trying out the master branch on github (which already has this patch). If you aren't already, use rpms to make your life easier. Running 'make rpm' in the source directory will generate them for you.<br>
<br>
There was another bug fixed recently in pacemaker_remote involving the directory created for resource agents to store their temporary data (stuff like pid files). I believe the fix was not introduced until 1.1.10rc6.<br>
<div><div><br>
-- Vossel<br>
<br>
<br>
_______________________________________________<br>
Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org" target="_blank">Pacemaker@oss.clusterlabs.org</a><br>
<a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>
<br>
Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>
Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>
Bugs: <a href="http://bugs.clusterlabs.org" target="_blank">http://bugs.clusterlabs.org</a><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>