[Pacemaker] Pacemaker remote nodes, naming, and attributes

Wed Jul 10 22:43:58 UTC 2013

Yes, it avoids the crashes.  Thanks!  But I am still seeing spurious VM
migrations/shutdowns when I stop/start a VM with a remote pacemaker
(similar to my last update, only no core is dumped while fencing; nor,
indeed, does any fencing happen, even though I've now verified that
fence_node works again).

On Wed, Jul 10, 2013 at 2:12 PM, David Vossel <dvossel at redhat.com> wrote:

> ----- Original Message -----
> > From: "Lindsay Todd" <rltodd.ml1 at gmail.com>
> > To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> > Sent: Wednesday, July 10, 2013 12:11:00 PM
> > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> >
> > Hmm, I'll still submit the bug report, but it seems like crmd is dumping
> > core while attempting to fence a node. If I use fence_node to fence a
> > real cluster node, that also causes crmd to dump core. But apart from
> > that, I don't really see why pacemaker is trying to fence anything.
>
> This should solve the crashes you are seeing.
>
>
> https://github.com/ClusterLabs/pacemaker/commit/97dd3b05db867c4674fa4780802bba54c63bd06d
>
> -- Vossel
>
> >
> >
> > On Wed, Jul 10, 2013 at 12:42 PM, Lindsay Todd <rltodd.ml1 at gmail.com>
> > wrote:
> >
> >
> >
> > Thanks! But there is still a problem.
> >
> > I am now working from the master branch and building RPMs (well, I also
> > have to rebuild from the SRPM to change the build number, since the RPMs
> > built directly are always 1.1.10-1). The patch is in the git log, and
> > indeed things are better ... But I still see the spurious VMs shutting
> > down. What is much improved is that they do get restarted, and basically
> > I end up in the state I want to be in. I can almost live with this, and I
> > was going to start changing my cluster config to be asymmetric when I
> > noticed that in the midst of the spurious transitions, crmd is dumping
> > core.
> >
> > So I'll append another crm_report to bug 5164, as well as a gdb
> > traceback.
> >
> >
> > On Fri, Jul 5, 2013 at 5:06 PM, David Vossel <dvossel at redhat.com>
> > wrote:
> >
> >
> >
> > ----- Original Message -----
> > > From: "David Vossel" < dvossel at redhat.com >
> > > To: "The Pacemaker cluster resource manager"
> > > <pacemaker at oss.clusterlabs.org>
> > > Sent: Wednesday, July 3, 2013 4:20:37 PM
> > > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> > >
> > > ----- Original Message -----
> > > > From: "Lindsay Todd" < rltodd.ml1 at gmail.com >
> > > > To: "The Pacemaker cluster resource manager"
> > > > <pacemaker at oss.clusterlabs.org>
> > > > Sent: Wednesday, July 3, 2013 2:12:05 PM
> > > > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and
> > > > attributes
> > > >
> > > > Well, I'm not getting failures right now simply with attributes, but
> > > > I can induce a failure by stopping the vm-db02 (it puts db02 into an
> > > > unclean state, and attempts to migrate the unrelated
> > > > vm-compute-test). I've collected the commands from my latest
> > > > interactions, a crm_report, and a gdb traceback from the core file
> > > > that crmd dumped, into bug 5164.
> > >
> > >
> > > Thanks, hopefully I can start investigating this Friday
> > >
> > > -- Vossel
> >
> > Yeah, this is a bad one. Adding the node attributes using crm_attribute
> > for the remote-node did some unexpected things to the crmd component.
> > Somehow the remote-node was getting entered into the cluster node
> > cache... which made it look like we had both a cluster-node and
> > remote-node named the same thing... not good.
> >
> > I think I got that part worked out. Try this patch.
> >
> >
> > https://github.com/ClusterLabs/pacemaker/commit/67dfff76d632f1796c9ded8fd367aa49258c8c32
> >
> > Rather than trying to patch RCs, it might be worth trying out the master
> > branch on github (which already has this patch). If you aren't already,
> > use rpms to make your life easier. Running 'make rpm' in the source
> > directory will generate them for you.
> >
> > There was another bug fixed recently in pacemaker_remote involving the
> > directory created for resource agents to store their temporary data
> > (stuff like pid files). I believe the fix was not introduced until
> > 1.1.10rc6.
> >
> > -- Vossel
> >
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
> >