[Pacemaker] Removed nodes showing back in status

David Vossel dvossel at redhat.com
Wed May 16 16:53:55 EDT 2012


----- Original Message -----
> From: "Larry Brigman" <larry.brigman at gmail.com>
> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
> Sent: Monday, May 14, 2012 4:59:55 PM
> Subject: Re: [Pacemaker] Removed nodes showing back in status
> 
> On Mon, May 14, 2012 at 2:13 PM, David Vossel <dvossel at redhat.com>
> wrote:
> > ----- Original Message -----
> >> From: "Larry Brigman" <larry.brigman at gmail.com>
> >> To: "The Pacemaker cluster resource manager"
> >> <pacemaker at oss.clusterlabs.org>
> >> Sent: Monday, May 14, 2012 1:30:22 PM
> >> Subject: Re: [Pacemaker] Removed nodes showing back in status
> >>
> >> On Mon, May 14, 2012 at 9:54 AM, Larry Brigman
> >> <larry.brigman at gmail.com> wrote:
> >> > I have a 5-node cluster (but it could be any number of nodes,
> >> > 3 or larger). I am testing some scripts for node removal.
> >> > I remove a node from the cluster and everything looks correct
> >> > from a crm status standpoint. When I remove a second node, the
> >> > first node that was removed shows back up in crm status as
> >> > OFFLINE. I'm following the guidelines provided in the Pacemaker
> >> > Explained docs:
> >> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-delete.html
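> >> >
> >> > For reference, the procedure there boils down to roughly the
> >> > following (a sketch only; exact command spellings vary by
> >> > version, and the node name portland-5 is just an example):
> >> >
> >> > # On the node being removed -- make sure the stack is fully down
> >> > service pacemaker stop
> >> > service corosync stop
> >> >
> >> > # On any remaining node -- delete the node's entry and its saved
> >> > # state from the CIB
> >> > cibadmin --delete --obj_type nodes --crm_xml '<node uname="portland-5"/>'
> >> > cibadmin --delete --obj_type status --crm_xml '<node_state uname="portland-5"/>'
> >> > # and tell the cluster to forget the (stopped) node
> >> > crm_node --force --remove portland-5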
> >> >
> >> > I believe this is a bug, but I want to put it out to the list
> >> > to be sure.
> >> >
> >> > Versions:
> >> > RHEL5.7 x86_64
> >> > corosync-1.4.2
> >> > openais-1.1.3
> >> > pacemaker-1.1.5
> >> >
> >> > Status after the first node was removed:
> >> > [root at portland-3 ~]# crm status
> >> > ============
> >> > Last updated: Mon May 14 08:42:04 2012
> >> > Stack: openais
> >> > Current DC: portland-1 - partition with quorum
> >> > Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> >> > 4 Nodes configured, 4 expected votes
> >> > 0 Resources configured.
> >> > ============
> >> >
> >> > Online: [ portland-1 portland-2 portland-3 portland-4 ]
> >> >
> >> > Status after the second node was removed:
> >> > [root at portland-3 ~]# crm status
> >> > ============
> >> > Last updated: Mon May 14 08:42:45 2012
> >> > Stack: openais
> >> > Current DC: portland-1 - partition with quorum
> >> > Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> >> > 4 Nodes configured, 3 expected votes
> >> > 0 Resources configured.
> >> > ============
> >> >
> >> > Online: [ portland-1 portland-3 portland-4 ]
> >> > OFFLINE: [ portland-5 ]
> >> >
> >> > Both nodes were removed from the cluster from node 1.
> >>
> >> When I added a node back into the cluster, the second node
> >> that was removed now shows as OFFLINE.
> >
> > The only time I've seen this sort of behavior is when I don't
> > completely shut down corosync and pacemaker on the node I'm
> > removing before I delete its configuration from the CIB.  Are you
> > sure corosync and pacemaker are gone before you delete the node
> > from the cluster config?
> 
> Well, I run "service pacemaker stop" and "service corosync stop"
> prior to doing the remove.  Since I am doing it all in a script,
> it's possible that there is a race condition that I have just
> exposed, or that the services are not fully down when the service
> script exits.

Yep, if you are waiting for the service scripts to return, I would expect it to be safe to remove the nodes at that point.
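
If the init scripts return before the daemons have fully exited, something like this at the end of the stop step would close that window (a minimal sketch; adjust the process names to your packaging):

    # Poll until corosync and the pacemaker daemons are really gone,
    # giving up after ~30 seconds.
    for i in $(seq 1 30); do
        pgrep -x corosync >/dev/null || pgrep -x crmd >/dev/null || break
        sleep 1
    done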

> BTW, I'm running pacemaker as its own process instead of as a
> child of corosync (if that makes a difference).
>

This shouldn't matter.

An hb_report covering this will help us determine whether this is a bug.
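
For example, run on the DC shortly after reproducing it (the time range below is just an illustration):

    hb_report -f "2012-05-14 08:30" -t "2012-05-14 09:00" /tmp/node-removal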

-- Vossel

> [root at portland-3 ~]# cat /etc/corosync/service.d/pcmk
> service {
>         # Load the Pacemaker Cluster Resource Manager
>         ver:  1
>         name: pacemaker
> #        use_mgmtd: yes
> #        use_logd:  yes
> }
> 
> It looks like, from corosync's point of view, a removed node and a
> down node have the same object state:
> 4.0.0.2 is removed; 4.0.0.5 is stopped.
> 
> [root at portland-3 ~]# corosync-objctl -a | grep member
> runtime.totem.pg.mrp.srp.members.16777220.ip=r(0) ip(4.0.0.1)
> runtime.totem.pg.mrp.srp.members.16777220.join_count=1
> runtime.totem.pg.mrp.srp.members.16777220.status=joined
> runtime.totem.pg.mrp.srp.members.50331652.ip=r(0) ip(4.0.0.3)
> runtime.totem.pg.mrp.srp.members.50331652.join_count=1
> runtime.totem.pg.mrp.srp.members.50331652.status=joined
> runtime.totem.pg.mrp.srp.members.67108868.ip=r(0) ip(4.0.0.4)
> runtime.totem.pg.mrp.srp.members.67108868.join_count=3
> runtime.totem.pg.mrp.srp.members.67108868.status=joined
> runtime.totem.pg.mrp.srp.members.83886084.ip=r(0) ip(4.0.0.5)
> runtime.totem.pg.mrp.srp.members.83886084.join_count=4
> runtime.totem.pg.mrp.srp.members.83886084.status=joined
> runtime.totem.pg.mrp.srp.members.33554436.ip=r(0) ip(4.0.0.2)
> runtime.totem.pg.mrp.srp.members.33554436.join_count=1
> runtime.totem.pg.mrp.srp.members.33554436.status=left
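> 
> A script could key off that: before deleting a node from the CIB,
> check that its membership status is "left" rather than "joined"
> (nodeid 33554436 here is just the one from the output above):
> 
> corosync-objctl runtime.totem.pg.mrp.srp.members.33554436.status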
> 
> >
> > -- Vossel
> >
> >> [root at portland-3 ~]# crm status
> >> ============
> >> Last updated: Mon May 14 11:27:55 2012
> >> Stack: openais
> >> Current DC: portland-1 - partition with quorum
> >> Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> >> 5 Nodes configured, 4 expected votes
> >> 0 Resources configured.
> >> ============
> >>
> >> Online: [ portland-1 portland-3 portland-4 portland-5 ]
> >> OFFLINE: [ portland-2 ]