[Pacemaker] Removed nodes showing back in status

Larry Brigman larry.brigman at gmail.com
Mon May 14 17:59:55 EDT 2012


On Mon, May 14, 2012 at 2:13 PM, David Vossel <dvossel at redhat.com> wrote:
> ----- Original Message -----
>> From: "Larry Brigman" <larry.brigman at gmail.com>
>> To: "The Pacemaker cluster resource manager" <pacemaker at oss.clusterlabs.org>
>> Sent: Monday, May 14, 2012 1:30:22 PM
>> Subject: Re: [Pacemaker] Removed nodes showing back in status
>>
>> On Mon, May 14, 2012 at 9:54 AM, Larry Brigman
>> <larry.brigman at gmail.com> wrote:
>> > I have a 5 node cluster (but it could be any number of nodes, 3 or
>> > larger).
>> > I am testing some scripts for node removal.
>> > I remove a node from the cluster and everything looks correct from
>> > a crm status standpoint.
>> > When I remove a second node, the first node that was removed shows
>> > up again in crm status as offline.  I'm following the guidelines
>> > provided in the Pacemaker Explained docs:
>> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-delete.html
>> >
>> > I believe this is a bug but want to put it out to the list to be
>> > sure.
>> > Versions:
>> > RHEL5.7 x86_64
>> > corosync-1.4.2
>> > openais-1.1.3
>> > pacemaker-1.1.5
>> >
>> > Status after the first node was removed:
>> > [root@portland-3 ~]# crm status
>> > ============
>> > Last updated: Mon May 14 08:42:04 2012
>> > Stack: openais
>> > Current DC: portland-1 - partition with quorum
>> > Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>> > 4 Nodes configured, 4 expected votes
>> > 0 Resources configured.
>> > ============
>> >
>> > Online: [ portland-1 portland-2 portland-3 portland-4 ]
>> >
>> > Status after the second node was removed:
>> > [root@portland-3 ~]# crm status
>> > ============
>> > Last updated: Mon May 14 08:42:45 2012
>> > Stack: openais
>> > Current DC: portland-1 - partition with quorum
>> > Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>> > 4 Nodes configured, 3 expected votes
>> > 0 Resources configured.
>> > ============
>> >
>> > Online: [ portland-1 portland-3 portland-4 ]
>> > OFFLINE: [ portland-5 ]
>> >
>> > Both nodes were removed from the cluster from node 1.
>>
>> When I added a node back into the cluster, the second node that had
>> been removed showed up as offline.
>
> The only time I've seen this sort of behavior is when I don't completely shut down corosync and pacemaker on the node I'm removing before I delete its configuration from the cib.  Are you sure corosync and pacemaker are gone before you delete the node from the cluster config?

Well, I run "service pacemaker stop" and "service corosync stop" prior to
doing the remove.  Since I am doing it all in a script, it's possible that
there is a race condition that I have just exposed, or that the services
are not fully down when the service script exits.
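
For reference, here is roughly what the removal part of the script looks
like, with a wait loop added so the CIB isn't touched until the daemons
have actually exited.  This is only a sketch: the process names and the
crm_node/cibadmin invocations are my reading of the linked doc for this
version, not a verified recipe.

#!/bin/sh
# Sketch of a safer removal script (assumptions: pacemakerd/corosync
# are the daemon names, and crm_node/cibadmin behave as documented).
NODE=portland-5   # example name of the node being removed

# On the node being removed: stop the stack.
service pacemaker stop
service corosync stop

# The init scripts can return before the daemons are gone, so poll.
for i in $(seq 1 30); do
    if ! pgrep -x pacemakerd >/dev/null 2>&1 \
       && ! pgrep -x corosync >/dev/null 2>&1; then
        break
    fi
    sleep 1
done

# From a surviving node: remove the node from Pacemaker's caches and
# delete its entries from the CIB, per the linked documentation.
crm_node --force -R "$NODE"
cibadmin --delete --obj_type nodes  --crm_xml "<node uname=\"$NODE\"/>"
cibadmin --delete --obj_type status --crm_xml "<node_state uname=\"$NODE\"/>"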

BTW, I'm running pacemaker as its own process instead of as a child of
corosync (if that makes a difference).

[root@portland-3 ~]# cat /etc/corosync/service.d/pcmk
service {
        # Load the Pacemaker Cluster Resource Manager
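        # ver: 1 means corosync does not launch pacemaker itself; the
        # pacemaker init script starts the daemon separately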
        ver:  1
        name: pacemaker
#        use_mgmtd: yes
#        use_logd:  yes
}

It looks like, from corosync's side, a removed node and a down node end up
in the same object state.  Here, 4.0.0.2 has been removed and 4.0.0.5 is
stopped:

[root@portland-3 ~]# corosync-objctl -a | grep member
runtime.totem.pg.mrp.srp.members.16777220.ip=r(0) ip(4.0.0.1)
runtime.totem.pg.mrp.srp.members.16777220.join_count=1
runtime.totem.pg.mrp.srp.members.16777220.status=joined
runtime.totem.pg.mrp.srp.members.50331652.ip=r(0) ip(4.0.0.3)
runtime.totem.pg.mrp.srp.members.50331652.join_count=1
runtime.totem.pg.mrp.srp.members.50331652.status=joined
runtime.totem.pg.mrp.srp.members.67108868.ip=r(0) ip(4.0.0.4)
runtime.totem.pg.mrp.srp.members.67108868.join_count=3
runtime.totem.pg.mrp.srp.members.67108868.status=joined
runtime.totem.pg.mrp.srp.members.83886084.ip=r(0) ip(4.0.0.5)
runtime.totem.pg.mrp.srp.members.83886084.join_count=4
runtime.totem.pg.mrp.srp.members.83886084.status=joined
runtime.totem.pg.mrp.srp.members.33554436.ip=r(0) ip(4.0.0.2)
runtime.totem.pg.mrp.srp.members.33554436.join_count=1
runtime.totem.pg.mrp.srp.members.33554436.status=left
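
As an aside, the node-ID/status pairs can be pulled out of that tree with
something like this (a sketch built on the same object paths shown above):

corosync-objctl -a \
  | sed -n 's/^runtime\.totem\.pg\.mrp\.srp\.members\.\([0-9]*\)\.status=\(.*\)$/\1 \2/p'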

>
> -- Vossel
>
>> [root@portland-3 ~]# crm status
>> ============
>> Last updated: Mon May 14 11:27:55 2012
>> Stack: openais
>> Current DC: portland-1 - partition with quorum
>> Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>> 5 Nodes configured, 4 expected votes
>> 0 Resources configured.
>> ============
>>
>> Online: [ portland-1 portland-3 portland-4 portland-5 ]
>> OFFLINE: [ portland-2 ]
>>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



