[ClusterLabs] Node stuck in state pending
Andrew Beekhof
andrew at beekhof.net
Mon Mar 30 00:22:08 UTC 2015
> On 19 Mar 2015, at 9:06 am, Michael Schwartzkopff <ms at sys4.de> wrote:
>
> Hi,
>
> I have a cluster of four nodes where all nodes are stuck in state "pending".
>
> Two nodes had a problem and were fenced successfully. To add the two nodes
> again, the admin set the cluster property maintenance-mode="true".
>
> After that all four nodes are stuck in state "pending". On the two surviving
> nodes, all resources run and the node_state is:
>
> <node_state id="node01" uname="node01" ha="active" in_ccm="true" crmd="online"
> join="pending" expected="member" crm-debug-origin="do_cib_replaced"
> shutdown="0">
>
> On the two nodes that were fenced, the node_state looks like:
>
> <node_state id="node04" uname="node04" ha="active" in_ccm="true" crmd="online"
> join="pending" expected="down" crm-debug-origin="do_cib_replaced"
> shutdown="0"/>
>
> There are no transient_attributes for the two fenced nodes.
>
> crm node clearstate
>
> results in:
>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - <cib num_updates="107" >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - <status >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - <node_state id="node04" >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - <transient_attributes id="node04" __crm_diff_marker__="removed:top" >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - <instance_attributes id="status-node04" >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - <nvpair id="status-node04-terminate" name="terminate" value="true" />
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - </instance_attributes>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - </transient_attributes>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - </node_state>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - </status>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - </cib>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: + <cib validate-with="pacemaker-1.2" crm_feature_set="3.0.6" have-quorum="1" admin_epoch="0" epoch="1467" num_updates="108" cib-last-written="Wed Mar 18 21:47:45 2015" update-origin="node01" update-client="cibadmin" update-user="root" dc-uuid="node02" />
> Mar 18 22:43:39 node02 cib: [24878]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/1508, version=0.1467.109): ok (rc=0)
>
> And node04 remains in "pending" state. In corosync-objctl all nodes show
> up as "joined", so they see each other.
>
> corosync 1.4.0
> pacemaker 1.1.7
>
> Any idea how to resolve the issue? Thanks for any hints.
Smells like an old membership bug.
Restart pacemaker everywhere? You already have maintenance-mode="true", so resources shouldn't be affected.
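
To check whether any node is still stuck after such a restart, the node_state entries can be inspected programmatically. The following is only a minimal sketch, not part of pacemaker: the helper name pending_nodes and the sample XML are illustrative, built from the two node_state entries quoted above. It assumes the status section has been obtained as XML from the CIB (for example with something like `cibadmin -Q -o status`; check the options against your pacemaker version).

```python
import xml.etree.ElementTree as ET

# Sample status section assembled from the node_state entries quoted in
# this thread. On a live cluster the XML would come from the CIB instead
# (assumption: dumped to a file or piped in by the admin).
SAMPLE_STATUS = """
<status>
  <node_state id="node01" uname="node01" ha="active" in_ccm="true"
              crmd="online" join="pending" expected="member"
              crm-debug-origin="do_cib_replaced" shutdown="0"/>
  <node_state id="node04" uname="node04" ha="active" in_ccm="true"
              crmd="online" join="pending" expected="down"
              crm-debug-origin="do_cib_replaced" shutdown="0"/>
</status>
"""


def pending_nodes(status_xml):
    """Return (uname, expected) for every node_state with join="pending"."""
    root = ET.fromstring(status_xml)
    return [
        (ns.get("uname"), ns.get("expected"))
        for ns in root.iter("node_state")
        if ns.get("join") == "pending"
    ]


if __name__ == "__main__":
    for uname, expected in pending_nodes(SAMPLE_STATUS):
        print(f"{uname}: join=pending expected={expected}")
```

An empty result would mean that no node_state in the supplied XML still reports join="pending".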
>
> Kind regards,
>
> Michael Schwartzkopff
>
> --
> [*] sys4 AG
>
> http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
> Franziskanerstraße 15, 81669 München
>
> Registered office: München, Amtsgericht München: HRB 199263
> Executive board: Patrick Ben Koetter, Marc Schiffbauer
> Chairman of the supervisory board: Florian Kirstein