[ClusterLabs] Node stuck in state pending

Mon Mar 30 00:22:08 UTC 2015

> On 19 Mar 2015, at 9:06 am, Michael Schwartzkopff <ms at sys4.de> wrote:
> 
> Hi,
> 
> I have a cluster of four nodes where all nodes are stuck in state "pending". 
> 
> Two nodes had a problem and were fenced successfully. To add the two nodes 
> again, the admin set the cluster property maintenance-mode="true".
> 
> After that all four nodes are stuck in state "pending". On the two surviving 
> nodes, all resources run and the node_state is:
> 
> <node_state id="node01" uname="node01" ha="active" in_ccm="true" crmd="online" join="pending" expected="member" crm-debug-origin="do_cib_replaced" shutdown="0">
> 
> On the two nodes that were fenced, the node_state looks like:
> 
> <node_state id="node04" uname="node04" ha="active" in_ccm="true" crmd="online" join="pending" expected="down" crm-debug-origin="do_cib_replaced" shutdown="0"/>
> 
> There are no transient_attributes for the two fenced nodes.
> 
> crm node clearstate 
> 
> results in:
> 
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - <cib num_updates="107" >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: -   <status >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: -     <node_state id="node04" >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: -       <transient_attributes id="node04" __crm_diff_marker__="removed:top" >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: -         <instance_attributes id="status-node04" >
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: -           <nvpair id="status-node04-terminate" name="terminate" value="true" />
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: -         </instance_attributes>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: -       </transient_attributes>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: -     </node_state>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: -   </status>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: - </cib>
> Mar 18 22:43:39 node02 cib: [24878]: info: cib:diff: + <cib validate-with="pacemaker-1.2" crm_feature_set="3.0.6" have-quorum="1" admin_epoch="0" epoch="1467" num_updates="108" cib-last-written="Wed Mar 18 21:47:45 2015" update-origin="node01" update-client="cibadmin" update-user="root" dc-uuid="node02" />
> Mar 18 22:43:39 node02 cib: [24878]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/1508, version=0.1467.109): ok (rc=0)
> 
> And node04 still remains in the "pending" state. In corosync-objctl all nodes 
> show up as "joined", so they see each other.
> 
> corosync 1.4.0
> pacemaker 1.1.7
> 
> Any idea how to resolve the issue? Thanks for any hints.

Smells like an old membership bug.
Restart pacemaker everywhere? You already have maintenance-mode="true", so resources shouldn't be affected.
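
A minimal sketch for checking afterwards whether any node is still stuck, assuming the status section of the CIB has been dumped to text (for example with `cibadmin -Q -o status`). The helper name `pending_nodes` and the sample data are illustrative only; the attribute values are copied from the node_state entries quoted above, and treating any join value other than "member" as "not fully joined" is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Illustrative sample: attributes taken from the node_state entries in the
# original report. A real run would feed in the live status section instead.
SAMPLE_STATUS = """
<status>
  <node_state id="node01" uname="node01" ha="active" in_ccm="true"
              crmd="online" join="pending" expected="member"
              crm-debug-origin="do_cib_replaced" shutdown="0"/>
  <node_state id="node04" uname="node04" ha="active" in_ccm="true"
              crmd="online" join="pending" expected="down"
              crm-debug-origin="do_cib_replaced" shutdown="0"/>
</status>
"""


def pending_nodes(status_xml):
    """Return (uname, join, expected) for every node whose join is not "member"."""
    root = ET.fromstring(status_xml)
    stuck = []
    # iter() walks the whole tree, so this works for a bare <status> section
    # as well as for a full <cib> document.
    for node_state in root.iter("node_state"):
        join = node_state.get("join")
        if join != "member":
            stuck.append((node_state.get("uname"), join, node_state.get("expected")))
    return stuck


if __name__ == "__main__":
    for uname, join, expected in pending_nodes(SAMPLE_STATUS):
        print(f"{uname}: join={join} expected={expected}")
```

If the list comes back empty after the restart, every node reports join="member" and the membership has settled.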

> 
> Kind regards,
> 
> Michael Schwartzkopff
> 
> -- 
> [*] sys4 AG
> 
> http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
> Franziskanerstraße 15, 81669 München
> 
> Registered office: München, District Court (Amtsgericht) München: HRB 199263
> Executive board: Patrick Ben Koetter, Marc Schiffbauer
> Chairman of the supervisory board: Florian Kirstein