[ClusterLabs] Corosync/Pacemaker bug methinks! (Was: pacemaker won't start because duplicate node but can't remove dupe node because pacemaker won't start)

Ken Gaillot kgaillot at redhat.com
Wed Dec 18 16:24:15 EST 2019


On Wed, 2019-12-18 at 12:21 -0800, JC wrote:
> Adding logs (minus time stamps)
> 
>     info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>     info: get_cluster_type:     Detected an active 'corosync' cluster
>     info: qb_ipcs_us_publish:   server name: pacemakerd
>     info: pcmk__ipc_is_authentic_process_active:        Could not connect to lrmd IPC: Connection refused
>     info: pcmk__ipc_is_authentic_process_active:        Could not connect to cib_ro IPC: Connection refused
>     info: pcmk__ipc_is_authentic_process_active:        Could not connect to crmd IPC: Connection refused
>     info: pcmk__ipc_is_authentic_process_active:        Could not connect to attrd IPC: Connection refused
>     info: pcmk__ipc_is_authentic_process_active:        Could not connect to pengine IPC: Connection refused
>     info: pcmk__ipc_is_authentic_process_active:        Could not connect to stonith-ng IPC: Connection refused
>     info: corosync_node_name:   Unable to get node name for nodeid 1084777441
>   notice: get_node_name:        Could not obtain a node name for corosync nodeid 1084777441

This ID appears to be coming from corosync. You have only to_syslog
turned on in corosync.conf, so look in the system log around this same
time to see what corosync is thinking. It does seem odd; I wonder if
--purge is missing something.
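One way to check whether 1084777441 is an ID corosync generated itself, rather than one from the nodelist, is to decode it. According to the corosync.conf(5) man page, when no nodeid is specified corosync derives one from the IPv4 address, and `clear_node_high_bit: yes` forces the top bit to zero. The sketch below assumes the address is read as a big-endian 32-bit value; `decode_nodeid` is a hypothetical helper, not part of corosync or pacemaker.

```python
from __future__ import annotations

import ipaddress


def decode_nodeid(nodeid: int) -> list[str]:
    """Return candidate IPv4 addresses for a corosync nodeid.

    Assumption: the nodeid was auto-generated from a ring0 IPv4 address,
    read as a big-endian 32-bit integer. If clear_node_high_bit was in
    effect, the top bit was zeroed, so the real address may have had it set.
    """
    if not 0 < nodeid < 2**32:
        raise ValueError("nodeid must be a positive 32-bit value")
    candidates = [str(ipaddress.IPv4Address(nodeid))]
    if nodeid < 2**31:
        # Restore the high bit that clear_node_high_bit may have cleared.
        candidates.append(str(ipaddress.IPv4Address(nodeid | 0x80000000)))
    return candidates


if __name__ == "__main__":
    for addr in decode_nodeid(1084777441):
        print(addr)
```

For the nodeid in these logs this prints 64.168.99.225 and 192.168.99.225. If the second matches the local host's ring0 address, that would suggest corosync did not find the local node in the nodelist and fell back to an auto-generated ID; this is only a guess until compared with the actual interface addresses.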

BTW you don't need bindnetaddr to be different for each host; it's the
network address (e.g. the .0 for a /24), not the host address.
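To illustrate the bindnetaddr point: the value is the network part of the address (host bits zeroed), so every host on the same subnet computes the same one. A minimal sketch using Python's standard `ipaddress` module; the addresses and /24 prefix are hypothetical placeholders from the 192.0.2.0/24 documentation range, not the actual cluster hosts, and `bindnetaddr_for` is a hypothetical helper.

```python
import ipaddress

# Hypothetical host addresses on one /24; substitute each node's ring0 address.
hosts = ["192.0.2.11/24", "192.0.2.12/24", "192.0.2.13/24"]


def bindnetaddr_for(host_with_prefix: str) -> str:
    """Return the network address (host bits zeroed) for an address/prefix."""
    return str(ipaddress.ip_interface(host_with_prefix).network.network_address)


if __name__ == "__main__":
    # Every host on the subnet maps to the same network address, 192.0.2.0.
    print({h: bindnetaddr_for(h) for h in hosts})
    assert len({bindnetaddr_for(h) for h in hosts}) == 1
```

Because the result is identical for all three hosts, the same bindnetaddr line can be used unchanged on every node.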

>     info: crm_get_peer: Created entry ea4ec23e-e676-4798-9b8b-00af39d3bb3d/0x5555f74984d0 for node (null)/1084777441 (1 total)
>     info: crm_get_peer: Node 1084777441 has uuid 1084777441
>     info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
>   notice: cluster_connect_quorum:       Quorum acquired
>     info: crm_get_peer: Created entry 882c0feb-d546-44b7-955f-4c8a844a0db1/0x5555f7499fd0 for node postgres-sb/3 (2 total)
>     info: crm_get_peer: Node 3 is now known as postgres-sb
>     info: crm_get_peer: Node 3 has uuid 3
>     info: crm_get_peer: Created entry 4e6a6b1e-d687-4527-bffc-5d701ff60a66/0x5555f749a6f0 for node region-ctrl-2/2 (3 total)
>     info: crm_get_peer: Node 2 is now known as region-ctrl-2
>     info: crm_get_peer: Node 2 has uuid 2
>     info: crm_get_peer: Created entry 5532a3cc-2577-4764-b9ee-770d437ccec0/0x5555f749a0a0 for node region-ctrl-1/1 (4 total)
>     info: crm_get_peer: Node 1 is now known as region-ctrl-1
>     info: crm_get_peer: Node 1 has uuid 1
>     info: corosync_node_name:   Unable to get node name for nodeid 1084777441
>   notice: get_node_name:        Defaulting to uname -n for the local corosync node name
>  warning: crm_find_peer:        Node 1084777441 and 2 share the same name: 'region-ctrl-2'
>     info: crm_get_peer: Node 1084777441 is now known as region-ctrl-2
>     info: pcmk_quorum_notification:     Quorum retained | membership=32 members=3
>   notice: crm_update_peer_state_iter:   Node region-ctrl-1 state is now member | nodeid=1 previous=unknown source=pcmk_quorum_notification
>   notice: crm_update_peer_state_iter:   Node postgres-sb state is now member | nodeid=3 previous=unknown source=pcmk_quorum_notification
>   notice: crm_update_peer_state_iter:   Node region-ctrl-2 state is now member | nodeid=1084777441 previous=unknown source=pcmk_quorum_notification
>     info: crm_reap_unseen_nodes:        State of node region-ctrl-2[2] is still unknown
>     info: pcmk_cpg_membership:  Node 1084777441 joined group pacemakerd (counter=0.0, pid=32765, unchecked for rivals)
>     info: pcmk_cpg_membership:  Node 1 still member of group pacemakerd (peer=region-ctrl-1:900, counter=0.0, at least once)
>     info: crm_update_peer_proc: pcmk_cpg_membership: Node region-ctrl-1[1] - corosync-cpg is now online
>     info: pcmk_cpg_membership:  Node 3 still member of group pacemakerd (peer=postgres-sb:976, counter=0.1, at least once)
>     info: crm_update_peer_proc: pcmk_cpg_membership: Node postgres-sb[3] - corosync-cpg is now online
>     info: pcmk_cpg_membership:  Node 1084777441 still member of group pacemakerd (peer=region-ctrl-2:3016, counter=0.2, at least once)
>   pengine:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
>      lrmd:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
>      lrmd:     info: qb_ipcs_us_publish:        server name: lrmd
>   pengine:     info: qb_ipcs_us_publish:        server name: pengine
>       cib:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
>     attrd:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
>     attrd:     info: get_cluster_type:  Verifying cluster type: 'corosync'
>     attrd:     info: get_cluster_type:  Assuming an active 'corosync' cluster
>     info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
>     attrd:   notice: crm_cluster_connect:       Connecting to cluster infrastructure: corosync
>       cib:     info: get_cluster_type:  Verifying cluster type: 'corosync'
>       cib:     info: get_cluster_type:  Assuming an active 'corosync' cluster
>     info: get_cluster_type:     Verifying cluster type: 'corosync'
>     info: get_cluster_type:     Assuming an active 'corosync' cluster
>   notice: crm_cluster_connect:  Connecting to cluster infrastructure: corosync
>     attrd:     info: corosync_node_name:        Unable to get node name for nodeid 1084777441
>       cib:     info: validate_with_relaxng:     Creating RNG parser context
>      crmd:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
>      crmd:     info: get_cluster_type:  Verifying cluster type: 'corosync'
>      crmd:     info: get_cluster_type:  Assuming an active 'corosync' cluster
>      crmd:     info: do_log:    Input I_STARTUP received in state S_STARTING from crmd_init
>     attrd:   notice: get_node_name:     Could not obtain a node name for corosync nodeid 1084777441
>     attrd:     info: crm_get_peer:      Created entry af5c62c9-21c5-4428-9504-ea72a92de7eb/0x560870420e90 for node (null)/1084777441 (1 total)
>     attrd:     info: crm_get_peer:      Node 1084777441 has uuid 1084777441
>     attrd:     info: crm_update_peer_proc:      cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
>     attrd:   notice: crm_update_peer_state_iter:        Node (null) state is now member | nodeid=1084777441 previous=unknown source=crm_update_peer_proc
>     attrd:     info: init_cs_connection_once:   Connection to 'corosync': established
>     info: corosync_node_name:   Unable to get node name for nodeid 1084777441
>   notice: get_node_name:        Could not obtain a node name for corosync nodeid 1084777441
>     info: crm_get_peer: Created entry 5bcb51ae-0015-4652-b036-b92cf4f1d990/0x55f583634700 for node (null)/1084777441 (1 total)
>     info: crm_get_peer: Node 1084777441 has uuid 1084777441
>     info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
>   notice: crm_update_peer_state_iter:   Node (null) state is now member | nodeid=1084777441 previous=unknown source=crm_update_peer_proc
>     attrd:     info: corosync_node_name:        Unable to get node name for nodeid 1084777441
>     attrd:   notice: get_node_name:     Defaulting to uname -n for the local corosync node name
>     attrd:     info: crm_get_peer:      Node 1084777441 is now known as region-ctrl-2
>     info: corosync_node_name:   Unable to get node name for nodeid 1084777441
>   notice: get_node_name:        Defaulting to uname -n for the local corosync node name
>     info: init_cs_connection_once:      Connection to 'corosync': established
>     info: corosync_node_name:   Unable to get node name for nodeid 1084777441
>   notice: get_node_name:        Defaulting to uname -n for the local corosync node name
>     info: crm_get_peer: Node 1084777441 is now known as region-ctrl-2
>       cib:   notice: crm_cluster_connect:       Connecting to cluster infrastructure: corosync
>       cib:     info: corosync_node_name:        Unable to get node name for nodeid 1084777441
>       cib:   notice: get_node_name:     Could not obtain a node name for corosync nodeid 1084777441
>       cib:     info: crm_get_peer:      Created entry a6ced2c1-9d51-445d-9411-2fb19deab861/0x55848365a150 for node (null)/1084777441 (1 total)
>       cib:     info: crm_get_peer:      Node 1084777441 has uuid 1084777441
>       cib:     info: crm_update_peer_proc:      cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
>       cib:   notice: crm_update_peer_state_iter:        Node (null) state is now member | nodeid=1084777441 previous=unknown source=crm_update_peer_proc
>       cib:     info: init_cs_connection_once:   Connection to 'corosync': established
>       cib:     info: corosync_node_name:        Unable to get node name for nodeid 1084777441
>       cib:   notice: get_node_name:     Defaulting to uname -n for the local corosync node name
>       cib:     info: crm_get_peer:      Node 1084777441 is now known as region-ctrl-2
>       cib:     info: qb_ipcs_us_publish:        server name: cib_ro
>       cib:     info: qb_ipcs_us_publish:        server name: cib_rw
>       cib:     info: qb_ipcs_us_publish:        server name: cib_shm
>       cib:     info: pcmk_cpg_membership:       Node 1084777441 joined group cib (counter=0.0, pid=0, unchecked for rivals)
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 
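When comparing logs like these across nodes, it can help to pull out the two symptoms mechanically: nodeids that pacemaker could not map to a name, and the `crm_find_peer` warning about two IDs sharing one name. This is a hypothetical helper, not a pacemaker tool. Its regular expressions match only the message wording visible in these particular logs, and they assume each log entry sits on a single line (not re-wrapped by a mail client).

```python
import re
import sys

# Patterns based only on the message text visible in the logs above.
UNNAMED = re.compile(r"Could not obtain a node name for corosync nodeid (\d+)")
SHARED = re.compile(r"Node (\d+) and (\d+) share the same name: '([^']+)'")


def scan(lines):
    """Collect unnamed nodeids and (nodeid, nodeid, name) duplicate-name triples."""
    unnamed = set()
    shared = set()
    for line in lines:
        m = UNNAMED.search(line)
        if m:
            unnamed.add(int(m.group(1)))
        m = SHARED.search(line)
        if m:
            shared.add((int(m.group(1)), int(m.group(2)), m.group(3)))
    return unnamed, shared


if __name__ == "__main__":
    unnamed, shared = scan(sys.stdin)
    for nodeid in sorted(unnamed):
        print(f"no name for nodeid {nodeid}")
    for a, b, name in sorted(shared):
        print(f"nodeids {a} and {b} both claim '{name}'")
```

The log is read from stdin, so it can be piped in from wherever the system log lives (the location varies by distribution).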
-- 
Ken Gaillot <kgaillot at redhat.com>


