[ClusterLabs] node name issues (Could not obtain a node name for corosync nodeid 739512332)

Ken Gaillot kgaillot at redhat.com
Thu Aug 22 11:38:53 EDT 2019


On Thu, 2019-08-22 at 09:07 +0200, Ulrich Windl wrote:
> Hi!
> 
> When starting pacemaker (1.1.19+20181105.ccd6b5b10-3.10.1) on a node
> that had been down for a while, I noticed some unexpected messages
> about the node name:
> 
> pacemakerd:   notice: get_node_name:   Could not obtain a node name
> for corosync nodeid 739512332
> pacemakerd:     info: crm_get_peer:    Created entry a21bf687-045b-
> 4fd7-9340-0562ef595883/0x18752f0 for node (null)/739512332 (1 total)
> pacemakerd:     info: crm_get_peer:    Node 739512332 has uuid
> 739512332
> 
> It seems UUID and node ID are mixed up in the message, at least...

"UUID" is a misnomer that survives for historical reasons. It was an
actual UUID under heartbeat (originally the only supported cluster
layer), but for corosync nodes it's the node ID, and for Pacemaker
Remote nodes it's the node name.

Ironically, the string after "Created entry" is an actual UUID, but
it's not the "node UUID", just an internal hash table ID.
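
To make the three meanings concrete, here is a rough Python sketch of
what the logged "uuid" ends up being for each kind of node. It is not
Pacemaker's code: the function name node_uuid and its parameters are
made up for illustration, and the heartbeat branch just generates a
random UUID as a stand-in.

```python
import uuid


def node_uuid(cluster_layer, node_name=None, corosync_nodeid=None):
    """Return the string logged as a node's "uuid" (illustrative only).

    node_uuid and its arguments are hypothetical names, not Pacemaker API.
    """
    if cluster_layer == "heartbeat":
        # Heartbeat nodes had a genuine UUID; a random one stands in here.
        return str(uuid.uuid4())
    if cluster_layer == "corosync":
        # For corosync nodes the "uuid" is just the node ID as text.
        return str(corosync_nodeid)
    if cluster_layer == "remote":
        # For Pacemaker Remote nodes the "uuid" is the node name.
        return node_name
    raise ValueError(f"unknown cluster layer: {cluster_layer!r}")


# Matches the log line "Node 739512332 has uuid 739512332".
print(node_uuid("corosync", corosync_nodeid=739512332))
```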

We should definitely update all those messages to reflect the current
reality.

> pacemakerd:     info: crm_update_peer_proc: cluster_connect_cpg: Node
> (null)[739512332] - corosync-cpg is now online
> pacemakerd:   notice: cluster_connect_quorum: Quorum acquired
> pacemakerd:     info: corosync_node_name: Unable to get node name for
> nodeid 739512332
> pacemakerd:   notice: get_node_name:   Defaulting to uname -n for the
> local corosync node name
> pacemakerd:     info: crm_get_peer:    Node 739512332 is now known as
> h12
> ...
> pacemakerd:     info: main:    Starting mainloop
> pacemakerd:     info: pcmk_quorum_notification:        Quorum
> retained | membership=172 members=2
> pacemakerd:     info: corosync_node_name:      Unable to get node
> name for nodeid 739512331
> pacemakerd:   notice: get_node_name:   Could not obtain a node name
> for corosync nodeid 739512331
> pacemakerd:     info: crm_get_peer:    Created entry f4ef35e4-1b49-
> 4e48-916b-bb0fab7c52c9/0x1876820 for node (null)/739512331 (2 total)
> pacemakerd:     info: crm_get_peer:    Node 739512331 has uuid
> 739512331
> ...
> pacemakerd:     info: corosync_node_name:      Unable to get node
> name for nodeid 739512331
> ...
> pacemakerd:   notice: get_node_name:   Could not obtain a node name
> for corosync nodeid 739512331
> pacemakerd:   notice: crm_update_peer_state_iter:      Node (null)
> state is now member | nodeid=739512331 previous=unknown
> source=pcmk_quorum_notification
> pacemakerd:   notice: crm_update_peer_state_iter:      Node 12 state
> is now member | nodeid=739512332 previous=unknown
> source=pcmk_quorum_notification
> pacemakerd:     info: pcmk_cpg_membership:     Node 739512332 joined
> group pacemakerd (counter=0.0, pid=32766, unchecked for rivals)
> stonith-ng:     info: corosync_node_name:      Unable to get node
> name for nodeid 739512332
> stonith-ng:   notice: get_node_name:   Could not obtain a node name
> for corosync nodeid 739512332
> 
> What's that? The ID had been resolved before!

stonith-ng is a completely different process; each daemon has to figure
out the node information itself from what corosync gives it. You'll see
a lot of such messages repeated for each daemon that uses corosync.
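
If you want to see this per daemon, a small script can group the "is
now known as" messages by the process that logged them. The sketch
below is only an illustration: the sample lines are copied from the
log in this thread (with the mail line-wrapping undone), and the names
KNOWN_AS and names_by_daemon are my own.

```python
import re
from collections import defaultdict

# A few lines from the log in this thread, rejoined after mail wrapping.
LOG = """\
pacemakerd:     info: crm_get_peer:    Node 739512332 is now known as h12
pacemakerd:     info: crm_get_peer:    Node 739512331 is now known as h11
attrd:     info: crm_get_peer:    Node 739512332 is now known as h12
stonith-ng:     info: crm_get_peer:    Node 739512332 is now known as h12
crmd:     info: crm_get_peer:    Node 739512332 is now known as h12
crmd:     info: crm_get_peer:    Node 739512331 is now known as h11
"""

# daemon: level: crm_get_peer: Node <nodeid> is now known as <name>
KNOWN_AS = re.compile(
    r"^(?P<daemon>[\w-]+):\s+\w+:\s+crm_get_peer:\s+"
    r"Node (?P<nodeid>\d+) is now known as (?P<name>\S+)$"
)


def names_by_daemon(log_text):
    """Map each daemon to the {nodeid: name} pairs it reported learning."""
    learned = defaultdict(dict)
    for line in log_text.splitlines():
        match = KNOWN_AS.match(line.strip())
        if match:
            nodeid = int(match["nodeid"])
            learned[match["daemon"]][nodeid] = match["name"]
    return dict(learned)


for daemon, names in names_by_daemon(LOG).items():
    print(daemon, names)
```

Each daemon shows up with its own table, which is why the same node ID
appears to be "resolved" several times.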

> 
> stonith-ng:     info: crm_get_peer:    Created entry 155a30a0-ddd3-
> 4b31-9f76-46313ffa9824/0x1bff130 for node (null)/739512332 (1 total)
> stonith-ng:     info: crm_get_peer:    Node 739512332 has uuid
> 739512332
> ...
> stonith-ng:   notice: crm_update_peer_state_iter:      Node (null)
> state is now member | nodeid=739512332 previous=unknown
> source=crm_update_peer_proc
> ...
> attrd:   notice: get_node_name:   Could not obtain a node name for
> corosync nodeid 739512332
> attrd:     info: crm_get_peer:    Created entry 961e718f-ad71-479a-
> ae04-c2ec5ba29858/0x256ca40 for node (null)/739512332 (1 total)
> attrd:     info: crm_get_peer:    Node 739512332 has uuid 739512332
> attrd:     info: crm_update_peer_proc:    cluster_connect_cpg: Node
> (null)[739512332] - corosync-cpg is now online
> attrd:   notice: crm_update_peer_state_iter:      Node (null) state
> is now member | nodeid=739512332 previous=unknown
> source=crm_update_peer_proc
> ...
> pacemakerd:   notice: get_node_name:   Could not obtain a node name
> for corosync nodeid 739512331
> pacemakerd:     info: pcmk_cpg_membership:     Node 739512331 still
> member of group pacemakerd (peer=(null):7275, counter=0.0, at least
> once)
> stonith-ng:   notice: get_node_name:   Defaulting to uname -n for the
> local corosync node name
> ...
> pacemakerd:     info: crm_get_peer:    Node 739512331 is now known as
> h11
> ...
> attrd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512332
> attrd:   notice: get_node_name:   Defaulting to uname -n for the
> local corosync node name
> attrd:     info: crm_get_peer:    Node 739512332 is now known as h12
> stonith-ng:     info: corosync_node_name:      Unable to get node
> name for nodeid 739512332
> stonith-ng:   notice: get_node_name:   Defaulting to uname -n for the
> local corosync node name
> stonith-ng:     info: crm_get_peer:    Node 739512332 is now known as
> h12
> cib:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512332
> cib:   notice: get_node_name:   Could not obtain a node name for
> corosync nodeid 739512332
> cib:     info: crm_get_peer:    Created entry 287bf9d9-b9f7-44d5-
> 997f-89fd3ee038de/0x24d2740 for node (null)/739512332 (1 total)
> cib:     info: crm_get_peer:    Node 739512332 has uuid 739512332
> cib:     info: crm_update_peer_proc:    cluster_connect_cpg: Node
> (null)[739512332] - corosync-cpg is now online
> cib:   notice: crm_update_peer_state_iter:      Node (null) state is
> now member | nodeid=739512332 previous=unknown
> source=crm_update_peer_proc
> ...
> 
> This doesn't look right in my eyes.

Corosync by default provides only a corosync node ID when identifying
nodes. The daemons have to learn the node names from cluster messages
passed around by pacemaker. The exception is when "name:" is specified
for each node in the nodelist section of corosync.conf; in that case
the daemons can learn the names at start-up.
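
As a rough illustration, the Python sketch below embeds a sample
nodelist and reports which node entries lack "name:". The sample is an
assumption, not the reporter's actual configuration: the node IDs and
names come from the log, but using the host names as ring0_addr values
and setting nodeid explicitly are my guesses. The helper
nodes_missing_name is a made-up name and handles only this simple
"section { key: value }" layout.

```python
# Hypothetical corosync.conf excerpt; only the first node has "name:".
SAMPLE_CONF = """\
nodelist {
    node {
        ring0_addr: h11
        nodeid: 739512331
        name: h11
    }
    node {
        ring0_addr: h12
        nodeid: 739512332
    }
}
"""


def nodes_missing_name(conf_text):
    """Return the nodeid (or ring0_addr) of nodelist entries without name."""
    stack = []      # names of the sections currently open
    node = None     # keys of the node entry being read, if any
    missing = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
            if stack == ["nodelist", "node"]:
                node = {}
        elif line == "}":
            if stack == ["nodelist", "node"]:
                if "name" not in node:
                    missing.append(
                        node.get("nodeid", node.get("ring0_addr", "?"))
                    )
                node = None
            if stack:
                stack.pop()
        elif ":" in line and node is not None:
            key, _, value = line.partition(":")
            node[key.strip()] = value.strip()
    return missing


print(nodes_missing_name(SAMPLE_CONF))
```

With the sample above, only the entry for node ID 739512332 is
reported; adding "name: h12" to it would clear the list.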

As for the "now online"/"now member", there are two stages of corosync
membership: cluster membership (i.e. participating in the corosync
token ring) and process group (CPG) membership (CPG being corosync's
node-to-node messaging layer). The two generally happen very close
together.
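
To tell the two kinds of message apart in a log, something like the
following Python sketch could be used. It assumes that "corosync-cpg is
now online" reports CPG membership and that "state is now member"
reports cluster membership, which is my reading of the two messages;
the sample lines are the cib messages from this thread with the mail
wrapping undone, and membership_events is a made-up name.

```python
import re

# Two cib lines from the log in this thread, rejoined after mail wrapping.
LINES = [
    "cib:     info: crm_update_peer_proc:    cluster_connect_cpg: "
    "Node (null)[739512332] - corosync-cpg is now online",
    "cib:   notice: crm_update_peer_state_iter:      Node (null) state is "
    "now member | nodeid=739512332 previous=unknown "
    "source=crm_update_peer_proc",
]

CPG_ONLINE = re.compile(r"Node \S+\[(\d+)\] - corosync-cpg is now online")
STATE_MEMBER = re.compile(r"Node \S+ state is now member \| nodeid=(\d+)")


def membership_events(lines):
    """Return (nodeid, stage) pairs, with stage either "cpg" or "cluster"."""
    events = []
    for line in lines:
        match = CPG_ONLINE.search(line)
        if match:
            events.append((int(match.group(1)), "cpg"))
            continue
        match = STATE_MEMBER.search(line)
        if match:
            events.append((int(match.group(1)), "cluster"))
    return events


print(membership_events(LINES))
```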

> 
> cib:     info: cib_init:        Starting cib mainloop
> cib:     info: pcmk_cpg_membership:     Node 739512332 joined group
> cib (counter=0.0, pid=0, unchecked for rivals)
> cib:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512331
> cib:   notice: get_node_name:   Could not obtain a node name for
> corosync nodeid 739512331
> cib:     info: crm_get_peer:    Created entry a3a97ea4-27b0-474b-
> 9052-37892bbb3eb2/0x24d3250 for node (null)/739512331 (2 total)
> cib:     info: crm_get_peer:    Node 739512331 has uuid 739512331
> cib:     info: pcmk_cpg_membership:     Node 739512331 still member
> of group cib (peer=(null):7276, counter=0.0, at least once)
> cib:     info: crm_update_peer_proc:    pcmk_cpg_membership: Node
> (null)[739512331] - corosync-cpg is now online
> cib:   notice: crm_update_peer_state_iter:      Node (null) state is
> now member | nodeid=739512331 previous=unknown
> source=crm_update_peer_proc
> cib:     info: pcmk_cpg_membership:     Node 739512332 still member
> of group cib (peer=h12:40550, counter=0.1, at least once)
> cib:     info: cib_file_backup: Archived previous version as
> /var/lib/pacemaker/cib/cib-39.raw
> cib:     info: cib_file_write_with_digest:      Wrote version 0.212.0
> of the CIB to disk (digest: 8ca1ed7121bc34a2f81c25eb952b843a)
> ...
> crmd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512332
> crmd:   notice: get_node_name:   Could not obtain a node name for
> corosync nodeid 739512332
> crmd:     info: crm_get_peer:    Created entry 14984fcd-a050-4e09-
> 890e-6eee7be7d459/0x1d3a010 for node (null)/739512332 (1 total)
> crmd:     info: crm_get_peer:    Node 739512332 has uuid 739512332
> crmd:     info: crm_update_peer_proc:    cluster_connect_cpg: Node
> (null)[739512332] - corosync-cpg is now online
> crmd:     info: init_cs_connection_once: Connection to 'corosync':
> established
> crmd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512332
> crmd:   notice: get_node_name:   Defaulting to uname -n for the local
> corosync node name
> crmd:     info: crm_get_peer:    Node 739512332 is now known as h12
> crmd:     info: peer_update_callback:    Cluster node h12 is now in
> unknown state
> cib:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512332
> cib:   notice: get_node_name:   Defaulting to uname -n for the local
> corosync node name
> cib:     info: crm_get_peer:    Node 739512331 is now known as h11
> ...
> crmd:   notice: cluster_connect_quorum:  Quorum acquired
> crmd:     info: do_ha_control:   Connected to the cluster
> ...
> crmd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512331
> crmd:   notice: get_node_name:   Could not obtain a node name for
> corosync nodeid 739512331
> crmd:     info: crm_get_peer:    Created entry 0a6fdb02-7a25-4c0d-
> b496-60bb7287168e/0x1e7e500 for node (null)/739512331 (2 total)
> crmd:     info: crm_get_peer:    Node 739512331 has uuid 739512331
> crmd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512331
> crmd:     info: pcmk_quorum_notification:        Obtaining name for
> new node 739512331
> crmd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512331
> crmd:   notice: get_node_name:   Could not obtain a node name for
> corosync nodeid 739512331
> crmd:   notice: crm_update_peer_state_iter:      Node (null) state is
> now member | nodeid=739512331 previous=unknown
> source=pcmk_quorum_notification
> crmd:   notice: crm_update_peer_state_iter:      Node h12 state is
> now member | nodeid=739512332 previous=unknown
> source=pcmk_quorum_notification
> crmd:     info: peer_update_callback:    Cluster node h12 is now
> member (was in unknown state)
> crmd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512332
> crmd:   notice: get_node_name:   Defaulting to uname -n for the local
> corosync node name
> ...
> 
> ???
> 
> attrd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512332
> attrd:   notice: get_node_name:   Defaulting to uname -n for the
> local corosync node name
> attrd:     info: main:    CIB connection active
> ...
> stonith-ng:   notice: get_node_name:   Could not obtain a node name
> for corosync nodeid 739512331
> stonith-ng:     info: crm_get_peer:    Created entry 956e8bf0-5634-
> 4535-aa72-cdd6cf319d5b/0x1d04440 for node (null)/739512331 (2 total)
> stonith-ng:     info: crm_get_peer:    Node 739512331 has uuid
> 739512331
> stonith-ng:     info: pcmk_cpg_membership:     Node 739512331 still
> member of group stonith-ng (peer=(null):7277, counter=0.0, at least
> once)
> stonith-ng:     info: crm_update_peer_proc:    pcmk_cpg_membership:
> Node (null)[739512331] - corosync-cpg is now online
> stonith-ng:   notice: crm_update_peer_state_iter:      Node (null)
> state is now member | nodeid=739512331 previous=unknown
> source=crm_update_peer_proc
> ...
> attrd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512331
> attrd:   notice: get_node_name:   Could not obtain a node name for
> corosync nodeid 739512331
> attrd:     info: crm_get_peer:    Created entry 40380a43-c1e2-498a-
> bc9e-d68968acf4d6/0x2572850 for node (null)/739512331 (2 total)
> attrd:     info: crm_get_peer:    Node 739512331 has uuid 739512331
> attrd:     info: pcmk_cpg_membership:     Node 739512331 still member
> of group attrd (peer=(null):7279, counter=0.0, at least once)
> attrd:     info: crm_update_peer_proc:    pcmk_cpg_membership: Node
> (null)[739512331] - corosync-cpg is now online
> attrd:   notice: crm_update_peer_state_iter:      Node (null) state
> is now member | nodeid=739512331 previous=unknown
> source=crm_update_peer_proc
> attrd:     info: pcmk_cpg_membership:     Node 739512332 still member
> of group attrd (peer=h12:40553, counter=0.1, at least once)
> attrd:     info: crm_get_peer:    Node 739512331 is now known as h11
> attrd:   notice: attrd_check_for_new_writer:      Recorded new
> attribute writer: h11 (was unset)
> ...
> crmd:     info: pcmk_cpg_membership:     Node 739512332 joined group
> crmd (counter=0.0, pid=0, unchecked for rivals)
> crmd:     info: corosync_node_name:      Unable to get node name for
> nodeid 739512331
> crmd:   notice: get_node_name:   Could not obtain a node name for
> corosync nodeid 739512331
> crmd:     info: pcmk_cpg_membership:     Node 739512331 still member
> of group crmd (peer=(null):7281, counter=0.0, at least once)
> crmd:     info: crm_update_peer_proc:    pcmk_cpg_membership: Node
> (null)[739512331] - corosync-cpg is now online
> 
> ???
> 
> crmd:     info: pcmk_cpg_membership:     Node 739512332 still member
> of group crmd (peer=h12:40555, counter=0.1, at least once)
> crmd:     info: crm_get_peer:    Node 739512331 is now known as h11
> crmd:     info: peer_update_callback:    Cluster node h11 is now
> member
> crmd:     info: update_dc:       Set DC to h11 (3.0.14)
> crmd:     info: crm_update_peer_expected:        update_dc: Node
> h11[739512331] - expected state is now member (was (null))
> ...
> 
> I feel this mess with determining the node name is overly
> complicated...
> 
> Regards,
> Ulrich

Complicated, yes -- overly, depends on your point of view :)

Putting "name:" in corosync.conf simplifies things.
-- 
Ken Gaillot <kgaillot at redhat.com>


