[ClusterLabs] Antw: Re: node name issues (Could not obtain a node name for corosync nodeid 739512332)

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Aug 26 02:59:32 EDT 2019


>>> Ken Gaillot <kgaillot at redhat.com> wrote on 22.08.2019 at 17:38 in
message
<9bd1f0de082ff66a2a9b14d1d80cc95d7eff4bac.camel at redhat.com>:
> On Thu, 2019-08-22 at 09:07 +0200, Ulrich Windl wrote:
>> Hi!
>> 
>> When starting pacemaker (1.1.19+20181105.ccd6b5b10-3.10.1) on a node
>> that had been down for a while, I noticed some unexpected messages
>> about the node name:
>> 
>> pacemakerd:   notice: get_node_name:   Could not obtain a node name for corosync nodeid 739512332
>> pacemakerd:     info: crm_get_peer:    Created entry a21bf687-045b-4fd7-9340-0562ef595883/0x18752f0 for node (null)/739512332 (1 total)
>> pacemakerd:     info: crm_get_peer:    Node 739512332 has uuid 739512332
>> 
>> It seems the UUID and the node ID are mixed up, in the message at least...
> 
> "UUID" is a misnomer, for historical reasons. It was an actual UUID for
> heartbeat (originally the only supported cluster layer), but for
> corosync it's the node ID and for Pacemaker Remote nodes it's the node
> name.
> 
> Ironically the string after "Created entry" is an actual UUID but
> that's not the "node UUID", just an internal hash table id.
> 
> We should definitely update all those messages to reflect the current
> reality.

;-) +1 ("This cat's a dog for historical reasons")

[...]

>> cib:   notice: crm_update_peer_state_iter:      Node (null) state is now member | nodeid=739512332 previous=unknown source=crm_update_peer_proc
>> ...
>> 
>> This doesn't look right in my eyes.
> 
> Corosync by default provides only a corosync node ID when identifying
> nodes. The daemons have to learn the node names from cluster messages
> passed around by pacemaker. The exception is when "name:" is specified in
> corosync.conf; then the daemons can learn the names at start-up.

Can't there be a CIB event like "node name is available now", so that all the
clients needing a node name silently wait until it is available instead of
creating all that noise?

> 
> As for the "now online"/"now member", there are two stages of corosync
> membership: cluster membership (i.e. participating in the corosync
> token ring) and process group (CPG) membership (which is corosync's
> node-to-node messaging protocol). They generally happen very close to
> each other.


I thought CPG was more or less a subset of the cluster.

[...]
>> I feel this mess with determining the node name is overly
>> complicated...
>> 
>> Regards,
>> Ulrich
> 
> Complicated, yes -- overly, depends on your point of view :)
> 
> Putting "name:" in corosync.conf simplifies things.

Also see my earlier message. If adding the node name to corosync.conf is
highly recommended, I wonder why SUSE's SLES procedure does not set it...
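
For reference, a minimal sketch of what the "name:" setting discussed above
looks like in the nodelist section of corosync.conf. The node names, node IDs
and addresses here are made up for illustration and are not taken from the
cluster in this thread:

```
nodelist {
    node {
        # hypothetical values; use each node's real address, ID and name
        ring0_addr: 192.0.2.11
        nodeid: 1
        name: node1
    }
    node {
        ring0_addr: 192.0.2.12
        nodeid: 2
        name: node2
    }
}
```

With entries like these, the daemons can read the names at start-up rather
than having to learn them from cluster messages.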

Thanks for your insights!

Regards,
Ulrich


