[ClusterLabs] Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Mar 3 07:59:13 EST 2017


Hello!

After updating and rebooting the second of three nodes (SLES11 SP4), I see a "cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying" message where I expected the node to join the cluster. What could be the reasons for this?
In fact this seems to have killed cluster communication, because I saw that "DLM start" timed out. The other nodes were unable to use DLM during that time (while the node could not join).

I saw that corosync starts before the firewall in SLES11 SP4; maybe that's a problem.

I tried an "rcopenais stop" on the problem node, which in turn caused a node fence (DLM stop timed out, too), and then the other nodes were able to communicate again. During boot the problem node was able to join the cluster as before. In the meantime I had also updated the third node without a problem, so it looks like a rare race condition to me.
Any insights?

Could the problem be related to one of these messages?
crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
corosync[3646]:  [pcmk  ] info: update_member: 0x64bc90 Node 739512325 ((null)) born on: 3352
stonith-ng[3652]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512330
cib[3651]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
cib[3651]:   notice: crm_update_peer_state: plugin_handle_membership: Node (null)[739512321] - state is now member (was (null))

crmd:     info: crm_get_peer:     Created entry 8a7d6859-5ab1-404b-95a0-ba28064763fb/0x7a81f0 for node (null)/739512321 (2 total)
crmd:     info: crm_get_peer:     Cannot obtain a UUID for node 739512321/(null)
crmd:     info: crm_update_peer:  plugin_handle_membership: Node (null): id=739512321 state=member addr=r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)  (new) votes=0 born=0 seen=3352 proc=00000000000000000000000000000000
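
To match the numeric node IDs in these messages to hosts, here is a minimal Python sketch. It assumes the IDs were auto-generated from each node's ring 0 IPv4 address with the highest bit cleared (as corosync's clear_node_high_bit option does). The configuration is not shown here, so that is an assumption, but it is consistent with the id=739512321 / ip(172.20.16.1) pair in the crm_update_peer line above. The function name nodeid_to_ipv4 is made up for illustration.

```python
import ipaddress

def nodeid_to_ipv4(nodeid: int) -> str:
    """Map a node ID back to an IPv4 address.

    Assumption: the ID is the 32-bit ring 0 address with the top bit
    cleared, so the top bit is set again before converting.
    """
    return str(ipaddress.IPv4Address(nodeid | 0x80000000))

# Node IDs taken from the log lines above
for nodeid in (739512321, 739512325, 739512330):
    print(nodeid, "->", nodeid_to_ipv4(nodeid))
# 739512321 -> 172.20.16.1
# 739512325 -> 172.20.16.5
# 739512330 -> 172.20.16.10
```

If the assumption holds, the "(null)" names belong to the hosts at those three addresses.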


Regards,
Ulrich







