[ClusterLabs] Antw: Re: Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

emmanuel segura emi2fast at gmail.com
Fri Mar 3 14:35:09 UTC 2017


I think it is a good idea to put your cluster in maintenance mode when
you do an update.
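
For example, something along these lines. This is only a sketch and
assumes crmsh (the "crm" command) is installed on your nodes; the CRM
variable and the two function names are placeholders of mine, not part
of any tool:

```shell
# Sketch only: assumes crmsh provides the "crm" command on this node.
# CRM and the two function names are placeholders, not part of any tool.
CRM="${CRM:-crm}"

# Tell Pacemaker to stop managing resources while packages are updated.
enter_maintenance() {
    "$CRM" configure property maintenance-mode=true
}

# Hand control of the resources back to the cluster afterwards.
leave_maintenance() {
    "$CRM" configure property maintenance-mode=false
}
```

You would call enter_maintenance before the update and leave_maintenance
once the node has rejoined. I cannot say whether this alone would have
avoided the cpg_join problem.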

2017-03-03 15:11 GMT+01:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:
>>>> emmanuel segura <emi2fast at gmail.com> wrote on 03.03.2017 at 14:22 in
> message
> <CAE7pJ3A=oTkWwaz9t0JFTAL1t5G7hmhxpv-ywTUG4JFD9MymAw at mail.gmail.com>:
>> Was your cluster in maintenance mode?
>
> No, it wasn't. Should it have been?
>
>>
>> 2017-03-03 13:59 GMT+01:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:
>>> Hello!
>>>
>>> After the update and reboot of the second of three nodes (SLES11 SP4), I see a
>>> "cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying"
>>> message when I expected the node to join the cluster. What could be the
>>> reasons for this?
>>> In fact this seems to have killed cluster communication, because I saw that
>>> "DLM start" timed out. The other nodes were unable to use DLM during that
>>> time (while the node could not join).
>>>
>>> I saw that corosync starts before the firewall in SLES11 SP4; maybe that's a
>>> problem.
>>>
>>> I tried an "rcopenais stop" on the problem node, which in turn caused a node
>>> fence (DLM stop timed out, too), and then the other nodes were able to
>>> communicate again. During boot the problem node was able to join the cluster
>>> as before. In the meantime I had also updated the third node without a
>>> problem, so it looks like a rare race condition to me.
>>> Any insights?
>>>
>>> Could the problem be related to one of these messages?
>>> crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
>>> corosync[3646]:  [pcmk  ] info: update_member: 0x64bc90 Node 739512325 ((null)) born on: 3352
>>> stonith-ng[3652]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
>>> crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512330
>>> cib[3651]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
>>> cib[3651]:   notice: crm_update_peer_state: plugin_handle_membership: Node (null)[739512321] - state is now member (was (null))
>>>
>>> crmd:     info: crm_get_peer:     Created entry 8a7d6859-5ab1-404b-95a0-ba28064763fb/0x7a81f0 for node (null)/739512321 (2 total)
>>> crmd:     info: crm_get_peer:     Cannot obtain a UUID for node 739512321/(null)
>>> crmd:     info: crm_update_peer:  plugin_handle_membership: Node (null): id=739512321 state=member addr=r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)  (new) votes=0 born=0 seen=3352 proc=00000000000000000000000000000000
>>>
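
The nodeids in these messages look like they were auto-generated from
the ring 0 IPv4 address with the top bit cleared (which is what corosync
does when clear_node_high_bit is set): 739512321 + 2^31 is 172.20.16.1
as a 32-bit number, and that matches the addr in the crm_update_peer
line. Under that assumption, a small sketch to see which node each
"(null)" entry refers to (nodeid_to_ip is a made-up helper name):

```shell
# Sketch: assumes the nodeid is the ring 0 IPv4 address as a 32-bit
# integer with the high bit cleared, and that the high bit was originally
# set (true for 172.x.x.x addresses). nodeid_to_ip is a hypothetical helper.
nodeid_to_ip() {
    id=$(( $1 | 2147483648 ))   # restore the cleared high bit (2^31)
    printf '%d.%d.%d.%d\n' \
        $(( (id >> 24) & 255 )) $(( (id >> 16) & 255 )) \
        $(( (id >> 8) & 255 ))  $(( id & 255 ))
}

# The three nodeids that appear in the quoted log lines.
for n in 739512321 739512325 739512330; do
    nodeid_to_ip "$n"
done
```

For the three ids in the log this gives 172.20.16.1, 172.20.16.5 and
172.20.16.10.
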
>>>
>>> Regards,
>>> Ulrich
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users at clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^