[ClusterLabs] Antw: Re: Antw: Re: Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Fri Mar 3 10:02:42 EST 2017


>>> emmanuel segura <emi2fast at gmail.com> wrote on 03.03.2017 at 15:35 in
message
<CAE7pJ3BVwnbWoPRQzg8K=NnNxUzxO16dsL2KyYsMuVS3FWWbMg at mail.gmail.com>:
> I think it is a good idea to put your cluster in maintenance mode when
> you do an update.

You should know that I stopped the cluster services on the node in order to install updates (and reboot). This caused all resources to be moved away from that node. I think it would be counter-productive to boot the node with resources running in maintenance mode. Do you disagree?

> 
> 2017-03-03 15:11 GMT+01:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:
>> >>> emmanuel segura <emi2fast at gmail.com> wrote on 03.03.2017 at 14:22 in
>> message
>> <CAE7pJ3A=oTkWwaz9t0JFTAL1t5G7hmhxpv-ywTUG4JFD9MymAw at mail.gmail.com>:
>>> Was your cluster in maintenance state?
>>
>> No, it wasn't. Should it have been?
>>
>>>
>>> 2017-03-03 13:59 GMT+01:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:
>>>> Hello!
>>>>
>>>> After an update and reboot of the second of three nodes (SLES11 SP4) I see a
>>>> "cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying"
>>>> message when I expected the node to join the cluster. What can be the
>>>> reasons for this?
>>>> In fact this seems to have killed cluster communication, because I saw that
>>>> "DLM start" timed out. The other nodes were unable to use DLM during that
>>>> time (while the node could not join).
>>>>
>>>> I saw that corosync starts before the firewall in SLES11 SP4; maybe that's a
>>>> problem.
>>>>
>>>> I tried an "rcopenais stop" on the problem node, which in turn caused a node
>>>> fence (DLM stop timed out, too), and then the other nodes were able to
>>>> communicate again. During boot the problem node was able to join the cluster
>>>> as before. In the meantime I had also updated the third node without a
>>>> problem, so it looks like a rare race condition to me.
>>>> Any insights?
>>>>
>>>> Could the problem be related to one of these messages?
>>>> crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
>>>> corosync[3646]:  [pcmk  ] info: update_member: 0x64bc90 Node 739512325 ((null)) born on: 3352
>>>> stonith-ng[3652]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
>>>> crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512330
>>>> cib[3651]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
>>>> cib[3651]:   notice: crm_update_peer_state: plugin_handle_membership: Node (null)[739512321] - state is now member (was (null))
>>>>
>>>> crmd:     info: crm_get_peer:     Created entry 8a7d6859-5ab1-404b-95a0-ba28064763fb/0x7a81f0 for node (null)/739512321 (2 total)
>>>> crmd:     info: crm_get_peer:     Cannot obtain a UUID for node 739512321/(null)
>>>> crmd:     info: crm_update_peer:  plugin_handle_membership: Node (null): id=739512321 state=member addr=r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)  (new) votes=0 born=0 seen=3352 proc=00000000000000000000000000000000
>>>>
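A side note on reading the numeric node IDs in the messages above: the crm_update_peer line pairs id=739512321 with ring 0 address 172.20.16.1, which is what you get when the node ID is the 32-bit IPv4 address with its highest bit cleared. Assuming the IDs here were auto-generated that way (for example with corosync's clear_node_high_bit option, which is an assumption, since the configuration was not posted), the following minimal sketch maps an ID back to an address. The function name nodeid_to_ipv4 is purely illustrative and not part of any cluster tool.

```python
import ipaddress


def nodeid_to_ipv4(nodeid: int, high_bit_cleared: bool = True) -> str:
    """Map a numeric node ID back to a dotted-quad IPv4 address.

    Assumption: the ID is the ring 0 IPv4 address as a 32-bit integer,
    optionally with the top bit cleared. This does not apply to node IDs
    that were set by hand in the cluster configuration.
    """
    value = (nodeid | 0x80000000) if high_bit_cleared else nodeid
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    # Node IDs copied from the log messages quoted above.
    for nodeid in (739512321, 739512325, 739512330):
        print(nodeid, nodeid_to_ipv4(nodeid))
```

Under that assumption the three IDs in the log correspond to 172.20.16.1, 172.20.16.5 and 172.20.16.10, which helps to tell which peer a "Could not obtain a node name" message refers to.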
>>>>
>>>> Regards,
>>>> Ulrich
>>>>
>>>> _______________________________________________
>>>> Users mailing list: Users at clusterlabs.org 
>>>> http://lists.clusterlabs.org/mailman/listinfo/users 
>>>>
>>>> Project Home: http://www.clusterlabs.org 
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>>> Bugs: http://bugs.clusterlabs.org 
>>>
>>>
>>>
>>> --
>>>   .~.
>>>   /V\
>>>  //  \\
>>> /(   )\
>>> ^`~'^