[ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Q: cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying

Mon Mar 6 11:12:18 UTC 2017

The benefit is that you tell the cluster not to perform any action,
because you are doing an intervention.

2017-03-06 9:14 GMT+01:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:
>>>> emmanuel segura <emi2fast at gmail.com> wrote on 03.03.2017 at 17:43 in
> message
> <CAE7pJ3Aj8_xYDKYPcmYqomMKm2ku3s04vsskciyJ6+TVniiRrQ at mail.gmail.com>:
>> use something like standby?
>
> Hi!
>
> What is the benefit of using "standby" compared to stopping the whole cluster stack on the node when you intend to update the cluster software and the kernel, and then reboot? The only difference I see is that after the reboot the node wouldn't start running services, so I'd have to "online" the node again.
>
> Regards,
> Ulrich
>
>>
>> 2017-03-03 16:02 GMT+01:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:
>>>>>> emmanuel segura <emi2fast at gmail.com> wrote on 03.03.2017 at 15:35 in
>>> message
>>> <CAE7pJ3BVwnbWoPRQzg8K=NnNxUzxO16dsL2KyYsMuVS3FWWbMg at mail.gmail.com>:
>>>> I think it is a good idea to put your cluster in maintenance mode when
>>>> you do an update.
>>>
>>> You should know that I stopped the cluster services on the node in order to
>>> install updates (and reboot). This caused all resources to be moved away from
>>> that node. I think it would be counter-productive to boot the node with
>>> resources running in maintenance mode. Do you disagree?
>>>
>>>>
>>>> 2017-03-03 15:11 GMT+01:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:
>>>>>>>> emmanuel segura <emi2fast at gmail.com> wrote on 03.03.2017 at 14:22 in
>>>>> message
>>>>> <CAE7pJ3A=oTkWwaz9t0JFTAL1t5G7hmhxpv-ywTUG4JFD9MymAw at mail.gmail.com>:
>>>>>> Was your cluster in maintenance state?
>>>>>
>>>>> No, it wasn't. Should it have been?
>>>>>
>>>>>>
>>>>>> 2017-03-03 13:59 GMT+01:00 Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de>:
>>>>>>> Hello!
>>>>>>>
>>>>>>> After updating and rebooting the second of three nodes (SLES11 SP4), I see a
>>>>>>> "cluster-dlm[4494]: setup_cpg_daemon: daemon cpg_join error retrying"
>>>>>>> message when I expected the node to join the cluster. What can be the
>>>>>>> reasons for this?
>>>>>>> In fact, this seems to have killed cluster communication, because I saw that
>>>>>>> "DLM start" timed out. The other nodes were unable to use DLM during that
>>>>>>> time (while the node could not join).
>>>>>>>
>>>>>>> I saw that corosync starts before the firewall in SLES11 SP4; maybe that's a
>>>>>>> problem.
>>>>>>>
>>>>>>> I tried an "rcopenais stop" on the problem node, which in turn caused a node
>>>>>>> fence (DLM stop timed out, too), and then the other nodes were able to
>>>>>>> communicate again. During boot the problem node was able to join the cluster
>>>>>>> as before. In the meantime I had also updated the third node without a
>>>>>>> problem, so it looks like a rare race condition to me.
>>>>>>> Any insights?
>>>>>>>
>>>>>>> Could the problem be related to one of these messages?
>>>>>>> crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
>>>>>>> corosync[3646]:  [pcmk  ] info: update_member: 0x64bc90 Node 739512325 ((null)) born on: 3352
>>>>>>> stonith-ng[3652]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
>>>>>>> crmd[3656]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512330
>>>>>>> cib[3651]:   notice: get_node_name: Could not obtain a node name for classic openais (with plugin) nodeid 739512321
>>>>>>> cib[3651]:   notice: crm_update_peer_state: plugin_handle_membership: Node (null)[739512321] - state is now member (was (null))
>>>>>>>
>>>>>>> crmd:     info: crm_get_peer:     Created entry 8a7d6859-5ab1-404b-95a0-ba28064763fb/0x7a81f0 for node (null)/739512321 (2 total)
>>>>>>> crmd:     info: crm_get_peer:     Cannot obtain a UUID for node 739512321/(null)
>>>>>>> crmd:     info: crm_update_peer:  plugin_handle_membership: Node (null): id=739512321 state=member addr=r(0) ip(172.20.16.1) r(1) ip(10.2.2.1)  (new) votes=0 born=0 seen=3352 proc=00000000000000000000000000000000
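
[Editorial note] The numeric node IDs in the messages above are easier to read once mapped back to addresses. Corosync can auto-generate a node ID from the ring 0 IPv4 address, optionally with the high bit cleared (the `clear_node_high_bit` totem option). Whether this cluster is configured that way is an assumption; the only evidence in the thread is that id=739512321 is logged next to ip(172.20.16.1). The helper name `nodeid_to_ipv4` is hypothetical and not part of any cluster tool. A minimal sketch:

```python
import ipaddress


def nodeid_to_ipv4(nodeid: int, restore_high_bit: bool = False) -> str:
    """Interpret a 32-bit node ID as a dotted-quad IPv4 address.

    restore_high_bit sets bit 31 again, on the ASSUMPTION that the ID
    was generated from an address with the high bit cleared.
    """
    if restore_high_bit:
        nodeid |= 0x80000000
    return str(ipaddress.IPv4Address(nodeid))


# Node IDs taken from the log messages quoted in this thread.
for nodeid in (739512321, 739512325, 739512330):
    print(nodeid, nodeid_to_ipv4(nodeid), nodeid_to_ipv4(nodeid, restore_high_bit=True))
```

For 739512321 this yields 44.20.16.1 as-is and 172.20.16.1 with the high bit restored, which matches the ring 0 address in the crm_update_peer message. Under the same assumption, 739512325 and 739512330 would be 172.20.16.5 and 172.20.16.10.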
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ulrich
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list: Users at clusterlabs.org
>>>>>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>   .~.
>>>>>>   /V\
>>>>>>  //  \\
>>>>>> /(   )\
>>>>>> ^`~'^

-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^