[ClusterLabs] Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
Digimer
lists at alteeve.ca
Thu Sep 8 13:36:02 UTC 2016
On 08/09/16 09:32 PM, Klaus Wenninger wrote:
> On 09/08/2016 02:28 PM, Ulrich Windl wrote:
>>>>> Klaus Wenninger <kwenning at redhat.com> wrote on 08.09.2016 at 09:13 in
>> message <4c828344-44da-1d93-b43f-a305cfaa5402 at redhat.com>:
>>> On 09/08/2016 08:55 AM, Digimer wrote:
>>>> On 08/09/16 03:47 PM, Ulrich Windl wrote:
>>>>>>>> Shermal Fernando <shermalfe at millenniumit.com> wrote on 08.09.2016 at 06:41 in
>>>>> message
>>>>> <8CE6E8D87F896546B9C65ED80D30A4336578CB4A at LG-SPMB-MBX02.lseg.stockex.local>:
>>>>>> The whole cluster will fail if the DC (crm daemon) is frozen due to CPU
>>>>>> starvation or hanging while trying to perform an I/O operation.
>>>>>> Please share some thoughts on this issue.
>>>>> What is "the whole cluster will fail"? If the DC times out, some recovery
>>>>> will take place.
>>>> Yup. The starved node should be declared lost by corosync, the remaining
>>>> nodes reform and if they're still quorate, the hung node should be
>>>> fenced. Recovery occurs and life goes on.
>>> Didn't happen in my test (SIGSTOP to crmd).
>>> Might be a configuration mistake though...
>>> Even had sbd with a watchdog active (amongst
>>> other - real - fencing devices).
>>> Thinking if it might make sense to tickle the
>>> crmd-API from sbd-pacemaker-watcher ...
>> OK, so we mix "DC" and crmd. crmd is just a part of the DC. I guess if corosync is up and happy, but crmd is silent, the cluster just thinks that the DC has nothing to say.
>> But I still wonder what will happen if crmd is going to send some reply to a command.
>
> Just lost accuracy during discussion. We did stop crmd on the DC.
Corosync (via the totem protocol's token timeout) declares node death.
Pacemaker reacts to the change in membership by checking whether the
remaining nodes (the new cluster) are quorate and, if so, initiates
fencing. If corosync doesn't lose the peer, the cluster won't reform and
fencing (at the membership layer) won't be triggered.
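
To make that condition concrete, here is a minimal Python sketch of the
decision. It is an illustrative model only, not Pacemaker or corosync
code: the names Membership, is_quorate and membership_fencing_triggered
are made up for this example, and quorum is assumed to be a plain
majority of votes (no two-node or tie-breaker options).

```python
from dataclasses import dataclass


@dataclass
class Membership:
    """Hypothetical snapshot of what the surviving nodes see."""
    total_votes: int      # votes expected from the full cluster
    remaining_votes: int  # votes held by nodes still in the membership
    peer_lost: bool       # True once corosync has declared the peer dead


def is_quorate(m: Membership) -> bool:
    # Assumed rule: strictly more than half of the expected votes.
    return m.remaining_votes * 2 > m.total_votes


def membership_fencing_triggered(m: Membership) -> bool:
    # Fencing at the membership layer needs a membership change
    # (corosync lost the peer) and a quorate remaining partition.
    return m.peer_lost and is_quorate(m)
```

With a three-node cluster, Membership(3, 2, True) triggers fencing in
this model, while Membership(3, 3, False), the case where only crmd is
stopped and corosync keeps passing the token, does not.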
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?