[ClusterLabs] Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Thu Sep 8 12:28:13 UTC 2016
>>> Klaus Wenninger <kwenning at redhat.com> wrote on 08.09.2016 at 09:13 in
message <4c828344-44da-1d93-b43f-a305cfaa5402 at redhat.com>:
> On 09/08/2016 08:55 AM, Digimer wrote:
>> On 08/09/16 03:47 PM, Ulrich Windl wrote:
>>>>>> Shermal Fernando <shermalfe at millenniumit.com> wrote on 08.09.2016 at 06:41 in
>>> message
>>> <8CE6E8D87F896546B9C65ED80D30A4336578CB4A at LG-SPMB-MBX02.lseg.stockex.local>:
>>>> The whole cluster will fail if the DC (crm daemon) is frozen due to CPU
>>>> starvation or hangs while trying to perform an I/O operation.
>>>> Please share some thoughts on this issue.
>>> What is "the whole cluster will fail"? If the DC times out, some recovery
>>> will take place.
>> Yup. The starved node should be declared lost by corosync, the remaining
>> nodes reform and if they're still quorate, the hung node should be
>> fenced. Recovery occurs and life goes on.
> Didn't happen in my test (SIGSTOP to crmd).
> Might be a configuration mistake though...
> Even had sbd with a watchdog active (amongst
> other - real - fencing devices).
> Thinking if it might make sense to tickle the
> crmd-API from sbd-pacemaker-watcher ...
OK, so we are mixing up "DC" and crmd. crmd is just one part of the DC. I guess if corosync is up and happy but crmd is silent, the cluster simply assumes that the DC has nothing to say.
But I still wonder what will happen when crmd is supposed to send a reply to a command.
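
To illustrate the distinction (a process that exists but is silent), here is a minimal Python sketch. It is not Pacemaker, corosync, or sbd code: the "daemon" is a hypothetical stand-in child process that answers "ping" with "pong", and the names is_alive, responds and demo are made up for this example. It assumes a POSIX system where SIGSTOP is available. The point is only that a pure existence check keeps succeeding for a stopped process, while a request/reply check with a timeout notices the freeze.

```python
import os
import select
import signal
import subprocess
import sys

# Hypothetical stand-in for a daemon: replies "pong" to every input line.
CHILD = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    sys.stdout.write('pong\\n')\n"
    "    sys.stdout.flush()\n"
)


def is_alive(proc):
    """Existence check: signal 0 only tests that the PID is still there."""
    try:
        os.kill(proc.pid, 0)
        return True
    except ProcessLookupError:
        return False


def responds(proc, timeout):
    """Liveness check: send a ping, wait up to `timeout` seconds for a reply."""
    proc.stdin.write(b"ping\n")
    ready, _, _ = select.select([proc.stdout], [], [], timeout)
    return bool(ready) and proc.stdout.readline() == b"pong\n"


def demo():
    # bufsize=0 keeps the pipes unbuffered so select() sees replies directly.
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0,
    )
    try:
        before = (is_alive(proc), responds(proc, 5.0))
        os.kill(proc.pid, signal.SIGSTOP)   # freeze the stand-in daemon
        frozen = (is_alive(proc), responds(proc, 0.5))
        os.kill(proc.pid, signal.SIGCONT)   # let it run again
        # The first reply seen here may answer the ping queued while frozen.
        after = (is_alive(proc), responds(proc, 5.0))
        return before, frozen, after
    finally:
        proc.kill()
        proc.wait()


if __name__ == "__main__":
    for label, result in zip(("before", "frozen", "after"), demo()):
        print(label, "alive=%s responds=%s" % result)
```

While the child is stopped, the existence check still reports it as present and only the ping with a timeout fails. That is roughly the kind of check the "tickle the crmd-API" idea would need, whatever the real interface ends up looking like.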
>>
>> Unless you don't have fencing, then may $deity of mercy. ;)
>>