[ClusterLabs] Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Sep 8 12:34:00 UTC 2016


>>> Klaus Wenninger <kwenning at redhat.com> wrote on 08.09.2016 at 11:45 in
message <e9ab025b-7a96-06dd-9e24-d805e2480f8d at redhat.com>:
> On 09/08/2016 10:58 AM, Shermal Fernando wrote:
>> Hi Jehan-Guillaume,
>>
>> Does this mean the watchdog will self-terminate the machine when the crm daemon is frozen?
> 
> Would be desirable but doesn't seem to happen - at least till now - will
> see what I can do on that front.

I think in HP-UX Serviceguard the process corresponding to crmd was feeding (periodically resetting) the kernel watchdog, roughly every 10 seconds. When that process was killed or stopped unexpectedly (it would disable the watchdog on a clean exit), the kernel watchdog triggered a kernel panic, effectively fencing the node.
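
To make the feed-or-panic idea concrete, here is a minimal sketch in Python. It is a pure simulation with a fake clock, not the Serviceguard, Pacemaker, or Linux watchdog interface. The names `SimulatedWatchdog` and `run_daemon` are hypothetical, and the 30-second timeout is an assumption chosen for illustration; only the roughly 10-second feed interval comes from the description above.

```python
class SimulatedWatchdog:
    """Toy model of a kernel watchdog timer (hypothetical, simulation only)."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s  # assumed timeout, not taken from any real product
        self.armed = True
        self.last_fed = now

    def feed(self, now):
        """The monitored daemon calls this periodically to reset the timer."""
        self.last_fed = now

    def disarm(self):
        """The daemon calls this on a clean exit so that no panic follows."""
        self.armed = False

    def would_panic(self, now):
        """True once the armed timer has gone unfed for longer than the timeout."""
        return self.armed and (now - self.last_fed) > self.timeout_s


def run_daemon(watchdog, feed_interval_s, freeze_at_s, end_s):
    """Step a fake clock in 1 s ticks; the daemon feeds until it freezes.

    Returns the simulated time at which the watchdog would panic (fence)
    the node, or None if it never fires before end_s.
    """
    t = 0.0
    next_feed = 0.0
    while t <= end_s:
        if t < freeze_at_s and t >= next_feed:
            watchdog.feed(t)
            next_feed = t + feed_interval_s
        if watchdog.would_panic(t):
            return t
        t += 1.0
    return None


if __name__ == "__main__":
    # Daemon feeds every 10 s, then freezes at t=25 s (last feed was at t=20 s).
    frozen = run_daemon(SimulatedWatchdog(timeout_s=30), 10, freeze_at_s=25, end_s=100)
    print("frozen daemon, panic at:", frozen)
    # A healthy daemon keeps feeding, so the timer never fires.
    healthy = run_daemon(SimulatedWatchdog(timeout_s=30), 10, freeze_at_s=1000, end_s=100)
    print("healthy daemon, panic at:", healthy)
```

The point of the design is that the watchdog needs no cooperation from the frozen process: a missing feed is enough to trigger it, while a clean exit disarms it explicitly.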

>  
>>
>> Regards,
>> Shermal Fernando
>>
>> -----Original Message-----
>> From: Jehan-Guillaume de Rorthais [mailto:jgdr at dalibo.com] 
>> Sent: Thursday, September 08, 2016 12:52 PM
>> To: Digimer
>> Cc: Cluster Labs - All topics related to open-source clustering welcomed
>> Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>
>> On Thu, 8 Sep 2016 15:55:50 +0900
>> Digimer <lists at alteeve.ca> wrote:
>>
>>> On 08/09/16 03:47 PM, Ulrich Windl wrote:
>>>>>>> Shermal Fernando <shermalfe at millenniumit.com> wrote on 08.09.2016 at 06:41 in
>>>> message
>>>> <8CE6E8D87F896546B9C65ED80D30A4336578CB4A at LG-SPMB-MBX02.lseg.stockex.local>:
>>>>> The whole cluster will fail if the DC (crm daemon) is frozen due to 
>>>>> CPU starvation or hanging while trying to perform an I/O operation.
>>>>> Please share some thoughts on this issue.
>>>> What is "the whole cluster will fail"? If the DC times out, some 
>>>> recovery will take place.
>>> Yup. The starved node should be declared lost by corosync, the 
>>> remaining nodes reform and if they're still quorate, the hung node 
>>> should be fenced. Recovery occurs and life goes on.
>> +1
>>
>> And fencing might either come from outside, or just from the server itself using watchdog.
>>
>> --
>> Jehan-Guillaume (ioguix) de Rorthais
>> Dalibo
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>>
