[ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
shermalfe at millenniumit.com
Thu Sep 8 05:51:27 EDT 2016
Sorry for disturbing you. It is really important for us to pass this test of Pacemaker's resiliency and robustness.
To my understanding, it is pacemakerd that feeds the watchdog. If only the crmd is hung, the watchdog keeps being fed, so watchdog-based fencing will not trigger. Am I correct here?
From: Jehan-Guillaume de Rorthais [mailto:jgdr at dalibo.com]
Sent: Thursday, September 08, 2016 3:12 PM
To: Shermal Fernando
Cc: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
On Thu, 8 Sep 2016 08:58:15 +0000
Shermal Fernando <shermalfe at millenniumit.com> wrote:
> Hi Jehan-Guillaume,
> Does this mean the watchdog will self-terminate the machine when the crm
> daemon is frozen?
This means that if the machine is under such a load that Pacemaker is not able to feed the watchdog, the watchdog will fence the machine itself.
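
To illustrate the mechanism described above, here is a minimal sketch of a watchdog timer as a toy simulation. It is not Pacemaker or kernel code: the class SimulatedWatchdog, the function first_fire_time, and all timing values are hypothetical names and numbers chosen for illustration. The model only observes whether feeds arrive in time; it knows nothing about which process does the feeding or whether other processes are healthy.

```python
class SimulatedWatchdog:
    """Toy watchdog timer (hypothetical model, not a real device API).

    It fires if more than `timeout` seconds pass without a feed.
    """

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.last_fed = now

    def feed(self, now):
        # Feeding resets the countdown.
        self.last_fed = now

    def has_fired(self, now):
        return now - self.last_fed > self.timeout


def first_fire_time(feed_times, timeout, until, step=1.0):
    """Return the first simulated time at which the watchdog fires,
    or None if it never fires up to and including `until`."""
    watchdog = SimulatedWatchdog(timeout)
    pending = sorted(feed_times)
    t = 0.0
    while t <= until:
        # Apply every feed that is due at or before the current time.
        while pending and pending[0] <= t:
            watchdog.feed(pending.pop(0))
        if watchdog.has_fired(t):
            return t
        t += step
    return None


# A feeder that keeps running every 5 s never lets a 10 s watchdog expire.
healthy = first_fire_time([0, 5, 10, 15, 20], timeout=10, until=25)

# A feeder that stalls after t=5 lets the watchdog expire, which on real
# hardware would reset the machine.
stalled = first_fire_time([0, 5], timeout=10, until=25)
```

In this model the outcome depends only on the feeder: a hang in any other process is invisible to the timer as long as feeds keep arriving.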
> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:jgdr at dalibo.com]
> Sent: Thursday, September 08, 2016 12:52 PM
> To: Digimer
> Cc: Cluster Labs - All topics related to open-source clustering
> Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen,
> cluster decisions are delayed infinitely
> On Thu, 8 Sep 2016 15:55:50 +0900
> Digimer <lists at alteeve.ca> wrote:
> > On 08/09/16 03:47 PM, Ulrich Windl wrote:
> > >>>> Shermal Fernando <shermalfe at millenniumit.com> wrote on 08.09.2016 at 06:41 in message
> > > <8CE6E8D87F896546B9C65ED80D30A4336578CB4A at LG-SPMB-MBX02.lseg.stockex.local>:
> > >> The whole cluster will fail if the DC (crm daemon) is frozen due
> > >> to CPU starvation or hanging while trying to perform an I/O operation.
> > >> Please share some thoughts on this issue.
> > >
> > > What is "the whole cluster will fail"? If the DC times out, some
> > > recovery will take place.
> > Yup. The starved node should be declared lost by corosync, the
> > remaining nodes reform and if they're still quorate, the hung node
> > should be fenced. Recovery occurs and life goes on.
> And fencing might either come from outside, or just from the server
> itself using watchdog.
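
The recovery sequence quoted above (node declared lost, remaining nodes reform, fence only if still quorate) can be sketched as a simple majority check. This is an illustration under stated assumptions, not corosync or Pacemaker code: the functions is_quorate and recovery_action are hypothetical, and the sketch assumes one vote per node with a plain majority rule and no special options such as two-node mode.

```python
def is_quorate(members_present, expected_votes):
    """Simple majority rule: strictly more than half of the expected votes.

    Assumption: one vote per node, no tie-breaker or two-node options.
    """
    return 2 * members_present > expected_votes


def recovery_action(total_nodes, lost_nodes):
    """Decide what the surviving partition does after losing nodes.

    Returns "fence" if the survivors still hold quorum and may fence the
    lost nodes, otherwise "no-quorum" (they must not fence or recover).
    """
    survivors = total_nodes - lost_nodes
    if is_quorate(survivors, total_nodes):
        return "fence"
    return "no-quorum"


# Three-node cluster, one hung node: two survivors are a majority.
three_node = recovery_action(total_nodes=3, lost_nodes=1)

# Four-node cluster split in half: two survivors are not a majority.
even_split = recovery_action(total_nodes=4, lost_nodes=2)
```

Under this rule, a partition that loses quorum cannot fence the lost node from outside, which is the situation where a node fencing itself through a watchdog is the remaining option.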