[ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
Shermal Fernando
shermalfe at millenniumit.com
Thu Sep 8 07:08:19 UTC 2016
If the DC (crm daemon) is frozen while corosync keeps running without problems, the DC will not time out. The frozen DC will remain the DC forever.
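For anyone who wants to reproduce the freeze from the original report (SIGSTOP sent to the crmd on the DC), here is a minimal Python sketch. The helper names freeze, resume and freeze_for are made up for illustration and are not part of Pacemaker; the script only sends standard POSIX signals to a PID that you supply yourself.

```python
import os
import signal
import time


def freeze(pid: int) -> None:
    """Suspend the process with SIGSTOP; it stays alive but stops running."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a suspended process continue with SIGCONT."""
    os.kill(pid, signal.SIGCONT)


def freeze_for(pid: int, seconds: float) -> None:
    """Freeze a process for a fixed time, then always resume it."""
    freeze(pid)
    try:
        time.sleep(seconds)
    finally:
        # Resume even if the sleep is interrupted, so the daemon is not
        # left stopped by accident.
        resume(pid)
```

Assuming it is run as root on the DC node, with the crmd PID looked up beforehand (for example with pidof crmd), a call such as freeze_for(crmd_pid, 60) should hold the DC frozen for a minute while the other nodes are watched with crm_mon.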
Regards,
Shermal Fernando
-----Original Message-----
From: Ulrich Windl [mailto:Ulrich.Windl at rz.uni-regensburg.de]
Sent: Thursday, September 08, 2016 12:18 PM
To: users at clusterlabs.org
Subject: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
>>> Shermal Fernando <shermalfe at millenniumit.com> wrote on 08.09.2016
>>> at 06:41 in message
<8CE6E8D87F896546B9C65ED80D30A4336578CB4A at LG-SPMB-MBX02.lseg.stockex.local>:
> The whole cluster will fail if the DC (crm daemon) is frozen due to
> CPU starvation or hanging while trying to perform an I/O operation.
> Please share some thoughts on this issue.
What do you mean by "the whole cluster will fail"? If the DC times out, some recovery will take place.
>
> Regards,
> Shermal Fernando
>
> -----Original Message-----
> From: Klaus Wenninger [mailto:kwenning at redhat.com]
> Sent: Monday, September 05, 2016 6:42 PM
> To: users at clusterlabs.org; developers at clusterlabs.org
> Subject: Re: [ClusterLabs] When the DC crmd is frozen, cluster
> decisions are delayed infinitely
>
> On 09/03/2016 08:42 PM, Shermal Fernando wrote:
>>
>> Hi,
>>
>>
>>
>> Currently our system has 99.96% uptime, but our goal is to increase
>> it beyond 99.999%. We are now studying the reliability, performance
>> and features of Pacemaker to replace the existing clustering
>> solution.
>>
>>
>>
>> While testing Pacemaker, I encountered a problem. If the DC (crm
>> daemon) is frozen by sending it the SIGSTOP signal, the crmds on the
>> other machines never start an election to elect a new DC. Therefore
>> fail-overs, resource restarts and other cluster decisions are
>> delayed until the DC is unfrozen.
>>
>> Is this the default behavior of pacemaker or is it due to a
>> misconfiguration? Is there any way to avoid this single point of failure?
>>
>>
>>
>> For the testing, we use Pacemaker 1.1.12 with Corosync 2.3.3 on the
>> SLES 12 SP1 operating system.
>>
>
> I guess I can reproduce that with pacemaker 1.1.15 & corosync 2.3.6.
> I have sbd with pacemaker-watcher running on the nodes as well.
> As the node-health is not updated and the cib can still be read, sbd
> is happy, as is to be expected.
> Maybe we could at least add something to sbd-pacemaker-watcher to
> detect the issue ... thinking ...
>
> Regards,
> Klaus
>
>>
>>
>>
>>
>> Regards,
>>
>> Shermal Fernando
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>