[ClusterLabs Developers] [ClusterLabs] When the DC crmd is frozen, cluster decisions are delayed infinitely

Klaus Wenninger kwenning at redhat.com
Mon Sep 5 09:12:13 EDT 2016


On 09/03/2016 08:42 PM, Shermal Fernando wrote:
>
> Hi,
>
> Currently our system has 99.96% uptime, but our goal is to increase
> it beyond 99.999%. We are now evaluating the reliability,
> performance, and features of Pacemaker to replace our existing
> clustering solution.
>
> While testing Pacemaker, I encountered a problem. If the DC (the crm
> daemon) is frozen by sending it the SIGSTOP signal, the crmds on the other
> machines never start an election to choose a new DC. As a result, failovers,
> resource restarts, and other cluster decisions are delayed until
> the DC is unfrozen.
>
> Is this the default behavior of Pacemaker, or is it due to a
> misconfiguration? Is there any way to avoid this single point of failure?
>
> For testing, we use Pacemaker 1.1.12 with Corosync 2.3.3 on the SLES
> 12 SP1 operating system.
>
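The test above relies on a basic property of SIGSTOP: a stopped process is suspended but still exists, so a check that only asks "is the process there?" keeps succeeding. A minimal Python sketch of that property follows; it uses a throwaway sleeping child as a hypothetical stand-in for crmd and does not touch Pacemaker or Corosync at all.

```python
import os
import signal
import subprocess
import sys


def freeze_and_inspect():
    """Stop a stand-in child with SIGSTOP and report (exists, stopped)."""
    # Hypothetical stand-in for a daemon: a Python child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid return once the child has been stopped.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGSTOP
        # Signal 0 only checks existence/permissions; it raises if the pid is gone.
        os.kill(child.pid, 0)
        exists = True
        return exists, stopped
    finally:
        # Resume, then terminate and reap the child so nothing is left behind.
        os.kill(child.pid, signal.SIGCONT)
        child.kill()
        child.wait()


print(freeze_and_inspect())
```

On a real node, the equivalent manual test is the one described in the message: send SIGSTOP to the DC's crmd, and SIGCONT to resume it.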

I guess I can reproduce that with Pacemaker 1.1.15 and Corosync 2.3.6.
I also have sbd with pacemaker-watcher running on the nodes.
As the node health is not updated and the CIB can still be read, sbd is
happy, as is to be expected.
Maybe we could at least add something to sbd-pacemaker-watcher
to detect the issue ... thinking ...
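As observed above, a check that only confirms the CIB can be read stays happy while the DC's crmd is frozen. One generic way to detect a hang is to watch for progress rather than reachability: record a value the component is expected to keep changing and flag it once that value has stayed the same for longer than a timeout. The Python sketch below illustrates only that heuristic; `ProgressWatcher` and the idea of feeding it a heartbeat value are hypothetical and are not part of sbd or Pacemaker.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProgressWatcher:
    """Hypothetical stall detector: flags a component whose observed value stops changing."""

    timeout: float  # seconds without change before the component counts as stalled
    clock: Callable[[], float] = time.monotonic  # injectable so tests can use a fake clock
    _last_value: Optional[object] = None
    _last_change: Optional[float] = None

    def observe(self, value: object) -> None:
        """Record the latest value, e.g. a (hypothetical) heartbeat counter."""
        now = self.clock()
        if self._last_change is None or value != self._last_value:
            self._last_value = value
            self._last_change = now

    def is_stalled(self) -> bool:
        """True if the value has not changed for longer than `timeout` seconds."""
        if self._last_change is None:
            return False  # nothing observed yet
        return self.clock() - self._last_change > self.timeout


# Usage with a fake clock so the example is deterministic.
now = [0.0]
watcher = ProgressWatcher(timeout=5.0, clock=lambda: now[0])
watcher.observe(1)
now[0] = 3.0
watcher.observe(2)           # value changed: progress recorded at t=3
now[0] = 7.0
print(watcher.is_stalled())  # False: only 4 s since the last change
now[0] = 9.0
watcher.observe(2)           # same value: no progress
print(watcher.is_stalled())  # True: 6 s since the last change
```

Injecting the clock keeps the logic testable; what a real watcher would actually sample from a frozen DC is left open here.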

Regards,
Klaus

> Regards,
>
> Shermal Fernando
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org




