[ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Sep 8 06:47:52 UTC 2016


>>> Shermal Fernando <shermalfe at millenniumit.com> wrote on 08.09.2016 at 06:41 in
message
<8CE6E8D87F896546B9C65ED80D30A4336578CB4A at LG-SPMB-MBX02.lseg.stockex.local>:
> The whole cluster will fail if the DC (crm daemon) is frozen due to CPU 
> starvation or hangs while trying to perform an I/O operation.
> Please share some thoughts on this issue.

What do you mean by "the whole cluster will fail"? If the DC times out, some recovery will take place.
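
To illustrate the timeout idea, here is a minimal sketch (not Pacemaker code) of how a peer could tell a frozen process from a healthy one. A process stopped with SIGSTOP still exists, so an existence check passes; only a missing heartbeat reveals the stall. The names HeartbeatMonitor, record and is_stalled, and the 5-second timeout, are hypothetical and chosen only for illustration.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from a peer.

    Hypothetical helper for illustration; not a Pacemaker or sbd API.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so the demo needs no sleeping
        self.last_seen = clock()

    def record(self):
        """Call whenever the peer shows signs of progress."""
        self.last_seen = self.clock()

    def is_stalled(self):
        """True once no heartbeat has arrived for longer than timeout_s."""
        return self.clock() - self.last_seen > self.timeout_s


# Simulated clock: the list cell holds "now" in seconds.
now = [0.0]
monitor = HeartbeatMonitor(timeout_s=5.0, clock=lambda: now[0])

now[0] = 3.0
monitor.record()                  # peer answers at t=3
healthy = monitor.is_stalled()    # checked immediately after the heartbeat

now[0] = 9.0                      # peer is frozen, no heartbeat since t=3
frozen = monitor.is_stalled()     # 6 s of silence exceeds the 5 s timeout

print(healthy, frozen)
```

With a real clock, a node would call record() on every message from the DC and trigger recovery once is_stalled() returns True.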

> 
> Regards,
> Shermal Fernando
> 
> -----Original Message-----
> From: Klaus Wenninger [mailto:kwenning at redhat.com] 
> Sent: Monday, September 05, 2016 6:42 PM
> To: users at clusterlabs.org; developers at clusterlabs.org 
> Subject: Re: [ClusterLabs] When the DC crmd is frozen, cluster decisions are 
> delayed infinitely
> 
> On 09/03/2016 08:42 PM, Shermal Fernando wrote:
>>
>> Hi,
>>
>> Currently our system has 99.96% uptime, but our goal is to increase 
>> it beyond 99.999%. We are now studying the 
>> reliability/performance/features of pacemaker to replace the existing 
>> clustering solution.
>>
>> While testing pacemaker, I have encountered a problem. If the DC (crm
>> daemon) is frozen by sending it the SIGSTOP signal, the crmds on the other 
>> machines never start an election to choose a new DC. Therefore fail-overs, 
>> resource restarts and other cluster decisions are delayed until 
>> the DC is unfrozen.
>>
>> Is this the default behavior of pacemaker or is it due to a 
>> misconfiguration? Is there any way to avoid this single point of failure?
>>
>> For the testing, we use Pacemaker 1.1.12 with Corosync 2.3.3 on the SLES
>> 12 SP1 operating system.
>>
> 
> I guess I can reproduce that with pacemaker 1.1.15 & corosync 2.3.6.
> I have sbd with pacemaker-watcher running on the nodes as well.
> As the node-health is not updated and the cib can be read, sbd is happy, as is 
> to be expected.
> Maybe we could at least add something to sbd-pacemaker-watcher to detect the 
> issue ... thinking ...
> 
> Regards,
> Klaus
> 
>>
>> Regards,
>>
>> Shermal Fernando
>>
>> _______________________________________________
>> Users mailing list: Users at clusterlabs.org 
>> http://clusterlabs.org/mailman/listinfo/users 
>>
>> Project Home: http://www.clusterlabs.org Getting started: 
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> Bugs: http://bugs.clusterlabs.org 