[ClusterLabs] Questions about SBD behavior
kwenning at redhat.com
Wed Jun 13 05:40:10 EDT 2018
On 06/13/2018 10:58 AM, 井上 和徳 wrote:
> Thanks for the response.
> I understand that, as of v1.3.1, real quorum is required.
> I also read this:
> Related to this, in preparation for using pacemaker-2.0,
> we have confirmed the following known issue.
> * When SIGSTOP is sent to a pacemaker process, no resource
> failure is detected.
> I expected this to be handled by SBD, but nothing detected
> that the following processes were frozen. Therefore, no
> resource failure was detected either.
> - pacemaker-based
> - pacemaker-execd
> - pacemaker-attrd
> - pacemaker-schedulerd
> - pacemaker-controld
> I confirmed this, but I couldn't read about the correspondence
You are right. The issue was already known when I created these slides.
So a plan for improving the observation of the pacemaker daemons
should probably have gone into them.
Thanks for bringing this to the table.
I guess the issue has been somewhat neglected recently.
> As a result of our discussion, we want SBD to detect it and reset the
Implementation-wise I would go for some kind of split
solution between pacemaker & SBD. I'm thinking of Pacemaker
observing the sub-daemons by itself, while some kind of
heartbeat (implicitly via corosync or explicitly) between
pacemaker & SBD assures that this internal observation
is doing its job properly.
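A minimal sketch of the Pacemaker side of such a split, assuming a hypothetical tracker that records a heartbeat per sub-daemon and reduces them to one health verdict that SBD would check before feeding the watchdog. `SubDaemonWatch` and its methods are illustrative names, not part of Pacemaker or SBD, and the transport to SBD is left out. What it shows is that a SIGSTOPped daemon still has a PID but stops sending heartbeats, so a timeout catches it where a plain "is the process there" check would not.

```python
import time
from typing import Callable, Dict, Iterable, List


class SubDaemonWatch:
    """Hypothetical heartbeat tracker for Pacemaker sub-daemons (not a real API)."""

    def __init__(self, daemons: Iterable[str], timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        start = clock()
        # Every sub-daemon counts as freshly seen at start-up.
        self.last_seen: Dict[str, float] = {name: start for name in daemons}

    def heartbeat(self, name: str) -> None:
        # Called whenever a sub-daemon proves it is still making progress.
        self.last_seen[name] = self.clock()

    def stale(self) -> List[str]:
        # A SIGSTOPped daemon keeps its PID but stops calling heartbeat(),
        # so it is reported here once its silence exceeds the timeout.
        now = self.clock()
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout)

    def healthy(self) -> bool:
        # The single verdict Pacemaker would pass on to SBD (implicitly via
        # corosync or explicitly); SBD keeps feeding the watchdog only while True.
        return not self.stale()


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    watch = SubDaemonWatch(["pacemaker-based", "pacemaker-execd",
                            "pacemaker-attrd", "pacemaker-schedulerd",
                            "pacemaker-controld"],
                           timeout=5.0, clock=lambda: now[0])
    now[0] = 4.0
    for name in ["pacemaker-based", "pacemaker-attrd",
                 "pacemaker-schedulerd", "pacemaker-controld"]:
        watch.heartbeat(name)
    now[0] = 6.0  # pacemaker-execd has been silent for 6 s, as if frozen
    print(watch.stale(), watch.healthy())  # ['pacemaker-execd'] False
```

Injecting the clock only serves to make the example reproducible; by default the tracker uses a monotonic clock so wall-clock jumps cannot fake or hide a missed heartbeat.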
> Also, for users who do not have a shared disk or qdevice,
> we need an option that works even without real quorum.
> (fence races are to be avoided with the delay attribute:
I'm not sure if I get your point here.
To my mind, watchdog fencing on a 2-node cluster without
an additional qdevice or shared disk is like denying
the laws of physics.
At the moment I don't see why auto_tie_breaker
wouldn't work here on clusters with 4 or more nodes.
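The vote arithmetic behind this can be sketched as a small model, assuming one vote per node and corosync votequorum semantics simplified to three cases: plain majority, two_node mode (after wait_for_all has been satisfied) and auto_tie_breaker with its default rule that the partition holding the lowest node ID wins an exact 50/50 split. The function `quorate` and its parameters are hypothetical, not a corosync API. A watchdog-only SBD node would self-fence whenever its partition is not quorate.

```python
from typing import Set


def quorate(partition: Set[int], all_nodes: Set[int],
            two_node: bool = False, auto_tie_breaker: bool = False) -> bool:
    """Hypothetical, simplified model of votequorum for one partition.

    Assumes one vote per node and a non-empty partition. two_node stands
    for corosync's two_node mode once wait_for_all has been satisfied;
    auto_tie_breaker uses the default 'lowest node ID wins a tie' rule.
    """
    votes, expected = len(partition), len(all_nodes)
    if 2 * votes > expected:
        return True                          # strict majority
    if two_node and expected == 2:
        return votes >= 1                    # a single survivor keeps quorum
    if auto_tie_breaker and 2 * votes == expected:
        return min(all_nodes) in partition   # exact tie: lowest node ID wins
    return False


if __name__ == "__main__":
    two, four = {1, 2}, {1, 2, 3, 4}
    # 1/1 split, plain majority: neither side is quorate, both self-fence.
    print(quorate({1}, two), quorate({2}, two))
    # 1/1 split, two_node: both sides stay quorate, i.e. split-brain.
    print(quorate({1}, two, two_node=True), quorate({2}, two, two_node=True))
    # 2/2 split on 4 nodes with auto_tie_breaker: only the half with node 1 wins.
    print(quorate({1, 2}, four, auto_tie_breaker=True),
          quorate({3, 4}, four, auto_tie_breaker=True))
```

On 2 nodes auto_tie_breaker always favours node 1, so the cluster is never more available than node 1 alone; on 4 nodes it settles a 2/2 split deterministically while single-node failures are still covered by plain majority.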
> Best Regards,
> Kazunori INOUE
>> -----Original Message-----
>> From: Users [mailto:users-bounces at clusterlabs.org] On Behalf Of Klaus Wenninger
>> Sent: Friday, May 25, 2018 4:08 PM
>> To: users at clusterlabs.org
>> Subject: Re: [ClusterLabs] Questions about SBD behavior
>> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
>>> I am checking the watchdog function of SBD (without a shared block device).
>>> In a two-node cluster, if the cluster is stopped on one node, the watchdog
>>> is triggered on the remaining node.
>>> Is this the designed behavior?
>> SBD without a shared block-device doesn't really make sense on
>> a two-node cluster.
>> The basic idea is that, e.g. in the case of a networking problem,
>> a cluster splits up into a quorate and a non-quorate partition.
>> The quorate partition stays up while SBD guarantees
>> reliable watchdog-based self-fencing of the non-quorate partition
>> within a defined timeout.
>> This idea of course doesn't work with just 2 nodes.
>> Taking quorum info from the 2-node feature of corosync (which automatically
>> switches on wait-for-all) doesn't help in this case; instead it
>> would lead to split-brain.
>> What you can do, and what e.g. pcs does automatically, is enable
>> auto-tie-breaker instead of two-node in corosync. But that
>> still doesn't give you higher availability than that of the
>> auto-tie-breaker winner alone. (Maybe interesting if you are going
>> for a load-balancing scenario that doesn't affect availability, or
>> for a transient state while setting up a cluster node by node ...)
>> What you can do, though, is use qdevice to still have 'real' quorum
>> information with just 2 full cluster nodes.
>> There has been quite a lot of discussion around this topic on this
>> list previously, if you search the history.
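With qdevice, the qnetd server contributes one extra vote, so 2 full nodes give 3 expected votes and a split can again produce a strict majority. The sketch below is a hypothetical, simplified model: it assumes qnetd gives its vote to only one partition, preferring the larger one and, on a tie, the one holding the lowest node ID (loosely modelled on the ffsplit algorithm; the real qnetd decision weighs more criteria). `qdevice_quorate` and its parameters are illustrative, not a corosync API.

```python
from typing import List, Set


def qdevice_quorate(partitions: List[Set[int]], cluster_size: int) -> List[bool]:
    """Hypothetical, simplified model of votequorum plus a qdevice.

    Every node has one vote and the qdevice adds one more, so a 2-node
    cluster expects 3 votes. Pass only partitions that can still reach
    the qnetd server; each must be non-empty.
    """
    expected = cluster_size + 1  # +1 for the qdevice vote
    # Assumed ffsplit-like choice: the largest partition gets the qdevice
    # vote, ties go to the partition holding the lowest node ID.
    winner = max(range(len(partitions)),
                 key=lambda i: (len(partitions[i]), -min(partitions[i])))
    return [2 * (len(part) + (1 if i == winner else 0)) > expected
            for i, part in enumerate(partitions)]


if __name__ == "__main__":
    print(qdevice_quorate([{1}, {2}], cluster_size=2))  # split: only node 1 is quorate
    print(qdevice_quorate([{2}], cluster_size=2))       # node 1 dead: node 2 survives
```

Unlike auto-tie-breaker, node 2 keeps quorum here when node 1 fails completely, because together with the qdevice vote it still holds 2 of 3 votes.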
> Users mailing list: Users at clusterlabs.org
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org