[ClusterLabs] Antw: [EXT] Re: Coming in Pacemaker 2.0.5: better start-up/shutdown coordination with sbd

Klaus Wenninger kwenning at redhat.com
Mon Aug 24 03:52:19 EDT 2020


On 8/24/20 8:04 AM, Ulrich Windl wrote:
>>>> Vladislav Bogdanov <bubble at hoster-ok.com> wrote on 21.08.2020 at 20:55
> in message <f16f1c7717b388fcd1445b74340c1372719c2d7e.camel at hoster-ok.com>:
>> Hi,
>>
>> btw, is sbd now able to handle CIB diffs internally?
>> Last time I tried to use it with a frequently changing CIB, it became a
>> CPU hog - it requested a full CIB copy on every change.
> Hi!
>
> I also wonder whether sbd is a tool to fence hosts, or a node-quorum maker
> that controls the cluster.
> I think the cluster should control sbd, not the other way 'round.
Neither of those, I suppose - at least not solely ;-)

sbd definitely has no influence on quorum - so it is no quorum-maker.
In its purest mode of operation, with 3 shared disks and no
cluster-awareness, sbd is probably close to what you envision.
Unfortunately 3 disks are quite some extra effort, and you may want
to go with a single disk that doesn't become a SPOF. For certain
scenarios pure watchdog-fencing may become interesting as well.

For all of these cases sbd needs to observe the local node's health
and whether it is part of a quorate cluster partition (whether it
sees the peer in the 2-node case).
And as we are not living in a perfect world, we can of course not
rely on this supervision never getting stuck itself.
Fortunately sbd is simple enough that the main supervision is
done in a simple loop that can be easily supervised by a
hardware watchdog.
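
This is not sbd's actual code, but a minimal sketch of that pattern
against the standard Linux watchdog interface (/dev/watchdog) may help
to picture it; node_is_healthy() is a hypothetical stand-in for the
checks sbd really performs:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/watchdog.h>

    /* Hypothetical stand-in for sbd's checks (disk heartbeat,
     * quorate partition / peer visibility, pacemaker liveness). */
    static int node_is_healthy(void) { return 1; }

    int main(void)
    {
        /* Once opened, the kernel driver reboots the machine unless
         * we "pet" the watchdog before its timeout expires. */
        int fd = open("/dev/watchdog", O_WRONLY);
        if (fd < 0) {
            perror("open /dev/watchdog");
            return 1;
        }

        int timeout = 5;  /* seconds until the hardware resets us */
        ioctl(fd, WDIOC_SETTIMEOUT, &timeout);

        for (;;) {
            if (!node_is_healthy())
                break;  /* stop petting -> hardware fences the node */
            ioctl(fd, WDIOC_KEEPALIVE, 0);
            sleep(1);
        }
        /* No magic close ('V'): exiting leaves the watchdog armed,
         * so the reset fires even if this loop itself gets stuck
         * or the process dies. */
        return 0;
    }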

Of course we could have done the hardware-watchdog supervision
in pacemaker. But sbd & corosync need such a thing as well, so
that would have meant pulling all of it into pacemaker - losing
the simplicity mentioned above. (For completeness: there is meanwhile
a heartbeat between corosync and sbd as well, to compensate for the
hardware-watchdog interface corosync would offer as an alternative,
which would hog the hardware watchdog.)

We could have used one of the mechanisms that provide multiple
watchdogs supervised by a single hardware watchdog (e.g. systemd -
which by design makes it hard to name a point in time by which you
can assume a misbehaving node to have been rebooted), but there is
no such mechanism you will find on every Linux platform.

So you see that the architecture is kind of natural and makes
sense. The introduction of the pacemakerd API, and its use in sbd,
definitely goes in the direction of moving intelligence out of sbd.
Not saying everything is perfect and no improvement is possible -
of course ;-)

Klaus
>
> Regards,
> Ulrich
>
>>
>> Fri, 21/08/2020 at 13:16 -0500, Ken Gaillot wrote:
>>> Hi all,
>>>
>>> Looking ahead to the Pacemaker 2.0.5 release expected toward the end of
>>> this year, we will have improvements of interest to anyone running
>>> clusters with sbd.
>>>
>>> Previously at start-up, if sbd was blocked from contacting Pacemaker's
>>> CIB in a way that looked like pacemaker wasn't running (SELinux being a
>>> good example), pacemaker would run resources without protection from
>>> sbd. Now, if sbd is running, pacemaker will wait until sbd contacts it
>>> before it will start any resources, so the cluster is protected in this
>>> situation.
>>>
>>> Additionally, sbd will now periodically contact the main pacemaker
>>> daemon for a status report. Currently, this is just an immediate
>>> response, but it ensures that the main pacemaker daemon is responsive
>>> to IPC requests. This is a bit more assurance that pacemaker is not
>>> only running, but functioning properly. In future versions, we will
>>> have even more in-depth health checks as part of this feature.
>>>
>>> Previously at shutdown, sbd determined whether pacemaker had shut
>>> down cleanly by checking whether any resources were still running.
>>> This would lead to sbd fencing the node if pacemaker shut down in
>>> maintenance mode with resources active. Now, sbd will determine clean
>>> shutdowns as part of the status report described above, avoiding that
>>> situation.
>>>
>>> These behaviors will be controlled by a new option in
>>> /etc/sysconfig/sbd or /etc/default/sbd, SBD_SYNC_RESOURCE_STARTUP. This
>>> defaults to "no" for backward compatibility when a newer sbd is used
>>> with an older pacemaker or vice versa. Distributions may change the
>>> value to "yes" since they can ensure both sbd and pacemaker versions
>>> support it; users who build their own installations can set it
>>> themselves if both versions support it.
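
(Concretely, once both components are new enough, enabling this is a
one-line setting in /etc/sysconfig/sbd or /etc/default/sbd:

    SBD_SYNC_RESOURCE_STARTUP=yes

with "no" remaining the backward-compatible default.)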


