[ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely
Lars Ellenberg
lars.ellenberg at linbit.com
Thu Sep 22 09:55:29 CEST 2016
On Thu, Sep 22, 2016 at 08:01:44AM +0200, Klaus Wenninger wrote:
> On 09/22/2016 06:34 AM, renayama19661014 at ybb.ne.jp wrote:
> > Hi Klaus,
> >
> > Thank you for comment.
> >
> > Okay!
> >
> > Will it mean that improvement is considered in community in future?
>
> Speaking for me I'd like to have some feedback if we might
> have overseen something so that it is rather a config issue.
>
> One of my current projects is to introduce improved
> observation of pacemaker_remoted by sbd. (Saying improved
> here because there is already something when you enable
> pacemaker-watcher on remote-nodes but it creates unneeded
> watchdog-reboots in a couple of cases ...)
> Looks as if some additional (direct) communication (heartbeat -
> the principle not the communication & membership for
> clusters) between pacemaker_remoted (very similar to lrmd)
> and sbd would come handy for that.
>
> So in this light it might make sense
> to consider expanding that for crmd as well ...
>
> If we are finally facing an issue I'd herewith like to ask for
> input.
In a somewhat extended context, there used to be "apphbd",
which itself would register with some watchdog to "monitor" itself,
and which "applications" would register with to negotiate their own
"application heartbeat".
Not necessarily only components of the cluster manager,
but cluster aware "resources" as well.
If they fail to feed their app hb, apphbd would then "trigger a
notification", and some other entity would react to that based on
yet another configuration. And plugins.
Didn't old heartbeat like the concept of plugins...
Anyways, you get the idea.
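
To make the "application heartbeat" idea a bit more concrete, here is a
minimal Python sketch. It is not the real apphbd API; the class name
AppHeartbeatMonitor, its methods, and the notify callback are all
made up for illustration. It assumes clients register with an agreed
interval and that a periodic check reports whoever has gone quiet.

```python
import time
from typing import Callable, Dict, List, Tuple


class AppHeartbeatMonitor:
    """Toy stand-in for an apphbd-like daemon (hypothetical, not the apphbd API)."""

    def __init__(self, notify: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._notify = notify   # the "other entity" that reacts to a miss
        self._clock = clock     # injectable so the sketch can be tested
        # client name -> (negotiated interval in seconds, time of last heartbeat)
        self._clients: Dict[str, Tuple[float, float]] = {}

    def register(self, name: str, interval: float) -> None:
        """A client negotiates its own heartbeat interval."""
        self._clients[name] = (interval, self._clock())

    def heartbeat(self, name: str) -> None:
        """A client feeds its application heartbeat."""
        interval, _ = self._clients[name]
        self._clients[name] = (interval, self._clock())

    def check(self) -> List[str]:
        """Notify for every client whose last heartbeat is older than its interval."""
        now = self._clock()
        late = [name for name, (interval, last) in self._clients.items()
                if now - last > interval]
        for name in late:
            self._notify(name)
        return late
```

In a real setup the monitor itself would in turn have to feed a
watchdog, so that a hung monitor does not go unnoticed.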
Currently, we have chosen SBD as such a "watchdog proxy",
maybe we can generalize it?
All of that would require cooperation within the node itself, though.
In this scenario, the cluster is not trusting the "sanity"
of the "commander in chief".
So maybe in addition to this "in-node application heartbeat",
all non-DCs should periodically actively challenge the sanity
of the DC from the outside, and trigger re-election if they have
"reasonable doubt"?
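
A rough Python sketch of what such an outside challenge could look
like, purely as an illustration. Nothing here is existing Pacemaker
code: DCSanityChecker, the probe and trigger_election callbacks, and
the choice of "N consecutive missed answers" as the measure of
"reasonable doubt" are all assumptions.

```python
from typing import Callable


class DCSanityChecker:
    """Run on a non-DC node: periodically challenge the DC (hypothetical sketch)."""

    def __init__(self, probe: Callable[[], bool],
                 trigger_election: Callable[[], None],
                 max_misses: int = 3) -> None:
        self._probe = probe                        # True if the DC answered in time
        self._trigger_election = trigger_election  # assumed hook to force re-election
        self._max_misses = max_misses              # threshold for "reasonable doubt"
        self._misses = 0

    def tick(self) -> bool:
        """Issue one challenge; return True if a re-election was triggered."""
        if self._probe():
            self._misses = 0        # a sane answer clears any doubt
            return False
        self._misses += 1
        if self._misses >= self._max_misses:
            self._misses = 0        # start counting afresh after the election
            self._trigger_election()
            return True
        return False
```

Requiring several consecutive misses is only one possible policy; it
trades detection latency against spurious elections caused by a
single slow reply.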
--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support
DRBD® and LINBIT® are registered trademarks of LINBIT