[ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Lars Ellenberg lars.ellenberg at linbit.com
Thu Sep 22 07:55:29 UTC 2016


On Thu, Sep 22, 2016 at 08:01:44AM +0200, Klaus Wenninger wrote:
> On 09/22/2016 06:34 AM, renayama19661014 at ybb.ne.jp wrote:
> > Hi Klaus,
> >
> > Thank you for your comment.
> >
> > Okay!
> >
> > Does this mean the community will consider an improvement in the future?
> 
> Speaking for myself, I'd like to get some feedback on whether we might
> have overlooked something, so that it is rather a config issue.
> 
> One of my current projects is to introduce improved
> observation of pacemaker_remoted by sbd. (I say "improved"
> here because there is already something when you enable
> pacemaker-watcher on remote nodes, but it creates unneeded
> watchdog reboots in a couple of cases ...)
> It looks as if some additional (direct) communication (heartbeat as
> the principle, not the communication & membership layer for
> clusters) between pacemaker_remoted (very similar to lrmd)
> and sbd would come in handy for that.
> 
> So in this light it might make sense
> to consider extending that to crmd as well ...
> 
> If this does turn out to be a real issue, I'd like to ask for
> input.

In a somewhat extended context, there used to be "apphbd",
which itself would register with some watchdog to "monitor" itself,
and which "applications" would register with to negotiate their own
"application heartbeat".
Not necessarily only components of the cluster manager,
but cluster-aware "resources" as well.

If they fail to feed their app hb, apphbd would then "trigger a
notification", and some other entity would react to that based on
yet another configuration.  And plugins.
Didn't old heartbeat like the concept of plugins...

Anyway, you get the idea.
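To make the apphbd idea a bit more concrete, here is a minimal sketch in Python. It is not apphbd's actual code or interface: the names AppHeartbeatMonitor, register, heartbeat, check and add_notifier are hypothetical, the notifier callbacks stand in for the "plugins", and the part where the monitor itself registers with a watchdog is left out. The clock is injectable so the example runs deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Registration:
    interval: float   # seconds the application promises between heartbeats
    last_beat: float  # clock value of the last heartbeat received


class AppHeartbeatMonitor:
    """Hypothetical apphbd-like registry: applications register, then keep
    feeding their heartbeat; missed deadlines are reported to notifiers."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._apps: Dict[str, Registration] = {}
        self._notifiers: List[Callable[[str], None]] = []

    def add_notifier(self, notifier: Callable[[str], None]) -> None:
        # "Plugin" hook: some other entity decides how to react.
        self._notifiers.append(notifier)

    def register(self, app: str, interval: float) -> None:
        # Registration counts as the first heartbeat.
        self._apps[app] = Registration(interval, self._clock())

    def heartbeat(self, app: str) -> None:
        self._apps[app].last_beat = self._clock()

    def unregister(self, app: str) -> None:
        self._apps.pop(app, None)

    def check(self) -> List[str]:
        """Return apps that missed their deadline and notify every notifier
        for each of them (repeated checks notify again while still overdue)."""
        now = self._clock()
        overdue = [name for name, reg in self._apps.items()
                   if now - reg.last_beat > reg.interval]
        for name in overdue:
            for notify in self._notifiers:
                notify(name)
        return overdue


if __name__ == "__main__":
    t = [0.0]                                   # fake clock for the demo
    mon = AppHeartbeatMonitor(clock=lambda: t[0])
    mon.add_notifier(lambda app: print("missed heartbeat:", app))
    mon.register("crmd", interval=5.0)
    mon.register("my-resource", interval=10.0)
    t[0] = 6.0
    mon.heartbeat("my-resource")
    print(mon.check())                          # ['crmd']
```

When check() runs (a timer, a main loop) and what a notifier actually does (log, restart, fence, self-reboot) would be exactly the "yet another configuration" part.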

Currently, we have chosen SBD as such a "watchdog proxy";
maybe we can generalize it?

All of that would require cooperation within the node itself, though.

In this scenario, the cluster is not trusting the "sanity"
of the "commander in chief".

So maybe, in addition to this "in-node application heartbeat",
all non-DC nodes should periodically and actively challenge the sanity
of the DC from the outside, and trigger a re-election if they have
"reasonable doubt"?
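A rough sketch of what such an outside challenge could look like on each non-DC node, again in Python and again purely hypothetical: DcSanityChecker, probe and start_election are made-up names, and "reasonable doubt" is modelled as max_failures consecutive unanswered challenges. The probe is assumed to exercise the DC's decision path (the crmd) rather than just corosync membership, since with a frozen crmd the membership layer can still look perfectly healthy.

```python
from typing import Callable


class DcSanityChecker:
    """Hypothetical check run on every non-DC node: challenge the DC each
    round and call for a re-election after repeated failures."""

    def __init__(self, probe: Callable[[float], bool],
                 start_election: Callable[[], None],
                 timeout: float = 2.0, max_failures: int = 3) -> None:
        self._probe = probe                    # True if the DC answered within timeout
        self._start_election = start_election  # assumed hook into the election logic
        self._timeout = timeout
        self._max_failures = max_failures
        self._failures = 0                     # consecutive unanswered challenges

    def tick(self) -> bool:
        """Run one challenge round; return True if an election was triggered."""
        if self._probe(self._timeout):
            self._failures = 0                 # DC answered: doubt is cleared
            return False
        self._failures += 1
        if self._failures >= self._max_failures:
            self._failures = 0                 # start counting afresh after calling one
            self._start_election()
            return True
        return False


if __name__ == "__main__":
    answers = iter([True, False, False, False])  # scripted DC behaviour
    elections = []
    checker = DcSanityChecker(probe=lambda timeout: next(answers),
                              start_election=lambda: elections.append("election"))
    print([checker.tick() for _ in range(4)])  # [False, False, False, True]
    print(elections)                           # ['election']
```

Requiring several consecutive failures rather than one is a guess at how to avoid needless elections when the DC is merely busy for a moment; the timeout and threshold are placeholders, not tuned values.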


-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT



