[Pacemaker] Human confirmation of dead node?

Tue Oct 13 16:43:53 UTC 2009

Hi,

On Tue, Oct 13, 2009 at 05:57:25PM +0200, J Brack wrote:
> On 10/13/09, Dejan Muhamedagic <dejanmm at fastmail.fm> wrote:
> > Hi,
> >
> > On Tue, Oct 13, 2009 at 03:23:11PM +0200, J Brack wrote:
> >> Hi,
> >>
> >> I'm currently using heartbeat. I heard that I'm meant to be using
> >> pacemaker. I will switch in a heartbeat (sorry) if I can get pacemaker
> >> to do what I need.
> >
> > http://clusterlabs.org/wiki/Project_History
> >
> >> I have a clustered nfs server, primary is in datacenter1 close to the
> >> users, secondary is in datacenter2 not close to the users. There is
> >> only an ethernet connection between the two data centers.
> >>
> >> In the event of a failure of the primary in datacenter1 (or of
> >> datacenter1 itself), I would like to switch to the secondary in
> >> datacenter2. The catch? I want a human to confirm that the primary is
> >> really dead.
> >>
> >> My current heartbeat setup uses meatclient to confirm that a node has
> >> been reset. When the primary's hardware dies, this amounts to
> >> confirming that the primary is really dead. After a network outage,
> >> however, I see the service bounce between the servers once the network
> >> comes back up. This is not ideal. I'm hoping pacemaker can handle
> >> this more gracefully.
> >
> > It can't. The meatware/meatclient combination replaces a fencing
> > operation. It is even expected that the node fenced is going to
> > come up after a while.
> >
> >> Can pacemaker be configured to allow manual (human) confirmation that
> >> the primary node is dead before ever switching services? (i.e. require
> >> human confirmation for all cases when it cannot talk to the other
> >> node).
> >
> > If your network goes yo-yo, the cluster will follow. The only
> > way to prevent that is to remove a node from the configuration or
> > put it into standby.
> 
> What is the reasoning for this though?

Well, how else would you have it work? The point is that as soon
as there is network connectivity the nodes will try to reform a
cluster.

> Here I have pri and sec, both with meatware.
> 
> My expectation:
> Network dies, pri stays primary, sec waits for confirmation that pri
> is dead. It never gets it.
> Network comes back, sec sees pri is primary. All is well with the world.
> 
> What really happens:
> Same, but when the network comes back, sec gets pri's resources, then
> pri gets them back again.
> 
> This seems wrong.

Indeed. That shouldn't happen. If it does, please file a
bugzilla.
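
For reference, the behaviour you expected can be written down as a toy
model. This is only a sketch in Python, not Pacemaker or heartbeat code:
the function name, the event tuples and the "no automatic failback"
rule are assumptions made up for illustration.

```python
from typing import List, Tuple

# One event as seen by sec: (peer_reachable, operator_confirmed_dead).
# Both fields are hypothetical; they stand in for cluster membership
# and for the manual meatclient confirmation respectively.
Event = Tuple[bool, bool]


def simulate(events: List[Event], owner: str = "pri") -> List[str]:
    """Return which node owns the resources after each event."""
    history = []
    for peer_reachable, operator_confirmed_dead in events:
        # sec takes over only if pri is unreachable AND a human has
        # confirmed that pri is dead. Losing the link alone is not enough.
        if owner == "pri" and not peer_reachable and operator_confirmed_dead:
            owner = "sec"
        # Assumption: no automatic failback once sec owns the resources.
        history.append(owner)
    return history


if __name__ == "__main__":
    # Network dies, nobody confirms, network comes back.
    outage = [(True, False), (False, False), (False, False), (True, False)]
    print(simulate(outage))
```

In this model a network outage without confirmation leaves the
resources on pri for the whole run, which is the "all is well" case you
described. The resources move to sec only after an explicit
confirmation while pri is unreachable.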

Thanks,

Dejan
