[ClusterLabs] Antw: Re: Why Do All The Services Go Down When Just One Fails?

Thu Feb 21 00:03:17 EST 2019

> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of Andrei
> Borzenkov
> Sent: Wednesday, February 20, 2019 8:51 PM
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] Antw: Re: Why Do All The Services Go Down When
> Just One Fails?
> 
> 20.02.2019 21:51, Eric Robinson wrote:
> >
> > The following should show OK in a fixed font like Consolas, but the
> > following setup is supposed to be possible, and is even referenced in the
> > ClusterLabs documentation.
> >
> > +--------------+
> > |   mysql001   +--+
> > +--------------+  |
> > +--------------+  |
> > |   mysql002   +--+
> > +--------------+  |
> > +--------------+  |   +-------------+   +------------+   +----------+
> > |   mysql003   +----->+ floating ip +-->+ filesystem +-->+ blockdev |
> > +--------------+  |   +-------------+   +------------+   +----------+
> > +--------------+  |
> > |   mysql004   +--+
> > +--------------+  |
> > +--------------+  |
> > |   mysql005   +--+
> > +--------------+
> >
> > In the layout above, the MySQL instances are dependent on the same
> > underlying service stack, but they are not dependent on each other.
> > Therefore, as I understand it, the failure of one MySQL instance should not
> > cause the failure of other MySQL instances if on-fail=ignore on-fail=stop. At
> > least, that’s the way it seems to me, but based on the thread, I guess it does
> > not behave that way.
> >
> 
> It does work this way for the monitor operation if you set on-fail=block:
> the failed resource is left "as is". The only case where it does not work seems
> to be the stop operation; even with an explicit on-fail=block, pacemaker still
> attempts to initiate follow-up actions. I still consider this a bug.
> 
> If this is not a bug, it needs a clear explanation in the documentation.
> 
> But please understand that, assuming on-fail=block works, you effectively
> reduce your cluster to a controlled start of resources during boot. As we have

Or failover, correct?

> seen, stopping the IP resource is blocked, meaning pacemaker also cannot
> perform resource-level recovery at all. And for the mysql resources you
> explicitly ignore any monitoring result or failure to stop them.
> Not having stonith also prevents pacemaker from handling node failure.
> What is left is at most a restart of resources on another node during a
> graceful shutdown.
> 
> That raises the question: what do you need such a "cluster" for at all?

Mainly to manage the other relevant resources: drbd, filesystem, and floating IP. I'm content to forgo resource-level recovery for the MySQL services, monitor their health from outside the cluster, and remediate them manually if necessary. I don't see another option if I want to avoid the sort of deadlock situation we talked about earlier.
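
To make the dependency argument concrete, here is a toy model of the layout in the diagram quoted above. It is only a sketch in Python, not Pacemaker's scheduler and not a cluster configuration: the resource names follow the diagram, while `DEPENDS_ON` and `dependents_of` are made-up names for illustration. It shows only that, in the graph as drawn, nothing depends on an individual MySQL instance, while everything above the floating IP depends on it. It says nothing about how Pacemaker reacts to a failed stop.

```python
# Toy model of the dependency layout in the diagram; NOT Pacemaker's scheduler.
# DEPENDS_ON and dependents_of are hypothetical names used only for illustration.

# Each resource maps to the one resource it depends on (None at the bottom of the stack).
DEPENDS_ON = {
    "blockdev": None,
    "filesystem": "blockdev",
    "floating_ip": "filesystem",
    **{f"mysql{i:03d}": "floating_ip" for i in range(1, 6)},
}


def dependents_of(resource):
    """Return every resource that directly or indirectly depends on `resource`."""
    affected = set()
    frontier = [resource]
    while frontier:
        current = frontier.pop()
        for name, parent in DEPENDS_ON.items():
            if parent == current and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    # Nothing sits on top of a single MySQL instance in this graph.
    print(sorted(dependents_of("mysql003")))
    # Every MySQL instance sits on top of the floating IP.
    print(sorted(dependents_of("floating_ip")))
```

In this model, anything that forces the floating IP to stop or move necessarily touches all five MySQL instances, whereas stopping one MySQL instance touches nothing else.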
