[ClusterLabs] One Failed Resource = Failover the Cluster?

Mon Jun 7 15:34:48 EDT 2021

On Sat, 2021-06-05 at 20:33 +0000, Eric Robinson wrote:
> > -----Original Message-----
> > From: Users <users-bounces at clusterlabs.org> On Behalf Of
> > kgaillot at redhat.com
> > Sent: Friday, June 4, 2021 4:49 PM
> > To: Cluster Labs - All topics related to open-source clustering
> > welcomed
> > <users at clusterlabs.org>
> > Subject: Re: [ClusterLabs] One Failed Resource = Failover the
> > Cluster?
> > 
> > On Fri, 2021-06-04 at 19:10 +0000, Eric Robinson wrote:
> > > Sometimes it seems like Pacemaker fails over an entire cluster
> > > when only one resource has failed, even though no other resources
> > > are dependent on it. Is that expected behavior?
> > > 
> > > For example, suppose I have the following colocation constraints…
> > > 
> > > filesystem with drbd master
> > > vip with filesystem
> > > mysql_01 with filesystem
> > > mysql_02 with filesystem
> > > mysql_03 with filesystem
> > 
> > By default, a resource that is colocated with another resource will
> > influence that resource's location. This ensures that as many
> > resources are active as possible.
> > 
> > So, if any one of the above resources fails and meets its
> > migration-threshold, all of the resources will move to another node
> > so a recovery attempt can be made for the failed resource.
> > 
> > No resource will be *stopped* due to the failed resource unless it
> > depends on it.
> > 
> 
> Thanks, but I'm confused by your previous two paragraphs. On one
> hand, "if any one of the above resources fails and meets its
> migration-threshold, all of the resources will move to another
> node." Obviously moving resources requires stopping them. But then,
> "No resource will be *stopped* due to the failed resource unless it
> depends on it." Those two statements seem contradictory to me. Not
> trying to be argumentative. Just trying to understand.

Right, I should have said "will be left stopped". I.e., the other
resources might stop and start as part of a move, but they're not going
to stop and stay stopped because something that depends on them failed.

> 
> > As of the forthcoming 2.1.0 release, the new "influence" option for
> > colocation constraints (and "critical" resource meta-attribute)
> > controls whether this effect occurs. If influence is turned off (or
> > the resource made non-critical), then the failed resource will just
> > stop, and the other resources won't move to try to save it.
> > 
> 
> That sounds like the feature I'm waiting for. In the example
> configuration I provided, I would not want the failure of any mysql
> instance to cause cluster failover. I would only want the cluster to
> fail over if the filesystem or drbd resources failed. Basically, if a
> resource breaks or fails to stop, I don't want the whole cluster to
> fail over if nothing depends on that resource. Just let it stay down
> until someone can manually intervene. But if an underlying resource
> fails that everything else is dependent on (drbd or filesystem) then
> go ahead and fail over the cluster.
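
For the example configuration above, here is a minimal sketch of what
that could look like once 2.1.0 is available. It uses Python only to
generate the CIB XML for the three mysql colocation constraints with
influence turned off. The constraint ids and the INFINITY score are
assumptions for illustration, not taken from your configuration.

```python
import xml.etree.ElementTree as ET


def colocation(rsc, with_rsc, influence):
    """Build one rsc_colocation element.

    The id naming scheme and the INFINITY score are assumptions made
    for this sketch; adjust them to match the real configuration.
    """
    return ET.Element("rsc_colocation", {
        "id": f"colo-{rsc}-with-{with_rsc}",
        "rsc": rsc,
        "with-rsc": with_rsc,
        "score": "INFINITY",
        # influence="false": a failure of rsc should not drag
        # with-rsc (and everything colocated with it) to another node
        "influence": "true" if influence else "false",
    })


constraints = ET.Element("constraints")
for name in ("mysql_01", "mysql_02", "mysql_03"):
    constraints.append(colocation(name, "filesystem", influence=False))

# One constraint element per line, ready to compare against the CIB
cib_fragment = "\n".join(
    ET.tostring(c, encoding="unicode") for c in constraints
)
print(cib_fragment)
```

Each printed element would go in the constraints section of the CIB.
The alternative is to set the "critical" meta-attribute to false on
each mysql resource, which then becomes the default for influence in
colocation constraints where that resource is the dependent one.
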
> 
> > > 
> > > …and the following order constraints…
> > > 
> > > promote drbd, then start filesystem
> > > start filesystem, then start vip
> > > start filesystem, then start mysql_01
> > > start filesystem, then start mysql_02
> > > start filesystem, then start mysql_03
> > > 
> > > Now, if something goes wrong with mysql_02, will Pacemaker try to
> > > fail over the whole cluster? And if mysql_02 can’t be run on
> > > either node, then does Pacemaker refuse to run any resources?
> > > 
> > > I’m asking because I’ve seen some odd behavior like that over the
> > > years. Could be my own configuration mistakes, of course.
> > > 
> > > -Eric
> > 
> > --
> > Ken Gaillot <kgaillot at redhat.com>
> > 
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
> 
-- 
Ken Gaillot <kgaillot at redhat.com>