[ClusterLabs] One Failed Resource = Failover the Cluster?

Sat Jun 5 16:33:44 EDT 2021

> -----Original Message-----
> From: Users <users-bounces at clusterlabs.org> On Behalf Of
> kgaillot at redhat.com
> Sent: Friday, June 4, 2021 4:49 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>
> Subject: Re: [ClusterLabs] One Failed Resource = Failover the Cluster?
>
> On Fri, 2021-06-04 at 19:10 +0000, Eric Robinson wrote:
> > Sometimes it seems like Pacemaker fails over an entire cluster when
> > only one resource has failed, even though no other resources are
> > dependent on it. Is that expected behavior?
> >
> > For example, suppose I have the following colocation constraints…
> >
> > filesystem with drbd master
> > vip with filesystem
> > mysql_01 with filesystem
> > mysql_02 with filesystem
> > mysql_03 with filesystem
>
> By default, a resource that is colocated with another resource will influence
> that resource's location. This ensures that as many resources are active as
> possible.
>
> So, if any one of the above resources fails and meets its migration-threshold,
> all of the resources will move to another node so a recovery attempt can be
> made for the failed resource.
>
> No resource will be *stopped* due to the failed resource unless it depends
> on it.
>

Thanks, but I'm confused by your previous two paragraphs. On one hand, "if any one of the above resources fails and meets its migration-threshold, all of the resources will move to another node." Moving resources obviously requires stopping them. On the other hand, "No resource will be *stopped* due to the failed resource unless it depends on it." Those two statements seem contradictory to me. I'm not trying to be argumentative, just trying to understand.

> As of the forthcoming 2.1.0 release, the new "influence" option for
> colocation constraints (and "critical" resource meta-attribute) controls
> whether this effect occurs. If influence is turned off (or the resource made
> non-critical), then the failed resource will just stop, and the other resources
> won't move to try to save it.
>

That sounds like the feature I'm waiting for. In the example configuration I provided, I would not want the failure of any mysql instance to cause a cluster failover. I would only want the cluster to fail over if the filesystem or drbd resources failed. Basically, if a resource breaks or fails to stop, I don't want the whole cluster to fail over if nothing depends on that resource; just let it stay down until someone can intervene manually. But if an underlying resource that everything else depends on (drbd or filesystem) fails, then go ahead and fail over the cluster.
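
For concreteness, here is a rough sketch of how I imagine the mysql colocation constraints from my example might look in the CIB once the "influence" option is available. This is only my reading of the description above, not a tested configuration: the constraint ids are made up, and score="INFINITY" is an assumption, since I did not state scores in my original message.

```xml
<!-- Sketch only: ids are hypothetical and score="INFINITY" is assumed.
     The influence attribute needs Pacemaker 2.1.0 or later. -->
<constraints>
  <!-- influence="false": a failed mysql instance should not pull
       filesystem (and everything colocated with it) to another node. -->
  <rsc_colocation id="col-mysql_01-with-filesystem" rsc="mysql_01"
                  with-rsc="filesystem" score="INFINITY" influence="false"/>
  <rsc_colocation id="col-mysql_02-with-filesystem" rsc="mysql_02"
                  with-rsc="filesystem" score="INFINITY" influence="false"/>
  <rsc_colocation id="col-mysql_03-with-filesystem" rsc="mysql_03"
                  with-rsc="filesystem" score="INFINITY" influence="false"/>
</constraints>
```

If I understand correctly, the constraints for filesystem and vip would be left at the default, so a drbd or filesystem failure would still move everything, and setting the "critical" meta-attribute to false on each mysql resource would be the alternative way to get the same effect.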

> >
> > …and the following order constraints…
> >
> > promote drbd, then start filesystem
> > start filesystem, then start vip
> > start filesystem, then start mysql_01
> > start filesystem, then start mysql_02
> > start filesystem, then start mysql_03
> >
> > Now, if something goes wrong with mysql_02, will Pacemaker try to fail
> > over the whole cluster? And if mysql_02 can’t be run on either
> > node, then does Pacemaker refuse to run any resources?
> >
> > I’m asking because I’ve seen some odd behavior like that over the
> > years. Could be my own configuration mistakes, of course.
> >
> > -Eric
> --
> Ken Gaillot <kgaillot at redhat.com>
>