[ClusterLabs] One Failed Resource = Failover the Cluster?

Fri Jun 4 17:49:11 EDT 2021

On Fri, 2021-06-04 at 19:10 +0000, Eric Robinson wrote:
> Sometimes it seems like Pacemaker fails over an entire cluster when
> only one resource has failed, even though no other resources are
> dependent on it. Is that expected behavior?
>  
> For example, suppose I have the following colocation constraints…
>  
> filesystem with drbd master
> vip with filesystem
> mysql_01 with filesystem
> mysql_02 with filesystem
> mysql_03 with filesystem

By default, a resource that is colocated with another resource will
influence that resource's location. This ensures that as many resources
are active as possible.

So, if any one of the above resources fails and reaches its
migration-threshold, all of the resources will move to another node so
that a recovery attempt can be made for the failed resource.

No resource will be *stopped* due to the failed resource unless it
depends on it.
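
As a rough sketch of what the above looks like in CIB XML (the ids, the
agent type and the threshold of 3 are assumptions for illustration, not
taken from your configuration):

```xml
<!-- Hypothetical primitive; only the meta-attribute matters here -->
<primitive id="mysql_02" class="ocf" provider="heartbeat" type="mysql">
  <meta_attributes id="mysql_02-meta">
    <!-- After 3 failures on a node, mysql_02 is no longer allowed there -->
    <nvpair id="mysql_02-meta-migration-threshold"
            name="migration-threshold" value="3"/>
  </meta_attributes>
</primitive>

<!-- mysql_02 must run where filesystem runs; by default its location
     preferences (including failure bans) also influence filesystem -->
<rsc_colocation id="col-mysql_02-with-filesystem"
                rsc="mysql_02" with-rsc="filesystem" score="INFINITY"/>
```

With something like this, once mysql_02 reaches its threshold on the
active node it is banned there, and because filesystem takes mysql_02's
preferences into account, filesystem and everything colocated with it
move to the other node.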

As of the forthcoming 2.1.0 release, the new "influence" option for
colocation constraints (and "critical" resource meta-attribute)
controls whether this effect occurs. If influence is turned off (or the
resource made non-critical), then the failed resource will just stop,
and the other resources won't move to try to save it.
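
A minimal sketch of the two ways to express that in CIB XML, assuming
Pacemaker 2.1.0 or later (the ids and the agent type are again
hypothetical; use one approach or the other as fits):

```xml
<!-- Option 1: per constraint. mysql_02 still follows filesystem, but
     its failures no longer pull filesystem to another node -->
<rsc_colocation id="col-mysql_02-with-filesystem"
                rsc="mysql_02" with-rsc="filesystem" score="INFINITY"
                influence="false"/>

<!-- Option 2: per resource. "critical" sets the default "influence"
     for all colocation constraints where mysql_02 is the dependent -->
<primitive id="mysql_02" class="ocf" provider="heartbeat" type="mysql">
  <meta_attributes id="mysql_02-meta">
    <nvpair id="mysql_02-meta-critical" name="critical" value="false"/>
  </meta_attributes>
</primitive>
```

The per-resource form is probably the more convenient one when a
resource is colocated with several others, since it avoids touching
each constraint individually.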

>  
> …and the following order constraints…
>  
> promote drbd, then start filesystem
> start filesystem, then start vip
> start filesystem, then start mysql_01
> start filesystem, then start mysql_02
> start filesystem, then start mysql_03
>  
> Now, if something goes wrong with mysql_02, will Pacemaker try to
> fail over the whole cluster? And if mysql_02 can’t be run on either
> node, then does Pacemaker refuse to run any resources?
>  
> I’m asking because I’ve seen some odd behavior like that over the
> years. Could be my own configuration mistakes, of course.
>  
> -Eric
-- 
Ken Gaillot <kgaillot at redhat.com>