[ClusterLabs] Completely disabled resource failure triggered fencing

Mon Jan 18 08:01:15 EST 2021

Have you tried the on-fail=ignore option?
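
For reference, a minimal sketch of how that might be applied with pcs. It assumes the resource name, monitor interval and timeouts from the configuration quoted below, and the `pcs resource update ... op ...` form. The helper name onfail_ignore_cmd is hypothetical, and it only prints the command so it can be reviewed before being run on a cluster node. Whether on-fail=ignore covers every failure path (a failed stop in particular) is something to confirm in testing.

```shell
#!/bin/sh
# Sketch only: build a pcs command that sets on-fail=ignore on the
# monitor, start and stop operations of one resource.
# Assumptions: the resource name, interval and timeouts come from the
# quoted configuration; onfail_ignore_cmd is a hypothetical helper.

# Print (do not execute) the pcs command for the resource named in $1.
onfail_ignore_cmd() {
    printf 'pcs resource update %s op' "$1"
    printf ' monitor interval=60 on-fail=ignore'
    printf ' start timeout=INFINITY on-fail=ignore'
    printf ' stop timeout=INFINITY on-fail=ignore\n'
}

# Show the command; review it, then run it by hand on a cluster node.
onfail_ignore_cmd srv07-el6
```

With "ignore", pacemaker treats the operation as if it had not failed, so a broken VM definition would not escalate to recovery, but the failure also becomes easy to miss unless it is watched for separately.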

Best Regards,
Strahil Nikolov

On Sunday, 17 January 2021, 20:45:27 GMT+2, Digimer <lists at alteeve.ca> wrote:

Hi all,

  I'm trying to figure out how to define a resource such that if it
fails in any way, it will not cause pacemaker to self-fence. The
reasoning being that there are relatively minor ways to fault a single
resource (these are VMs, so for example, a bad edit to the XML
definition renders it invalid, or the definition is accidentally removed).

In a case like this, I fully expect that resource to enter a failed
state. Of course, pacemaker won't be able to stop it, migrate it, etc.
When this happens currently, it causes the host to self-fence, taking
down all other hosted resources (servers). This is less than ideal.

Is there a way to tell pacemaker that if it's unable to manage a
resource, it flags it as failed and leaves it at that? I've been trying
to do this, and my config so far is:

pcs resource create srv07-el6 ocf:alteeve:server name="srv07-el6" \
    meta allow-migrate="true" target-role="stopped" \
    op monitor interval="60" \
       start timeout="INFINITY" on-fail="block" \
       stop timeout="INFINITY" on-fail="block" \
       migrate_to timeout="INFINITY"

This is getting cumbersome and still, in testing, I'm finding cases
where the node gets fenced when something breaks the resource in a
creative way.

Thanks for any insight/guidance!

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould