[ClusterLabs] Completely disabled resource failure triggered fencing

Sun Jan 17 13:45:12 EST 2021

Hi all,

  I'm trying to figure out how to define a resource such that if it
fails in any way, it will not cause pacemaker to self-fence. The
reasoning is that there are relatively minor ways to fault a single
resource (these are VMs, so for example, a bad edit to the XML
definition renders it invalid, or the definition is accidentally removed).

In a case like this, I fully expect that resource to enter a failed
state. Of course, pacemaker won't be able to stop it, migrate it, etc.
When this happens currently, it causes the host to self-fence, taking
down all other hosted resources (servers). This is less than ideal.

Is there a way to tell pacemaker that if it's unable to manage a
resource, it should flag it as failed and leave it at that? I've been
trying to do this and my config so far is:

pcs resource create srv07-el6 ocf:alteeve:server name="srv07-el6" \
 meta allow-migrate="true" target-role="stopped" \
 op monitor interval="60" \
 start timeout="INFINITY" on-fail="block" \
 stop timeout="INFINITY" on-fail="block" \
 migrate_to timeout="INFINITY"

This is getting cumbersome and still, in testing, I'm finding cases
where the node gets fenced when something breaks the resource in a
creative way.
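
One thing visible in the command above is that only start and stop
carry on-fail="block"; monitor and migrate_to set no on-fail, so they
keep pacemaker's defaults. Below is a minimal sketch that builds the
same command with on-fail="block" on every operation listed. It is a
dry run that only prints the command, and it is an untested assumption
that blocking these operations would cover the remaining fencing cases.

```shell
# Dry run only: prints the pcs command instead of executing it.
# The operation list is taken from the command above; any other
# operation the agent implements would need to be added by hand.
ops="monitor start stop migrate_to"

cmd='pcs resource create srv07-el6 ocf:alteeve:server name="srv07-el6"'
cmd="$cmd meta allow-migrate=\"true\" target-role=\"stopped\" op"

for op in $ops; do
  if [ "$op" = monitor ]; then
    # monitor keeps its 60 second interval and gains on-fail="block"
    cmd="$cmd $op interval=\"60\" on-fail=\"block\""
  else
    # every other operation keeps timeout="INFINITY" and gets on-fail="block"
    cmd="$cmd $op timeout=\"INFINITY\" on-fail=\"block\""
  fi
done

echo "$cmd"
```

The loop just avoids repeating the option by hand for each operation;
the printed line can be reviewed before it is run against a cluster.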

Thanks for any insight/guidance!

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould