[ClusterLabs] Coming in 1.1.14: remapping sequential reboots to all-off-then-all-on

Mon Oct 19 12:34:58 EDT 2015

Pacemaker supports fencing "topologies", allowing multiple fencing
devices to be used (in conjunction or as fallbacks) when a node needs to
be fenced.

However, there is a catch when using something like redundant power
supplies. If you put two power switches in the same topology level, and
Pacemaker needs to reboot the node, it will issue a reboot to the first
power switch and then to the second. The node never actually loses
power, since one of the redundant supplies is always on, so the reboot
has no effect.

Pacemaker's upstream master branch has new handling that will be part of
the eventual 1.1.14 release. In such a case, it will turn all the
devices off, then turn them all back on again.

With previous versions, there was a complicated configuration workaround
involving creating separate devices for the off and on actions. With the
new version, it happens automatically, and no special configuration is
needed.

Here's an example where node1 is the affected node, and apc1 and apc2
are the fence devices:

   pcs stonith level add 1 node1 apc1,apc2

Of course you can configure it using crm or XML as well.

The fencing operation will be treated as successful as long as the "off"
commands succeed, because then it is safe for the cluster to recover any
resources that were on the node. Timeouts and errors in the "on" phase
will be logged but ignored.
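
As a rough illustration of that rule (again a sketch, not the actual
implementation), the result can be thought of as depending only on the
"off" phase. The run_action callback below is hypothetical; it stands in
for whatever executes a fence agent and reports success as True or
False.

```python
import logging

log = logging.getLogger("fence-sketch")


def remapped_reboot(devices, run_action):
    # Phase 1: every "off" must succeed, otherwise the fencing fails.
    for dev in devices:
        if not run_action(dev, "off"):
            return False
    # Phase 2: "on" failures are logged but do not change the result.
    for dev in devices:
        if not run_action(dev, "on"):
            log.warning("'on' failed for %s (ignored)", dev)
    return True


# Hypothetical stand-ins for real fence agent calls.
def on_always_fails(dev, action):
    return action == "off"


def apc2_off_fails(dev, action):
    return not (dev == "apc2" and action == "off")


print(remapped_reboot(["apc1", "apc2"], on_always_fails))  # True
print(remapped_reboot(["apc1", "apc2"], apc2_off_fails))   # False
```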

Any action-specific timeout for the remapped action will be used (for
example, pcmk_off_timeout will be used when executing the "off" command,
not pcmk_reboot_timeout).
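
In other words, the timeout lookup follows the action actually being
executed, not the action originally requested. A small sketch of that
lookup, assuming the device parameters are available as a plain dict;
the timeout_for helper, the numbers, and the fallback value are all made
up for illustration:

```python
def timeout_for(device_params, executed_action, default_timeout):
    # Look up pcmk_<action>_timeout for the action being executed,
    # falling back to a caller-supplied default if it is not set.
    key = "pcmk_%s_timeout" % executed_action
    return device_params.get(key, default_timeout)


# Hypothetical device parameters (values in seconds).
apc1 = {"pcmk_reboot_timeout": 120, "pcmk_off_timeout": 30}

# A requested reboot that was remapped runs "off" and then "on":
print(timeout_for(apc1, "off", 60))  # 30, not the reboot timeout
print(timeout_for(apc1, "on", 60))   # 60, no pcmk_on_timeout is set
```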

The new code knows to skip the "on" step if the fence agent has
automatic unfencing (because the unfencing will happen when the node
rejoins the cluster). This allows fence_scsi to work with this feature.
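
One way to picture the resulting sequence of steps is the sketch below.
It is illustrative only: the plan_remapped_reboot function, the device
names, and the auto_unfence set are hypothetical and do not reflect how
Pacemaker tracks this internally.

```python
def plan_remapped_reboot(devices, auto_unfence):
    # All "off" steps come first, for every device in the level.
    steps = [(dev, "off") for dev in devices]
    # "on" steps are only planned for devices without automatic
    # unfencing; the others are unfenced when the node rejoins.
    steps += [(dev, "on") for dev in devices if dev not in auto_unfence]
    return steps


# Hypothetical level with one fence_scsi device and one power switch.
for step in plan_remapped_reboot(["scsi1", "apc1"], {"scsi1"}):
    print(step)
# ('scsi1', 'off')
# ('apc1', 'off')
# ('apc1', 'on')
```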
-- 
Ken Gaillot <kgaillot at redhat.com>