[ClusterLabs] Coming in 1.1.14: remapping sequential reboots to all-off-then-all-on

Digimer lists at alteeve.ca
Mon Oct 19 16:42:42 UTC 2015


On 19/10/15 12:34 PM, Ken Gaillot wrote:
> Pacemaker supports fencing "topologies", allowing multiple fencing
> devices to be used (in conjunction or as fallbacks) when a node needs to
> be fenced.
> 
> However, there is a catch when using something like redundant power
> supplies. If you put two power switches in the same topology level, and
> Pacemaker needs to reboot the node, it will reboot the first power
> switch and then the second -- which has no effect since the supplies are
> redundant.
> 
> Pacemaker's upstream master branch has new handling that will be part of
> the eventual 1.1.14 release. In such a case, it will turn all the
> devices off, then turn them all back on again.

How long will it stay in the 'off' state? Is it configurable? I
ask because if it's too short, some PSUs may not actually lose power.
One or two seconds should be way more than enough though.

> With previous versions, there was a complicated configuration workaround
> involving creating separate devices for the off and on actions. With the
> new version, it happens automatically, and no special configuration is
> needed.
> 
> Here's an example where node1 is the affected node, and apc1 and apc2
> are the fence devices:
> 
>    pcs stonith level add 1 node1 apc1,apc2

Where would the outlet definition go? 'apc1:4,apc2:4'?

> Of course you can configure it using crm or XML as well.
> 
> The fencing operation will be treated as successful as long as the "off"
> commands succeed, because then it is safe for the cluster to recover any
> resources that were on the node. Timeouts and errors in the "on" phase
> will be logged but ignored.
> 
> Any action-specific timeout for the remapped action will be used (for
> example, pcmk_off_timeout will be used when executing the "off" command,
> not pcmk_reboot_timeout).

I think this answers my question about how long it stays off for. What
would be an example config to control the off time then?

> The new code knows to skip the "on" step if the fence agent has
> automatic unfencing (because it will happen when the node rejoins the
> cluster). This allows fence_scsi to work with this feature.
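
To check my own reading of the above, here is a rough Python sketch of the
sequence as described: all "off" first, then all "on", with only the "off"
phase deciding success. It is not Pacemaker code. The Device class, the
run() callback, the auto_unfence flag, the 60-second fallback, and stopping
at the first failed "off" are all assumptions made for illustration; only
the pcmk_<action>_timeout naming comes from the description above.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List

log = logging.getLogger("remap_sketch")

DEFAULT_TIMEOUT = 60  # seconds; placeholder, not a documented Pacemaker default


@dataclass
class Device:
    """Hypothetical stand-in for one fence device in a topology level."""
    name: str
    run: Callable[[str, int], bool]  # run(action, timeout) -> True on success
    params: Dict[str, int] = field(default_factory=dict)
    auto_unfence: bool = False       # e.g. an agent that unfences on rejoin

    def timeout_for(self, action: str) -> int:
        # Use the timeout of the action actually executed ("off" or "on"),
        # never pcmk_reboot_timeout.
        return self.params.get(f"pcmk_{action}_timeout", DEFAULT_TIMEOUT)


def remapped_reboot(devices: List[Device]) -> bool:
    """Turn every device off, then every device back on."""
    # Phase 1: every "off" must succeed for the fencing to count.
    for dev in devices:
        if not dev.run("off", dev.timeout_for("off")):
            log.error("off failed on %s", dev.name)
            return False  # assumption: give up on this level immediately

    # Phase 2: "on" is best effort; failures are logged but ignored.
    for dev in devices:
        if dev.auto_unfence:
            continue  # unfencing happens when the node rejoins
        if not dev.run("on", dev.timeout_for("on")):
            log.warning("on failed on %s (ignored)", dev.name)
    return True


if __name__ == "__main__":
    def fake(name: str) -> Callable[[str, int], bool]:
        def run(action: str, timeout: int) -> bool:
            print(f"{name}: {action} (timeout {timeout}s)")
            return True
        return run

    level = [Device("apc1", fake("apc1"), {"pcmk_off_timeout": 20}),
             Device("apc2", fake("apc2"))]
    print("fenced:", remapped_reboot(level))
```

If that reading is right, the timeouts bound how long each command may
take, which is a separate thing from how long the outlets stay off.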

http://i.imgur.com/i7BzivK.png

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



