[ClusterLabs] SBD & Failed Peer

Tue Sep 8 10:13:40 EDT 2015

On 09/07/2015 07:48 AM, Jorge Fábregas wrote:
> On 09/07/2015 03:27 AM, Digimer wrote:
>> And this is why I am nervous; It is always ideal to have a primary fence
>> method that has a method of confirming the 'off' state. IPMI fencing can
>> do this, as can hypervisor-based fence methods like fence_virsh and
>> fence_xvm.
> 
> Hi Digimer,
> 
> Yes, I thought that confirmation was kind of sacred but now I know it's
> not always possible.
> 
>> I would use IPMI (iLO, DRAC, etc) as the primary fence method and
>> something else as a secondary, backup method. You can use SBD + watchdog
>> as the backup method, or as I do, a pair of switched PDUs (I find APC
>> brand to be very fast in fencing).
> 
> This sounds great.  Is there a way to specify a primary & secondary
> fencing device?  I haven't seen a way to specify such hierarchy in
> pacemaker.

Good news/bad news:

Yes, Pacemaker supports complex hierarchies of multiple fencing devices,
which it calls "fencing topology". There is a small example at
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_advanced_stonith_configurations
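
To make the ordering concrete, here is a rough Python sketch of how a
topology is evaluated as I understand it: levels are tried in ascending
order, every device in a level has to succeed for that level to count,
and the first level that succeeds ends the attempt. This is only a model
of the behavior, not Pacemaker code. The device names (ipmi_node1, pdu_a,
pdu_b) and the fake_run helper are made up for illustration.

```python
from typing import Callable, Dict, List, Optional


def fence_with_topology(
    levels: Dict[int, List[str]],
    run_device: Callable[[str], bool],
) -> Optional[int]:
    """Return the index of the first level whose devices all succeed, else None."""
    for index in sorted(levels):
        # all() stops at the first failing device, so the remaining devices
        # in that level are skipped and the next level is tried instead.
        if all(run_device(device) for device in levels[index]):
            return index
    return None


# Hypothetical layout: IPMI as level 1, a pair of switched PDUs as level 2.
levels = {1: ["ipmi_node1"], 2: ["pdu_a", "pdu_b"]}

attempted: List[str] = []


def fake_run(device: str) -> bool:
    """Stand-in for a real fence agent; pretends the IPMI interface is dead."""
    attempted.append(device)
    return device != "ipmi_node1"


result = fence_with_topology(levels, fake_run)
print(result, attempted)  # 2 ['ipmi_node1', 'pdu_a', 'pdu_b']
```

With the IPMI device failing, the target only counts as fenced once both
PDUs in level 2 have reported success, which is why a two-PDU level needs
both outlets cut.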

Unfortunately, sbd is not supported in fencing topologies. Pacemaker
hooks into sbd via dedicated internal logic, not a conventional fence
agent, so it's treated differently. You might want to open an RFE bug
either upstream or with your OS vendor if you want to put it on the
radar, but sbd isn't entirely under Pacemaker's control, so I'm not sure
how feasible it would be.