[ClusterLabs] SBD & Failed Peer

Tue Sep 8 19:11:51 EDT 2015

On 09/08/2015 05:33 PM, Andrew Beekhof wrote:
> 
>> On 9 Sep 2015, at 12:13 am, Ken Gaillot <kgaillot at redhat.com> wrote:
>>
>> On 09/07/2015 07:48 AM, Jorge Fábregas wrote:
>>> On 09/07/2015 03:27 AM, Digimer wrote:
>>>> And this is why I am nervous; it is always best to have a primary fence
>>>> method that can confirm the 'off' state. IPMI fencing can
>>>> do this, as can hypervisor-based fence methods like fence_virsh and
>>>> fence_xvm.
>>>
>>> Hi Digimer,
>>>
>>> Yes, I thought that confirmation was kind of sacred, but now I know it's
>>> not always possible.
>>>
>>>> I would use IPMI (iLO, DRAC, etc.) as the primary fence method and
>>>> something else as a secondary, backup method. You can use SBD + watchdog
>>>> as the backup method or, as I do, a pair of switched PDUs (I find
>>>> APC-brand PDUs to be very fast at fencing).
>>>
>>> This sounds great.  Is there a way to specify a primary & secondary
>>> fencing device?  I haven't seen a way to specify such a hierarchy in
>>> Pacemaker.
>>
>> Good news/bad news:
>>
>> Yes, pacemaker supports complex hierarchies of multiple fencing devices,
>> which it calls "fencing topology". There is a small example at
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_advanced_stonith_configurations
>>
>> Unfortunately, sbd is not supported in fencing topologies.
> 
> Another way to look at it is that sbd is only supported in fencing topologies, just not explicit ones.
> Self-termination is always the least desirable option, so we’ll only use it once all other options (including topologies) are exhausted.
> But we’ll do so automatically.

Ah, that's a better situation than I realized.

In that case, it would be easy to add a primary fencing device (or
multiple devices in a topology) and enable sbd, to get the same
effect as having sbd as the last level in a topology. sbd just wouldn't
be explicitly listed in the topology configuration.
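To make that escalation order concrete, here is a toy Python model. It is not Pacemaker's code, and `Device` and `fence_node` are hypothetical names used only for illustration. It assumes the documented topology semantics (levels are tried in ascending order; if any device in a level fails, the rest of that level is skipped and the next level is tried) and treats sbd self-fencing as an implicit final level that applies only when sbd is enabled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical model: a fence device is a name plus a callable that returns
# True if it confirmed the target node was fenced.
FenceAction = Callable[[str], bool]


@dataclass
class Device:
    name: str
    fence: FenceAction


def fence_node(target: str, levels: List[List[Device]], sbd_enabled: bool) -> Optional[str]:
    """Return what fenced the target, or None if every option failed."""
    for index, level in enumerate(levels, start=1):
        # all() stops at the first failing device, mirroring "if a device
        # fails, no further devices in that level are tried".
        if level and all(dev.fence(target) for dev in level):
            return f"level {index}: " + ", ".join(dev.name for dev in level)
    if sbd_enabled:
        # Implicit last resort, never listed in the topology itself: the node
        # is expected to self-terminate via sbd + watchdog (assumed to succeed here).
        return "sbd self-fencing (implicit)"
    return None


if __name__ == "__main__":
    ipmi = Device("ipmi-node1", lambda node: False)  # simulate an IPMI failure
    pdus = [Device("apc-pdu-a", lambda node: True), Device("apc-pdu-b", lambda node: True)]
    print(fence_node("node1", [[ipmi], pdus], sbd_enabled=True))
    # prints: level 2: apc-pdu-a, apc-pdu-b
```

If both the IPMI level and the PDU level failed, the same call would fall through to the sbd branch, which is the "sbd as the unlisted last level" behaviour described above.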

>> Pacemaker
>> hooks into sbd via dedicated internal logic, not a conventional fence
>> agent, so it's treated differently. You might want to open an RFE bug
>> either upstream or with your OS vendor if you want to put it on the
>> radar, but sbd isn't entirely under Pacemaker's control, so I'm not sure
>> how feasible it would be.