[ClusterLabs] Wtrlt: Antw: Re: Antw: Re: how important would you consider to have two independent fencing device for each node ?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Apr 20 02:43:52 EDT 2017


Should have gone to the list...

>>>> Digimer <lists at alteeve.ca> wrote on 19.04.2017 at 17:20 in message
> <600637f1-fef8-0a3d-821c-7aecfa398ee2 at alteeve.ca>:
> > On 19/04/17 02:38 AM, Ulrich Windl wrote:
> >>>>> Digimer <lists at alteeve.ca> wrote on 18.04.2017 at 19:08 in message
> >> <26e49390-b384-b46e-4965-eba5bfe59636 at alteeve.ca>:
> >>> On 18/04/17 11:07 AM, Lentes, Bernd wrote:
> >>>> Hi,
> >>>>
> >>>> I'm currently establishing a two-node cluster. Each node is an HP server
> >>>> with an ILO card.
> >>>> I can fence both of them, it's working fine.
> >>>> But what if the ILO does not work correctly? Then fencing is not
> >>>> possible.
> >>>
> >>> Correct. If you only have iLO fencing, then the cluster would hang
> >>> (failed fencing is *not* an indication of node death).
> >>>
> >>>> I also have a switched PDU from APC. Each server has two power supplies.
> >>>> Currently one is connected to the normal power equipment, the other to the
> >>>> UPS.
> >>>> As a sort of redundancy, if the UPS does not work properly.
> >>>
> >>> That's a fine setup.
> >>>
> >>>> If I were to use the switched PDU as a fencing device, I would lose the
> >>>> redundancy of two independent power sources, because then I have to connect
> >>>> both power supplies together to the UPS.
> >>>> I wouldn't like to do that.
> >>>
> >>> Not if you have two switched PDUs. This is what we do in our Anvil!
> >>> systems... One PDU feeds the first PSU in each node and the second PDU
> >>> feeds the second PSUs. Ideally both PDUs are fed by UPSes, but that's
> >>> not as important. One PDU on a UPS and one PDU directly from mains will
> >>> work.
> >>>
> >>>> How important would you consider it to have two independent fencing devices
> >>>> for each node? I can't buy another PDU; currently we are very poor.
> >>>
> >>> Depends entirely on your tolerance for interruption. *I* answer that
> >>> with "extremely important". However, most clusters out there have only
> >>> IPMI-based fencing, so they would obviously say "not so important".
> >>>
> >>>> Is there another way to create a second fencing device, independent from
> >>>> the ILO card?
> >>>>
> >>>> Thanks.
> >>>
> >>> Sure, SBD would work. I've never seen IPMI not have a watchdog timer
> >>> (and iLO is IPMI++), as one example. It's slow, and needs shared
> >>> storage, but a small box somewhere running a small tgtd or iscsid should
> >>> do the trick (note that I have never used SBD myself...).
> >> 
> >> Slow is relative: If it takes 3 seconds from issuing the reset command until
> >> the node is dead, it's fast enough for most cases. Even a switched PDU has
> >> some delays: The command has to be processed, the relay may "stick" for a
> >> short moment, the power supply's capacitors have to discharge (if you have
> >> two power supplies, both need to)...  And iLOs don't really like to be
> >> powered off.
> >> 
> >> Ulrich
> > 
> > The way I understand SBD, and correct me if I am wrong, recovery won't
> > begin until sometime after the watchdog timer kicks. If the watchdog
> > timer is 60 seconds, then your cluster will hang for >60 seconds (plus
> > fence delays, etc).
> 
> I think it works differently: One task periodically reads its mailbox slot
> for commands, and once a command has been read, it is executed immediately.
> Only if the read task hangs for a long time does the watchdog itself trigger
> a reset (as SBD seems dead). So the delay is actually the sum of "write
> delay", "read delay", and "command execution".
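
To make the arithmetic concrete, here is a rough sketch of that sum. It is a
toy model, not sbd's code: the class, the function, and the timing values are
all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SbdTimings:
    """Hypothetical timings in seconds; none of these are sbd defaults."""
    write_delay: float    # fencing node writes the command into the target's slot
    read_interval: float  # how often the target polls its mailbox slot
    exec_delay: float     # target acts on the command (e.g. resets itself)


def worst_case_fence_delay(t: SbdTimings) -> float:
    # Worst case: the command lands just after a poll, so one full
    # polling interval passes before the target sees it.
    return t.write_delay + t.read_interval + t.exec_delay


timings = SbdTimings(write_delay=0.5, read_interval=1.0, exec_delay=1.5)
print(worst_case_fence_delay(timings))
```

In this model the watchdog timeout does not appear in the normal path at all;
it would only matter when the polling task itself stops making progress.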
> 
> The manual page (SLES 11 SP4) states: "Set watchdog timeout to N seconds.
> This depends mostly on your storage latency; the majority of devices must be
> successfully read within this time, or else the node will self-fence." and
> "If a watchdog is used together with the "sbd" as is strongly recommended,
> the watchdog is activated at initial start of the sbd daemon. The watchdog is
> refreshed every time the majority of SBD devices has been successfully read.
> Using a watchdog provides additional protection against "sbd" crashing."
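
The refresh rule quoted from the manual page can be simulated roughly as
follows. This is only a sketch of the described behaviour under simplifying
assumptions (fixed polling interval, instantaneous reads); the function names
and numbers are made up and do not come from sbd.

```python
def majority_ok(read_results):
    """True if more than half of the SBD devices were read successfully."""
    ok = sum(1 for r in read_results if r)
    return ok > len(read_results) / 2


def simulate(rounds, timeout, interval):
    """Return the time at which the watchdog would fire, or None.

    rounds   -- one list of per-device read results (True/False) per poll
    timeout  -- watchdog timeout in seconds
    interval -- time between polls in seconds
    """
    last_refresh = 0.0
    now = 0.0
    for results in rounds:
        now += interval
        # The watchdog expires if it was not refreshed within the timeout.
        if now - last_refresh >= timeout:
            return last_refresh + timeout
        # A successful majority read refreshes the watchdog.
        if majority_ok(results):
            last_refresh = now
    return None


# Three devices: one good poll, then only one device stays readable.
rounds = [[True, True, False]] + [[True, False, False]] * 5
print(simulate(rounds, timeout=5.0, interval=1.0))
```

As long as a majority of devices keeps being read, the simulated watchdog
never fires, which matches the description of it as a safety net rather than
the normal fencing path.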
> 
> Final remark: I think the developers of sbd were on drugs (or never saw a
> UNIX program before) when designing the options. For example: "-W  Enable or
> disable use of the system watchdog to protect against the sbd processes
> failing and the node being left in an undefined state. Specify this once to
> enable, twice to disable." (MHO)
> 
> Regards,
> Ulrich
> 
> > 
> > IPMI and PDUs can confirm a fence of the peer in ~5 seconds (plus fence
> > delays).
> > 
> > -- 
> > Digimer
> > Papers and Projects: https://alteeve.com/w/ 
> > "I am, somehow, less interested in the weight and convolutions of
> > Einstein’s brain than in the near certainty that people of equal talent
> > have lived and died in cotton fields and sweatshops." - Stephen Jay Gould