[ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Re: Order set troubles

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Mar 29 02:53:11 EDT 2021


>>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 26.03.2021 at 14:26 in
message
<CAA91j0VsKq9SnUuKL5mkvq0A7z_B9udvags-t9zUkVzGDrRSDw at mail.gmail.com>:
> On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl
> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>
>> >>> Andrei Borzenkov <arvidjaar at gmail.com> wrote on 26.03.2021 at 06:19 in
>> message <534274b3-a6de-5fac-0ae4-d02c305f1a3f at gmail.com>:
>> > On 25.03.2021 21:45, Reid Wahl wrote:
>> >> FWIW we have this KB article (I seem to remember Strahil is a Red Hat
>> >> customer):
>> >>   - How do I configure SAP HANA Scale-Up System Replication in a
>> >> Pacemaker cluster when the HANA filesystems are on NFS shares?
>> >> (https://access.redhat.com/solutions/5156571)
>> >>
>> >
>> > "How do I make the cluster resources recover when one node loses access
>> > to the NFS server?"
>> >
>> > If a node loses access to the NFS server, then monitor operations for
>> > resources that depend on NFS availability will fail or time out, and
>> > pacemaker will recover (likely by rebooting this node). That's how similar
>> > configurations have been handled for the past 20 years in other HA
>> > managers. I am genuinely interested: have you encountered a case where
>> > it was not enough?
>>
>> That's a big problem with the SAP design (basically it's just too complex).
>> In the past I had written a kind of resource agent that worked without that
>> overly complex overhead, but since those days SAP has added much more
>> complexity.
>> If the NFS server is external, pacemaker could fence your nodes when the NFS
>> server is down, as first the monitor operation will fail (hanging on NFS),
>> then the recovery (stop/start) will fail (also hanging on NFS).
> 
> And how exactly placing NFS resource under pacemaker control is going
> to change it?

Maybe by being active about it: check reachability of the NFS server (local or
remote); if it's not reachable, block all RA operations that would hang while
NFS is down. (Basically a "freeze" instead of a "recover" when NFS is down.)
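
To make the idea concrete, here is a minimal sketch, not a real resource
agent. It assumes the NFS server answers on TCP port 2049, and the names
nfs_reachable and guarded_operation are made up for illustration. A TCP
connect only shows the server is up, not that the export is usable, and how
a "frozen" result would be reported to pacemaker is left open here.

```python
import socket

NFS_PORT = 2049  # assumption: NFS over TCP on the standard port


def nfs_reachable(host, port=NFS_PORT, timeout=2.0):
    """Return True if a TCP connection to the NFS server succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def guarded_operation(operation, host, probe=nfs_reachable):
    """Run operation() only when the NFS server answers the probe.

    Returns ("frozen", None) without calling operation when the probe
    fails, and ("ran", result) otherwise.
    """
    if not probe(host):
        return ("frozen", None)
    return ("ran", operation())
```

The probe is passed in as a parameter so that the decision logic can be
exercised without a real NFS server.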

> 
>> Even when fencing the
>> node it would not help (resources cannot start) if the NFS server is still
>> down.
> 
> And how exactly placing NFS resource under pacemaker control is going
> to change it?

See above.

> 
>> So you may end up with all your nodes being fenced and the fail counts
>> disabling any automatic resource restart.
>>
> 
> And how exactly placing NFS resource under pacemaker control is going
> to change it?

Andrei, is there another sentence you can say, or is that your favorite
clipboard message? ;-)

Regards,
Ulrich
