<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Mar 26, 2021 at 6:27 AM Andrei Borzenkov <<a href="mailto:arvidjaar@gmail.com">arvidjaar@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl<br>
<<a href="mailto:Ulrich.Windl@rz.uni-regensburg.de" target="_blank">Ulrich.Windl@rz.uni-regensburg.de</a>> wrote:<br>
><br>
> >>> Andrei Borzenkov <<a href="mailto:arvidjaar@gmail.com" target="_blank">arvidjaar@gmail.com</a>> wrote on 26.03.2021 at 06:19 in<br>
> message <<a href="mailto:534274b3-a6de-5fac-0ae4-d02c305f1a3f@gmail.com" target="_blank">534274b3-a6de-5fac-0ae4-d02c305f1a3f@gmail.com</a>>:<br>
> > On 25.03.2021 21:45, Reid Wahl wrote:<br>
> >> FWIW we have this KB article (I seem to remember Strahil is a Red Hat<br>
> >> customer):<br>
> >> - How do I configure SAP HANA Scale-Up System Replication in a Pacemaker<br>
> >> cluster when the HANA filesystems are on NFS shares?(<br>
> >> <a href="https://access.redhat.com/solutions/5156571" rel="noreferrer" target="_blank">https://access.redhat.com/solutions/5156571</a>)<br>
> >><br>
> ><br>
> > "How do I make the cluster resources recover when one node loses access<br>
> > to the NFS server?"<br>
> ><br>
> > If a node loses access to the NFS server, then monitor operations for<br>
> > resources that depend on NFS availability will fail or time out, and<br>
> > pacemaker will recover (likely by rebooting that node). That's how<br>
> > similar configurations have been handled for the past 20 years in other<br>
> > HA managers. I am genuinely interested: have you encountered a case<br>
> > where that was not enough?<br>
><br>
> That's a big problem with the SAP design (basically it's just too complex).<br>
> In the past I had written a kind of resource agent that worked without that<br>
> overly complex overhead, but since those days SAP has added much more<br>
> complexity.<br>
> If the NFS server is external, pacemaker could fence your nodes when the NFS<br>
> server is down, as first the monitor operation will fail (hanging on NFS), then<br>
> the recovery (stop/start) will fail (also hanging on NFS).<br>
<br>
And how exactly placing NFS resource under pacemaker control is going<br>
to change it?<br></blockquote><div><br></div><div>I noted earlier based on the old case notes:<br></div><div><br></div><div>"Apparently there were situations in which the SAPHana resource wasn't
failing over when connectivity was lost with the NFS share that
contained the hdb* binaries and the HANA data. I don't remember the
exact details (whether demotion was failing, or whether it wasn't even
trying to demote on the primary and promote on the secondary, or what).
Either way, I was surprised that this procedure was necessary, but it
seemed to be."</div><div><br></div><div>Strahil may be dealing with a similar situation; I'm not sure. I get where you're coming from -- I too would expect the application that depends on NFS to simply fail when NFS connectivity is lost, which in turn leads to failover and recovery. For whatever reason, due to some weirdness of the SAPHana resource agent, that didn't happen.<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> Even when fencing the<br>
> node it would not help (resources cannot start) if the NFS server is still<br>
> down.<br>
<br>
And how exactly placing NFS resource under pacemaker control is going<br>
to change it?<br>
<br>
> So you may end up with all your nodes being fenced and the fail counts<br>
> disabling any automatic resource restart.<br>
><br>
<br>
And how exactly placing NFS resource under pacemaker control is going<br>
to change it?<br>
_______________________________________________<br>
Manage your subscription:<br>
<a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
<br>
ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">https://www.clusterlabs.org/</a><br>
<br>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div>Regards,<br><br></div>Reid Wahl, RHCA<br></div><div>Senior Software Maintenance Engineer, Red Hat<br></div>CEE - Platform Support Delivery - ClusterHA</div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>
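<div dir="ltr"><div>P.S. For anyone following along, the general shape of the pattern being discussed -- putting the NFS dependency under Pacemaker control so that a lost mount is detected and escalated, rather than leaving an application agent to hang on it -- looks roughly like this. This is only a sketch: the resource names, export path, mount options, and SAPHana clone name below are hypothetical, and this is not necessarily the exact procedure from the KB article linked above.</div><div><br></div><div><pre>
```shell
# Clone a Filesystem resource so every node mounts and monitors its own
# copy of the NFS share. If the monitor fails or times out (e.g. the NFS
# server goes away), on-fail=fence escalates straight to fencing that node.
pcs resource create hana_nfs ocf:heartbeat:Filesystem \
    device="nfs-server:/export/hana_shared" directory="/hana/shared" \
    fstype="nfs" options="vers=4,hard,timeo=600" \
    op monitor interval=20s timeout=40s on-fail=fence \
    clone

# Tie the SAPHana clone to the mount: it starts only after the mount is
# up, and runs only on nodes where the mount is healthy, so losing NFS on
# the primary forces a demote/promote toward the surviving secondary.
pcs constraint order hana_nfs-clone then SAPHana_HDB_00-clone
pcs constraint colocation add SAPHana_HDB_00-clone with hana_nfs-clone
```
</pre></div><div>Whether this actually changes the outcome versus letting the application's own monitor fail (Andrei's question) depends on the agent: a dedicated Filesystem monitor with a short timeout fails fast and deterministically, whereas an agent like SAPHana may block inside HANA tooling on the dead mount.</div></div>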