[ClusterLabs] SAPHanaController & SAPHanaTopology question

Strahil Nikolov hunter86_bg at yahoo.com
Fri Apr 2 17:04:39 EDT 2021


Hi Reid,
I will check it out on Monday, but I'm pretty sure I created an order set that first stops the topology and only then stops the nfs-active resource.
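For reference, an order set along those lines would look something like the command below (the resource names mirror the ones used later in this thread, and the exact set syntax is only illustrative):

    # pcs constraint order set SAPHanaTopology_<SID>_<instance_num>-clone hana_nfs1_active-clone action=stop setoptions symmetrical=false

With action=stop and the resources listed in that sequence, the intent is "stop the topology clone first, then the nfs-active clone"; symmetrical=false keeps the constraint from also implying a start order.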
That said, I made the stupid decision to prevent ocf:heartbeat:Filesystem from killing those 2 SAP processes (while also setting a huge timeout for the stop operation), which led to an 'I can't umount, giving up'-style notification and, of course, fenced the entire cluster :D
Note taken: stonith now has different delays, and Filesystem is allowed to kill the processes.
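(For anyone following along, those two changes would typically look roughly like this -- the stonith device and Filesystem resource names are placeholders:)

    # pcs stonith update <fence_node1> pcmk_delay_base=5
    # pcs stonith update <fence_node2> pcmk_delay_base=15
    # pcs resource update <hana_nfs1_fs> force_unmount=safe

pcmk_delay_base staggers the fencing of the two nodes so they don't shoot each other at the same time, and force_unmount=safe lets the Filesystem agent kill the processes holding the mount instead of failing the stop.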
As per the SAP note from Andrei, these could really be 'fast restart' mechanisms in HANA 2.0, and it looks safe to kill them (I will check with SAP about that).

P.S.: Is there a way to remove a whole set in pcs? It's really irritating when the command wipes the resource from multiple order constraints.
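(Not sure if this is exactly what is meant, but a whole set constraint can usually be removed by its constraint ID rather than per resource, e.g.:)

    # pcs constraint --full
    # pcs constraint remove <constraint_id>

The first command lists every constraint together with its ID; removing by ID takes out that one constraint (including a full order set) without touching other constraints that reference the same resource.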
Best Regards,
Strahil Nikolov

 
 
On Fri, Apr 2, 2021 at 23:44, Reid Wahl <nwahl at redhat.com> wrote:

Hi, Strahil.
Based on the constraints documented in the article you're following (RH KB solution 5423971), I think I see what's happening.
The SAPHanaTopology resource requires the appropriate nfs-active attribute in order to run. That means that if the nfs-active attribute is set to false, the SAPHanaTopology resource must stop.
However, there's no rule saying SAPHanaTopology must finish stopping before the nfs-active attribute resource stops. In fact, it's quite the opposite: the SAPHanaTopology resource stops only after the nfs-active resource stops.
At the same time, the NFS resources are allowed to stop after the nfs-active attribute resource has stopped. So the NFS resources are stopping while the SAPHana* resources are likely still active.
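Presumably that dependency is expressed as attribute-based location rules, something like the sketch below (I'm guessing at the exact form; the attribute and resource names just follow the ones used in this thread):

    # pcs constraint location SAPHanaTopology_<SID>_<instance_num>-clone \
        rule score=-INFINITY hana_nfs1_active ne true

i.e. the topology clone is banned from any node whose hana_nfs1_active node attribute is not "true", which is what forces it to stop once the attribute resource flips the value.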
Try something like this:

    # pcs constraint order hana_nfs1_active-clone then SAPHanaTopology_<SID>_<instance_num>-clone kind=Optional
    # pcs constraint order hana_nfs2_active-clone then SAPHanaTopology_<SID>_<instance_num>-clone kind=Optional

This says "if both hana_nfs1_active and SAPHanaTopology are scheduled to start, then make hana_nfs1_active start first. If both are scheduled to stop, then make SAPHanaTopology stop first."
"kind=Optional" means there's no order dependency unless both resources are already going to be scheduled for the action. I'm using kind=Optional here even though kind=Mandatory (the default) would make sense, because IIRC there were some unexpected interactions with ordering constraints for clones, where events on one node had unwanted effects on other nodes.
I'm not able to test right now since setting up an environment for this even with dummy resources is non-trivial -- but you're welcome to try this both with and without kind=Optional if you'd like.
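One way to sanity-check the result before re-running the standby test (just a sketch):

    # pcs constraint order
    # crm_mon -r

The first command confirms the two new ordering constraints are in place; crm_mon -r then lets you watch the stop sequence while the nodes go into standby.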
Please let us know how this goes.

On Fri, Apr 2, 2021 at 2:20 AM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

Hello All,
I am testing the newly built HANA (Scale-out) cluster, and it seems that neither SAPHanaController nor SAPHanaTopology stops HANA when I put the nodes (same DC = same HANA) into standby. This of course leads to a situation where the NFS cannot be unmounted and, despite the stop timeout, ends in fencing (on-fail=fence).
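(For reference, the test itself is simply putting both nodes of the same DC into standby -- the node names below are placeholders:)

    # pcs node standby <dc1_node1> <dc1_node2>
    # pcs node unstandby <dc1_node1> <dc1_node2>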
I thought that the Controller resource agent stops HANA, and that the slave role should not be 'stopped' before that happens.
Maybe my expectations are wrong?
Best Regards,
Strahil Nikolov
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA  