[ClusterLabs] SAP HANA monitor fails - Error performing operation: No such device or address

Fri Apr 8 09:38:27 EDT 2022

On Fri, 2022-04-08 at 17:17 +0800, Aj Revelino wrote:
> Hello All,
> I have a 2-node SAP HANA cluster (hanapodb1 and hanapodb2). Pacemaker
> monitors the data replication between the primary and the secondary
> node. The issue is that crm status shows that everything is okay, but
> the system log shows the following error:
> 
> pacemaker-controld[3582]:  notice: hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_60000:195 [ Error performing operation: No such device or address]
> I am unable to identify the cause of the error message and resolve it.
> 
> Because of this, the data replication between the 2 nodes is
> recorded as failed (SFAIL). Please see the excerpt from the CIB below:
> 
>  <node_state id="2" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" uname="zhanapopdb2" join="member" expected="member">
>       <transient_attributes id="2">
>         <instance_attributes id="status-2">
>           <nvpair id="status-2-hana_hpn_clone_state" name="hana_hpn_clone_state" value="WAITING4PRIM"/>
>           <nvpair id="status-2-hana_hpn_version" name="hana_hpn_version" value="2.00.056.00.1624618329"/>
>           <nvpair id="status-2-master-rsc_SAPHana_HPN_HDB00" name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
>           <nvpair id="status-2-hana_hpn_sync_state" name="hana_hpn_sync_state" value="SFAIL"/>
>           <nvpair id="status-2-hana_hpn_roles" name="hana_hpn_roles" value="4:S:master1:master:worker:master"/>
>         </instance_attributes>
>       </transient_attributes>
> 
> Pacemaker is able to fail over the resources from the primary to the
> secondary, but they all fail back to the primary the moment I clean
> up the failure on the primary node.

I'm not familiar enough with SAP to speak to that side of things, but
the behavior after clean-up is normal. If you don't want resources to
go back to their preferred node after a failure is cleaned up, set the
resource-stickiness meta-attribute to a positive number (either on the
resource itself, or in resource defaults if you want it to apply to
everything).
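
For example, with crmsh (which your configuration dump suggests you're
using), either of the following should do it. Treat this as a sketch:
1000 is only an illustrative value, and stickiness only keeps resources
in place if it outweighs whatever score is pulling them back to the
other node:

    # apply to all resources via the resource defaults (example value)
    crm configure rsc_defaults resource-stickiness=1000

    # or apply to a single resource, e.g. the HANA master/slave set
    # from your configuration (example value)
    crm resource meta msl_SAPHana_HPN_HDB00 set resource-stickiness 1000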

> I deleted and recreated the entire configuration and reconfigured the
> HANA data replication, but it hasn't helped.
> 
> 
> Cluster configuration:
> hanapopdb1:~ # crm configure show
> node 1: hanapopdb1 \
>         attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
> node 2: hanapopdb2 \
>         attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1 hana_hpn_site=SITE2PO hana_hpn_srmode=sync
> primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
>         operations $id=rsc_sap2_HPN_HDB00-operations \
>         op monitor interval=10 timeout=600 \
>         op start interval=0 timeout=600 \
>         op stop interval=0 timeout=300 \
>         params SID=HPN InstanceNumber=00
> primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
>         operations $id=rsc_sap_HPN_HDB00-operations \
>         op start interval=0 timeout=3600 \
>         op stop interval=0 timeout=3600 \
>         op promote interval=0 timeout=3600 \
>         op monitor interval=60 role=Master timeout=700 \
>         op monitor interval=61 role=Slave timeout=700 \
>         params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
> primitive rsc_ip_HPN_HDB00 IPaddr2 \
>         meta target-role=Started \
>         operations $id=rsc_ip_HPN_HDB00-operations \
>         op monitor interval=10s timeout=20s \
>         params ip=10.10.1.60
> primitive rsc_nc_HPN_HDB00 azure-lb \
>         params port=62506
> primitive stonith-sbd stonith:external/sbd \
>         params pcmk_delay_max=30 \
>         op monitor interval=30 timeout=30
> group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
> ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
>         meta is-managed=true notify=true clone-max=2 clone-node-max=1 target-role=Started interleave=true
> clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
>         meta clone-node-max=1 target-role=Started interleave=true
> colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started msl_SAPHana_HPN_HDB00:Master
> order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00 msl_SAPHana_HPN_HDB00
> property cib-bootstrap-options: \
>         last-lrm-refresh=1649387935 \
>         maintenance-mode=true
> 
> Regards,
> 
> Aj
-- 
Ken Gaillot <kgaillot at redhat.com>