[ClusterLabs] SAP HANA monitor fails - Error performing operation: No such device or address

Fri Apr 8 05:17:12 EDT 2022

Hello All,
I have a 2-node SAP HANA cluster (hanapopdb1 and hanapopdb2). Pacemaker
monitors the data replication between the primary and the secondary node.
The issue is that crm status shows that everything is okay, but the system
log shows the following error:

pacemaker-controld[3582]:  notice: hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_60000:195 [ Error performing operation: No such device or address]
I am unable to identify the cause of this error message or resolve it.

As a result of the above, the data replication between the two nodes is
recorded as failed (SFAIL). Please see the excerpt from the CIB below:

 <node_state id="2" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" uname="zhanapopdb2" join="member" expected="member">
      <transient_attributes id="2">
        <instance_attributes id="status-2">
          <nvpair id="status-2-hana_hpn_clone_state" name="hana_hpn_clone_state" value="WAITING4PRIM"/>
          <nvpair id="status-2-hana_hpn_version" name="hana_hpn_version" value="2.00.056.00.1624618329"/>
          <nvpair id="status-2-master-rsc_SAPHana_HPN_HDB00" name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
          <nvpair id="status-2-hana_hpn_sync_state" name="hana_hpn_sync_state" value="SFAIL"/>
          <nvpair id="status-2-hana_hpn_roles" name="hana_hpn_roles" value="4:S:master1:master:worker:master"/>
        </instance_attributes>
      </transient_attributes>
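
For anyone who wants to pull the same attributes out of a CIB dump, here is a minimal Python sketch. It assumes the node_state fragment has been saved as a well-formed string; the sample is a trimmed copy of the excerpt above, with a closing </node_state> tag added only so that it parses. The function name transient_attributes is hypothetical and not part of any Pacemaker tooling.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the CIB excerpt. The closing </node_state> tag is an
# addition made here so the fragment is well-formed; the excerpt is cut off.
NODE_STATE = """
<node_state id="2" uname="zhanapopdb2">
  <transient_attributes id="2">
    <instance_attributes id="status-2">
      <nvpair id="status-2-hana_hpn_clone_state" name="hana_hpn_clone_state" value="WAITING4PRIM"/>
      <nvpair id="status-2-master-rsc_SAPHana_HPN_HDB00" name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
      <nvpair id="status-2-hana_hpn_sync_state" name="hana_hpn_sync_state" value="SFAIL"/>
      <nvpair id="status-2-hana_hpn_roles" name="hana_hpn_roles" value="4:S:master1:master:worker:master"/>
    </instance_attributes>
  </transient_attributes>
</node_state>
"""


def transient_attributes(xml_text):
    """Return {attribute name: value} for every nvpair in a node_state fragment."""
    root = ET.fromstring(xml_text.strip())
    return {nv.get("name"): nv.get("value") for nv in root.iter("nvpair")}


attrs = transient_attributes(NODE_STATE)
# Print the three attributes that describe the replication state of this node.
for name in ("hana_hpn_sync_state", "hana_hpn_clone_state",
             "master-rsc_SAPHana_HPN_HDB00"):
    print(f"{name} = {attrs[name]}")
```

This only reads the values that are already in the excerpt; it does not explain why the sync state is SFAIL.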

Pacemaker is able to fail over the resources from the primary to the
secondary, but they all fail back to the primary the moment I clean up the
failure on the primary node.
I deleted and recreated the entire configuration and reconfigured the HANA
data replication, but it hasn't helped.

*Cluster configuration:*
hanapopdb1:~ # crm configure show
node 1: hanapopdb1 \
        attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
node 2: hanapopdb2 \
        attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1 hana_hpn_site=SITE2PO hana_hpn_srmode=sync
primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
        operations $id=rsc_sap2_HPN_HDB00-operations \
        op monitor interval=10 timeout=600 \
        op start interval=0 timeout=600 \
        op stop interval=0 timeout=300 \
        params SID=HPN InstanceNumber=00
primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
        operations $id=rsc_sap_HPN_HDB00-operations \
        op start interval=0 timeout=3600 \
        op stop interval=0 timeout=3600 \
        op promote interval=0 timeout=3600 \
        op monitor interval=60 role=Master timeout=700 \
        op monitor interval=61 role=Slave timeout=700 \
        params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
primitive rsc_ip_HPN_HDB00 IPaddr2 \
        meta target-role=Started \
        operations $id=rsc_ip_HPN_HDB00-operations \
        op monitor interval=10s timeout=20s \
        params ip=10.10.1.60
primitive rsc_nc_HPN_HDB00 azure-lb \
        params port=62506
primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_max=30 \
        op monitor interval=30 timeout=30
group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
        meta is-managed=true notify=true clone-max=2 clone-node-max=1 target-role=Started interleave=true
clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
        meta clone-node-max=1 target-role=Started interleave=true
colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started msl_SAPHana_HPN_HDB00:Master
order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00 msl_SAPHana_HPN_HDB00
property cib-bootstrap-options: \
        last-lrm-refresh=1649387935 \
        maintenance-mode=true

Regards,

Aj