[ClusterLabs] SAP HANA monitor fails - Error performing operation: No such device or address
Ken Gaillot
kgaillot at redhat.com
Fri Apr 8 09:38:27 EDT 2022
On Fri, 2022-04-08 at 17:17 +0800, Aj Revelino wrote:
> Hello All,
> I have a 2-node SAP HANA cluster (hanapopdb1 and hanapopdb2). Pacemaker
> monitors the data replication between the primary and the secondary
> node. The issue is that crm status shows that everything is okay, but
> the system log shows the following error.
>
> pacemaker-controld[3582]: notice: hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_60000:195 [ Error performing operation: No such device or address]
> I am unable to identify the cause of the error message or resolve it.
>
> Because of the above, the data replication between the 2 nodes is
> recorded as failed (SFAIL). Please see the excerpt from the CIB below:
>
> <node_state id="2" in_ccm="true" crmd="online"
>     crm-debug-origin="do_update_resource" uname="zhanapopdb2"
>     join="member" expected="member">
>   <transient_attributes id="2">
>     <instance_attributes id="status-2">
>       <nvpair id="status-2-hana_hpn_clone_state"
>           name="hana_hpn_clone_state" value="WAITING4PRIM"/>
>       <nvpair id="status-2-hana_hpn_version"
>           name="hana_hpn_version" value="2.00.056.00.1624618329"/>
>       <nvpair id="status-2-master-rsc_SAPHana_HPN_HDB00"
>           name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
>       <nvpair id="status-2-hana_hpn_sync_state"
>           name="hana_hpn_sync_state" value="SFAIL"/>
>       <nvpair id="status-2-hana_hpn_roles" name="hana_hpn_roles"
>           value="4:S:master1:master:worker:master"/>
>     </instance_attributes>
>   </transient_attributes>
>
> Pacemaker is able to fail over the resources from the primary to the
> secondary, but they all fail back to the primary the moment I clean
> up the failure on the primary node.
I'm not familiar enough with SAP to speak to that side of things, but
the behavior after clean-up is normal. If you don't want resources to
go back to their preferred node after a failure is cleaned up, set the
resource-stickiness meta-attribute to a positive number (either on the
resource itself, or in resource defaults if you want it to apply to
everything).
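
As a rough sketch of the resource-defaults variant, in the same crmsh
syntax as the configuration quoted below: the value 1000 is only an
illustrative assumption, not a recommendation. Pick a value that makes
sense relative to the other scores in your configuration (for example
the 4000 on your colocation constraint).

```
rsc_defaults rsc-options: \
    resource-stickiness=1000
```

To apply it to a single resource instead, the same
resource-stickiness=1000 setting would go on that resource's "meta"
line.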
> I deleted and recreated the entire configuration and reconfigured the
> hana data replication but it hasn't helped.
>
>
> Cluster configuration:
> hanapopdb1:~ # crm configure show
> node 1: hanapopdb1 \
>     attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO \
>         hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync \
>         lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
> node 2: hanapopdb2 \
>     attributes lpa_hpn_lpt=10 \
>         hana_hpn_op_mode=logreplay_readaccess hana_hpn_vhost=hanapopdb2 \
>         hana_hpn_remoteHost=hanapopdb1 hana_hpn_site=SITE2PO \
>         hana_hpn_srmode=sync
> primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
>     operations $id=rsc_sap2_HPN_HDB00-operations \
>     op monitor interval=10 timeout=600 \
>     op start interval=0 timeout=600 \
>     op stop interval=0 timeout=300 \
>     params SID=HPN InstanceNumber=00
> primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
>     operations $id=rsc_sap_HPN_HDB00-operations \
>     op start interval=0 timeout=3600 \
>     op stop interval=0 timeout=3600 \
>     op promote interval=0 timeout=3600 \
>     op monitor interval=60 role=Master timeout=700 \
>     op monitor interval=61 role=Slave timeout=700 \
>     params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true \
>         DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
> primitive rsc_ip_HPN_HDB00 IPaddr2 \
>     meta target-role=Started \
>     operations $id=rsc_ip_HPN_HDB00-operations \
>     op monitor interval=10s timeout=20s \
>     params ip=10.10.1.60
> primitive rsc_nc_HPN_HDB00 azure-lb \
>     params port=62506
> primitive stonith-sbd stonith:external/sbd \
>     params pcmk_delay_max=30 \
>     op monitor interval=30 timeout=30
> group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
> ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
>     meta is-managed=true notify=true clone-max=2 clone-node-max=1 \
>         target-role=Started interleave=true
> clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
>     meta clone-node-max=1 target-role=Started interleave=true
> colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started \
>     msl_SAPHana_HPN_HDB00:Master
> order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00 \
>     msl_SAPHana_HPN_HDB00
> property cib-bootstrap-options: \
>     last-lrm-refresh=1649387935 \
>     maintenance-mode=true
>
> Regards,
>
> Aj
--
Ken Gaillot <kgaillot at redhat.com>