[ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

Fri Apr 8 17:27:18 EDT 2022

Hi Ulrich,
I put the cluster into maintenance mode because the error message was being
logged continuously in the system log.

Pacemaker attempted to execute the monitor operation of the resource
agent here. Is there a way to find out why Pacemaker says 'No such device
or address'?
 hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_60000:195 [ Error performing
operation: No such device or address]
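
For anyone tracing the same message, here is a minimal sketch of how the log line can be broken down. It assumes the layout `<node>-<resource>_<operation>_<interval in ms>:<call id> [ <agent output> ]`; the regular expression and the names `OP_OUTPUT_RE` and `parse_op_output` are illustrative and not part of Pacemaker. The bracketed text is what the resource agent (or a tool it called) wrote during the operation. "No such device or address" is the Linux text for errno ENXIO, which, as far as I know, Pacemaker's command-line tools report when a queried object such as a node attribute does not exist.

```python
import re

# Assumed layout of the output that pacemaker-controld logs for an operation:
#   <node>-<resource>_<operation>_<interval in ms>:<call id> [ <agent output> ]
OP_OUTPUT_RE = re.compile(
    r"(?P<key>\S+)_(?P<operation>[a-z]+)_(?P<interval_ms>\d+):(?P<call_id>\d+)"
    r"\s*\[\s*(?P<output>.*?)\s*\]"
)

# Text that Linux uses for errno ENXIO (see os.strerror(errno.ENXIO)).
ENXIO_TEXT = "No such device or address"


def parse_op_output(line):
    """Return the fields of an operation-output log line, or None if no match."""
    match = OP_OUTPUT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["interval_ms"] = int(fields["interval_ms"])
    fields["call_id"] = int(fields["call_id"])
    fields["is_enxio"] = ENXIO_TEXT in fields["output"]
    return fields


line = ("pacemaker-controld[3582]:  notice: "
        "hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_60000:195 "
        "[ Error performing operation: No such device or address]")
parsed = parse_op_output(line)
print(parsed)
```

The interval of 60000 ms matches the `op monitor interval=60 role=Master` entry in the configuration quoted below, so the output comes from the monitor of the promoted instance.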

Regards,
Aj

On Fri, Apr 8, 2022 at 8:23 PM Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:

> "maintenance-mode=true"? Why?
>
>
> >>> Aj Revelino <aj.revelino at gmail.com> wrote on 08.04.2022 at 11:17 in
> message
> <CAJY7vkA=SfaJngsfJnREkFMnMJ0hn=ppkec7CyUci32CR3Ro+g at mail.gmail.com>:
> > Hello All,
> > I have a 2-node SAP HANA cluster (hanapopdb1 and hanapopdb2). Pacemaker
> > monitors the data replication between the primary and the secondary node.
> > The issue is that crm status shows that everything is okay, but the system
> > log shows the following error:
> >
> >
> > *pacemaker-controld[3582]:  notice:
> > hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_60000:195 [ Error performing
> > operation: No such device or address]*
> > I am unable to identify the cause of the error message or resolve it.
> >
> > Because of the above, the data replication between the 2 nodes is recorded
> > as failed (SFAIL). Please see the excerpt from the CIB below:
> >
> >  <node_state id="2" in_ccm="true" crmd="online"
> > crm-debug-origin="do_update_resource" uname="zhanapopdb2" join="member"
> > expected="member">
> >       <transient_attributes id="2">
> >         <instance_attributes id="status-2">
> >          * <nvpair id="status-2-hana_hpn_clone_state"
> > name="hana_hpn_clone_state" value="WAITING4PRIM"/>*
> >           <nvpair id="status-2-hana_hpn_version" name="hana_hpn_version"
> > value="2.00.056.00.1624618329"/>
> >           <nvpair id="status-2-master-rsc_SAPHana_HPN_HDB00"
> > name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
> >           *<nvpair id="status-2-hana_hpn_sync_state"
> > name="hana_hpn_sync_state" value="SFAIL"/>*
> >           <nvpair id="status-2-hana_hpn_roles" name="hana_hpn_roles"
> > value="4:S:master1:master:worker:master"/>
> >         </instance_attributes>
> >       </transient_attributes>
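
A minimal sketch of reading these transient attributes programmatically, in case it helps when comparing both nodes. The XML below is a cleaned copy of the excerpt above (emphasis asterisks removed, wrapped lines rejoined). The field names in `ROLE_FIELDS` are my own labels for the colon-separated parts of `hana_hpn_roles`, based on my understanding of the SAPHanaSR format, and should be treated as assumptions.

```python
import xml.etree.ElementTree as ET

# Cleaned copy of the transient attributes from the CIB excerpt above.
CIB_EXCERPT = """
<transient_attributes id="2">
  <instance_attributes id="status-2">
    <nvpair id="status-2-hana_hpn_clone_state" name="hana_hpn_clone_state" value="WAITING4PRIM"/>
    <nvpair id="status-2-hana_hpn_version" name="hana_hpn_version" value="2.00.056.00.1624618329"/>
    <nvpair id="status-2-master-rsc_SAPHana_HPN_HDB00" name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
    <nvpair id="status-2-hana_hpn_sync_state" name="hana_hpn_sync_state" value="SFAIL"/>
    <nvpair id="status-2-hana_hpn_roles" name="hana_hpn_roles" value="4:S:master1:master:worker:master"/>
  </instance_attributes>
</transient_attributes>
"""

# Hypothetical labels for the parts of the roles string (an assumption).
ROLE_FIELDS = (
    "landscape_status", "site_role",
    "nameserver_config", "nameserver_actual",
    "indexserver_config", "indexserver_actual",
)


def node_attributes(xml_text):
    """Map each nvpair name to its value."""
    root = ET.fromstring(xml_text)
    return {nv.get("name"): nv.get("value") for nv in root.iter("nvpair")}


def decode_roles(value):
    """Split the colon-separated roles string into labelled fields."""
    return dict(zip(ROLE_FIELDS, value.split(":")))


attrs = node_attributes(CIB_EXCERPT)
roles = decode_roles(attrs["hana_hpn_roles"])
print(attrs["hana_hpn_sync_state"], attrs["hana_hpn_clone_state"], roles)
```

Read this way, the node reports itself as a secondary (`S`) that is waiting for the primary, with a promotion score of `-INFINITY`, which is consistent with the `SFAIL` sync state.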
> >
> > Pacemaker is able to fail over the resources from the primary to the
> > secondary, but they all fail back to the primary the moment I clean up the
> > failure on the primary node.
> > I deleted and recreated the entire configuration and reconfigured the HANA
> > data replication, but it hasn't helped.
> >
> >
> > *Cluster configuration:*
> > hanapopdb1:~ # crm configure show
> > node 1: hanapopdb1 \
> >         attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO
> > hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync
> > lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
> > node 2: hanapopdb2 \
> >         attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess
> > hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1
> > hana_hpn_site=SITE2PO hana_hpn_srmode=sync
> > primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
> >         operations $id=rsc_sap2_HPN_HDB00-operations \
> >         op monitor interval=10 timeout=600 \
> >         op start interval=0 timeout=600 \
> >         op stop interval=0 timeout=300 \
> >         params SID=HPN InstanceNumber=00
> > primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
> >         operations $id=rsc_sap_HPN_HDB00-operations \
> >         op start interval=0 timeout=3600 \
> >         op stop interval=0 timeout=3600 \
> >         op promote interval=0 timeout=3600 \
> >         op monitor interval=60 role=Master timeout=700 \
> >         op monitor interval=61 role=Slave timeout=700 \
> >         params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true
> > DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
> > primitive rsc_ip_HPN_HDB00 IPaddr2 \
> >         meta target-role=Started \
> >         operations $id=rsc_ip_HPN_HDB00-operations \
> >         op monitor interval=10s timeout=20s \
> >         params ip=10.10.1.60
> > primitive rsc_nc_HPN_HDB00 azure-lb \
> >         params port=62506
> > primitive stonith-sbd stonith:external/sbd \
> >         params pcmk_delay_max=30 \
> >         op monitor interval=30 timeout=30
> > group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
> > ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
> >         meta is-managed=true notify=true clone-max=2 clone-node-max=1
> > target-role=Started interleave=true
> > clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
> >         meta clone-node-max=1 target-role=Started interleave=true
> > colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started
> > msl_SAPHana_HPN_HDB00:Master
> > order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00
> > msl_SAPHana_HPN_HDB00
> > property cib-bootstrap-options: \
> >         last-lrm-refresh=1649387935 \
> >         maintenance-mode=true
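
A small sketch for reading the numeric values in this configuration. `lpa_hpn_lpt` on hanapopdb1 and `last-lrm-refresh` look like Unix epoch timestamps, while `lpa_hpn_lpt=10` on hanapopdb2 is far too small to be one. My assumption is that the resource agent uses such small values as markers for a node that is not a valid primary. The helper `describe_epoch` and its threshold are illustrative only.

```python
from datetime import datetime, timezone


def describe_epoch(value, min_plausible=1_000_000_000):
    """Render an epoch value as UTC time, or flag it as a non-timestamp."""
    if value < min_plausible:
        # Assumption: small values are marker values, not real timestamps.
        return f"{value} (too small to be a timestamp)"
    moment = datetime.fromtimestamp(value, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


values = {
    "hanapopdb1 lpa_hpn_lpt": 1649393239,
    "hanapopdb2 lpa_hpn_lpt": 10,
    "last-lrm-refresh": 1649387935,
}
for name, value in values.items():
    print(f"{name}: {describe_epoch(value)}")
```

Both timestamps fall on 2022-04-08 (UTC), the day of the original report.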
> >
> > Regards,
> >
> > Aj