[ClusterLabs] Antw: Re: Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

Mon Apr 11 02:14:58 EDT 2022

>>> Aj Revelino <aj.revelino at gmail.com> wrote on 08.04.2022 at 23:27 in message
<CAJY7vkC27H7Lf_FAy1DUF2dkLnvTMTzYW8B61vK+6_ijvXSKag at mail.gmail.com>:
> Hi Ulrich,
> I set the cluster in maintenance mode due to the consistent logging of the
> error messages in the system log.
> 
> Pacemaker has attempted to execute the monitor operation of the resource
> agent here. Is there a way to find out why pacemaker says 'No such device
> or address'?
>  hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_60000:195 [ Error performing
> operation: No such device or address]

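If inspecting the RA or turning on debugging for the RA does not help, you
could try adding a line like "exec >log_file 2>&1; set -x" to the beginning
of the RA (stdout has to be redirected first so that stderr, which carries
the "set -x" trace, ends up in the same file).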
I know some of those SAP RAs are hard to understand.
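
A minimal sketch of such a tracing line, which can be tried outside the cluster before touching the RA. The log path and the two commands in the subshell are hypothetical stand-ins, not taken from the SAPHana RA. The order of the redirections matters: stdout goes to the file first, then stderr is pointed at the same place, so the "set -x" trace is captured as well.

```shell
#!/bin/sh
# Stand-in for the top of a resource agent; the log path is a hypothetical choice.
log_file="${TMPDIR:-/tmp}/ra_trace_demo.log"
: >"$log_file"    # start with an empty log

(
    # Redirect stdout to the log first, then send stderr to the same place.
    exec >>"$log_file" 2>&1
    # From here on, every command is echoed to stderr (and so to the log).
    set -x
    echo "monitor action started"
    false || echo "a command failed with rc=$?"
)

# Count the trace lines (they start with '+') that reached the log.
trace_lines=$(grep -c '^+' "$log_file")
echo "trace lines captured: $trace_lines"
```

In a real RA the line would go near the top of the script without the subshell. Note that the meta-data action prints its XML on stdout, so redirecting stdout for every action may break that action; the line should be removed again once the failing command has been found.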

Regards,
Ulrich

> 
> Regards,
> Aj
> 
> On Fri, Apr 8, 2022 at 8:23 PM Ulrich Windl <
> Ulrich.Windl at rz.uni-regensburg.de> wrote:
> 
>> "maintenance-mode=true"? Why?
>>
>>
>> >>> Aj Revelino <aj.revelino at gmail.com> wrote on 08.04.2022 at 11:17 in message
>> <CAJY7vkA=SfaJngsfJnREkFMnMJ0hn=ppkec7CyUci32CR3Ro+g at mail.gmail.com>:
>> > Hello All,
>> > I have a 2-node SAP HANA cluster (hanapopdb1 and hanapopdb2). Pacemaker
>> > monitors the data replication between the primary and the secondary node.
>> > The issue is that crm status shows that everything is okay, but the system
>> > log shows the following error.
>> >
>> >
>> > *pacemaker-controld[3582]:  notice:
>> > hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_60000:195 [ Error performing
>> > operation: No such device or address]*
>> > I am unable to identify the cause of the error message or resolve it.
>> >
>> > Because of this, the data replication between the 2 nodes is recorded
>> > as failed (SFAIL). Please see the excerpt from the CIB below:
>> >
>> >  <node_state id="2" in_ccm="true" crmd="online"
>> > crm-debug-origin="do_update_resource" uname="zhanapopdb2" join="member"
>> > expected="member">
>> >       <transient_attributes id="2">
>> >         <instance_attributes id="status-2">
>> >          * <nvpair id="status-2-hana_hpn_clone_state"
>> > name="hana_hpn_clone_state" value="WAITING4PRIM"/>*
>> >           <nvpair id="status-2-hana_hpn_version" name="hana_hpn_version"
>> > value="2.00.056.00.1624618329"/>
>> >           <nvpair id="status-2-master-rsc_SAPHana_HPN_HDB00"
>> > name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
>> >           *<nvpair id="status-2-hana_hpn_sync_state"
>> > name="hana_hpn_sync_state" value="SFAIL"/>*
>> >           <nvpair id="status-2-hana_hpn_roles" name="hana_hpn_roles"
>> > value="4:S:master1:master:worker:master"/>
>> >         </instance_attributes>
>> >       </transient_attributes>
>> >
>> > Pacemaker is able to fail over the resources from the primary to the
>> > secondary, but they all fail back to the primary the moment I clean up
>> > the failure on the primary node.
>> > I deleted and recreated the entire configuration and reconfigured the
>> > HANA data replication, but it hasn't helped.
>> >
>> >
>> > *Cluster configuration:*
>> > hanapopdb1:~ # crm configure show
>> > node 1: hanapopdb1 \
>> >         attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO
>> > hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync
>> > lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
>> > node 2: hanapopdb2 \
>> >         attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess
>> > hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1
>> > hana_hpn_site=SITE2PO hana_hpn_srmode=sync
>> > primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
>> >         operations $id=rsc_sap2_HPN_HDB00-operations \
>> >         op monitor interval=10 timeout=600 \
>> >         op start interval=0 timeout=600 \
>> >         op stop interval=0 timeout=300 \
>> >         params SID=HPN InstanceNumber=00
>> > primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
>> >         operations $id=rsc_sap_HPN_HDB00-operations \
>> >         op start interval=0 timeout=3600 \
>> >         op stop interval=0 timeout=3600 \
>> >         op promote interval=0 timeout=3600 \
>> >         op monitor interval=60 role=Master timeout=700 \
>> >         op monitor interval=61 role=Slave timeout=700 \
>> >         params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true
>> > DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
>> > primitive rsc_ip_HPN_HDB00 IPaddr2 \
>> >         meta target-role=Started \
>> >         operations $id=rsc_ip_HPN_HDB00-operations \
>> >         op monitor interval=10s timeout=20s \
>> >         params ip=10.10.1.60
>> > primitive rsc_nc_HPN_HDB00 azure-lb \
>> >         params port=62506
>> > primitive stonith-sbd stonith:external/sbd \
>> >         params pcmk_delay_max=30 \
>> >         op monitor interval=30 timeout=30
>> > group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
>> > ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
>> >         meta is-managed=true notify=true clone-max=2 clone-node-max=1
>> > target-role=Started interleave=true
>> > clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
>> >         meta clone-node-max=1 target-role=Started interleave=true
>> > colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started
>> > msl_SAPHana_HPN_HDB00:Master
>> > order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00
>> > msl_SAPHana_HPN_HDB00
>> > property cib-bootstrap-options: \
>> >         last-lrm-refresh=1649387935 \
>> >         maintenance-mode=true
>> >
>> > Regards,
>> >
>> > Aj