[ClusterLabs] Cannot ping a secondary address apart from the server which it is assigned to (on Azure)
Paul Warwicker
paul.warwicker at gmail.com
Sun Oct 31 18:56:16 EDT 2021
On 28/10/2021 14:30, Andrei Borzenkov wrote:
> For virtual IP you can (should?) use Azure
> load balancers - basically, you create a pool of one address, Azure
> probes each node and detects which node has IP active.
>
> See as example this RH documentation:
>
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/deploying_red_hat_enterprise_linux_8_on_public_cloud_platforms/configuring-rhel-high-availability-on-azure_cloud-content#azure-create-internal-load-balancer-in-azure-ha_configuring-rhel-high-availability-on-azure
>
I have configured a load balancer as suggested, but it is still not
exposing the floating IP address.
Status looks okay:
[root@haswmfs-vm-lin-000 ~]# pcs status
Cluster name: haswmfs
Cluster Summary:
  * Stack: corosync
  * Current DC: haswmfs-vm-lin-001 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
  * Last updated: Sun Oct 31 22:52:00 2021
  * Last change:  Sun Oct 31 20:38:57 2021 by root via cibadmin on haswmfs-vm-lin-000
  * 2 nodes configured
  * 5 resource instances configured

Node List:
  * Online: [ haswmfs-vm-lin-000 haswmfs-vm-lin-001 ]

Full List of Resources:
  * Resource Group: haswmfs-service:
    * haswmfs-ip (ocf::heartbeat:IPaddr2): Started haswmfs-vm-lin-000
    * haswmfs-daemon (lsb:smallworld_GIS): Started haswmfs-vm-lin-000
    * haswmfs-fs (ocf::heartbeat:Filesystem): Started haswmfs-vm-lin-000
    * haswmfs-lb (ocf::heartbeat:azure-lb): Started haswmfs-vm-lin-000
  * haswmfs-fence (stonith:fence_azure_arm): Started haswmfs-vm-lin-001

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@haswmfs-vm-lin-000 ~]#
The resources were created with:

pcs resource create haswmfs-ip ocf:heartbeat:IPaddr2 ip=172.16.31.5 \
    cidr_netmask=24 nic=eth0 iflabel=haswmfs op monitor interval=30s
pcs resource create haswmfs-lb ocf:heartbeat:azure-lb port=61000
The IP address 172.16.31.5 is the frontend IP address that is
dynamically assigned to the load balancer.
The load balancer is configured as follows:
  * Internal (aka private)
  * Basic SKU
  * Dynamic IP address assignment, with floating IP enabled (using the
    frontend IP address)
  * Backend pool to which all the nodes in the cluster are allocated
  * Health probe added for port 61000
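For reference, the rough az CLI equivalent of that setup (resource names here are placeholders, and this is a sketch of the calls as I understand them, not the exact commands used):

```shell
# Placeholders: myrg, mylb, myfrontend, mypool; service port 30000 is
# also a placeholder. Sketch only.
az network lb probe create --resource-group myrg --lb-name mylb \
    --name haswmfs-probe --protocol tcp --port 61000

# Floating IP must be enabled on the rule so the frontend address is
# delivered unmodified to the backend NIC (where IPaddr2 has bound it).
az network lb rule create --resource-group myrg --lb-name mylb \
    --name haswmfs-rule --protocol tcp \
    --frontend-port 30000 --backend-port 30000 \
    --frontend-ip-name myfrontend --backend-pool-name mypool \
    --probe-name haswmfs-probe --floating-ip true
```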
I have also enabled full debug logging in the pacemaker log:
Oct 31 20:38:54 IPaddr2(haswmfs-ip)[6291]: INFO: Adding inet address 172.16.31.5/24 with broadcast address 172.16.31.255 to device eth0 (with label eth0:haswmfs)
Oct 31 20:38:54 IPaddr2(haswmfs-ip)[6291]: INFO: Bringing device eth0 up
Oct 31 20:38:54 IPaddr2(haswmfs-ip)[6291]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /run/resource-agents/send_arp-172.16.31.5 eth0 172.16.31.5 auto not_used not_used
...
Oct 31 20:38:58 IPaddr2(haswmfs-ip)[6291]: INFO: ARPING 172.16.31.5 from 172.16.31.5 eth0
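Since those ARP announcements may not mean much on Azure's network, a plain TCP connect from another VM in the VNet is perhaps a fairer reachability check than ping (61000 being the one port I know has a listener):

```shell
# From a peer VM in the same VNet (hypothetical vantage point): test the
# floating IP with a TCP connect rather than ICMP ping, which an internal
# load balancer may not answer.
timeout 5 bash -c 'exec 3<>/dev/tcp/172.16.31.5/61000' \
    && echo reachable || echo unreachable
```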
I also see multiple repetitions of this:
Oct 31 20:39:08 haswmfs-vm-lin-000 pacemaker-execd [5974] (recurring_action_timer) debug: Scheduling another invocation of haswmfs-lb_monitor_10000
Oct 31 20:39:08 haswmfs-vm-lin-000 pacemaker-execd [5974] (operation_finished) debug: haswmfs-lb_monitor_10000[6878] exited with status 0
Oct 31 20:39:08 haswmfs-vm-lin-000 pacemaker-execd [5974] (log_finished) debug: haswmfs-lb monitor (call 24, PID 6878) exited with status 0 (execution time 0ms, queue time 0ms)
Oct 31 20:39:18 haswmfs-vm-lin-000 pacemaker-execd [5974] (recurring_action_timer) debug: Scheduling another invocation of haswmfs-lb_monitor_10000
Oct 31 20:39:18 haswmfs-vm-lin-000 pacemaker-execd [5974] (operation_finished) debug: haswmfs-lb_monitor_10000[7002] exited with status 0
Oct 31 20:39:18 haswmfs-vm-lin-000 pacemaker-execd [5974] (log_finished) debug: haswmfs-lb monitor (call 24, PID 7002) exited with status 0 (execution time 0ms, queue time 0ms)
Any further advice?
-paul