[ClusterLabs] Fence node when network interface goes down
S Rogers
sa.rogers1342 at gmail.com
Sun Nov 14 16:59:37 EST 2021
The mentioned error occurs when attempting to promote the PostgreSQL
resource on the standby node, after the master PostgreSQL resource is
stopped.
For info, here is my configuration:
Corosync Nodes:
 node1.local node2.local
Pacemaker Nodes:
 node1.local node2.local

Resources:
 Clone: public_network_monitor-clone
  Resource: public_network_monitor (class=ocf provider=heartbeat type=ethmonitor)
   Attributes: interface=eth0 link_status_only=true name=ethmonitor-public
   Operations: monitor interval=10s timeout=60s (public_network_monitor-monitor-interval-10s)
               start interval=0s timeout=60s (public_network_monitor-start-interval-0s)
               stop interval=0s timeout=20s (public_network_monitor-stop-interval-0s)
 Clone: pgsqld-clone
  Meta Attrs: notify=true promotable=true
  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/lib/postgresql/12/bin datadir=/var/lib/postgresql/12/main pgdata=/etc/postgresql/12/main
   Operations: demote interval=0s timeout=120s (pgsqld-demote-interval-0s)
               methods interval=0s timeout=5 (pgsqld-methods-interval-0s)
               monitor interval=15s role=Master timeout=10s (pgsqld-monitor-interval-15s)
               monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)
               notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
               promote interval=0s timeout=30s (pgsqld-promote-interval-0s)
               reload interval=0s timeout=20 (pgsqld-reload-interval-0s)
               start interval=0s timeout=60s (pgsqld-start-interval-0s)
               stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
 Resource: public_virtual_ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=24 ip=192.168.50.3 nic=mgnet0
  Operations: monitor interval=30s (public_virtual_ip-monitor-interval-30s)
              start interval=0s timeout=20s (public_virtual_ip-start-interval-0s)
              stop interval=0s timeout=20s (public_virtual_ip-stop-interval-0s)

Stonith Devices:
 Resource: node1_fence_agent (class=stonith type=fence_ssh)
  Attributes: hostname=192.168.60.1 pcmk_delay_base=15 pcmk_host_list=node1.local user=root
  Operations: monitor interval=60s (node1_fence_agent-monitor-interval-60s)
 Resource: node2_fence_agent (class=stonith type=fence_ssh)
  Attributes: hostname=192.168.60.2 pcmk_host_list=node2.local user=root
  Operations: monitor interval=60s (node2_fence_agent-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: node1_fence_agent
    Disabled on: node1.local (score:-INFINITY) (id:location-node1_fence_agent-node1.local--INFINITY)
  Resource: node2_fence_agent
    Disabled on: node2.local (score:-INFINITY) (id:location-node2_fence_agent-node2.local--INFINITY)
  Resource: public_virtual_ip
    Constraint: location-public_virtual_ip
      Rule: score=INFINITY (id:location-public_virtual_ip-rule)
        Expression: ethmonitor-public eq 1 (id:location-public_virtual_ip-rule-expr)
Ordering Constraints:
  promote pgsqld-clone then start public_virtual_ip (kind:Mandatory) (non-symmetrical) (id:order-pgsqld-clone-public_virtual_ip-Mandatory)
  demote pgsqld-clone then stop public_virtual_ip (kind:Mandatory) (non-symmetrical) (id:order-pgsqld-clone-public_virtual_ip-Mandatory-1)
Colocation Constraints:
  public_virtual_ip with pgsqld-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-public_virtual_ip-pgsqld-clone-INFINITY)
Ticket Constraints:
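
For reference, the ethmonitor clone and the attribute-based location rule above would be created with pcs commands roughly like these (approximate only, reconstructed from the config dump; the exact syntax differs between pcs 0.9.x and 0.10.x):

  # clone of ethmonitor that maintains the ethmonitor-public node attribute on every node
  pcs resource create public_network_monitor ocf:heartbeat:ethmonitor \
      interface=eth0 link_status_only=true name=ethmonitor-public \
      op monitor interval=10s timeout=60s clone

  # only allow the virtual IP on nodes where the interface is reported up
  pcs constraint location public_virtual_ip rule score=INFINITY ethmonitor-public eq 1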
This is my understanding of the sequence of events:
1. Node1 is running the PostgreSQL resource as master, Node2 is running
the PostgreSQL resource as standby. Everything is working okay at this
point.
2. On Node1, the public network goes down and ethmonitor changes the
ethmonitor-public node attribute from 1 to 0 (a quick way to watch this
attribute is sketched just after this list).
3. The location-public_virtual_ip constraint (which requires the IP to
run on a node with ethmonitor-public==1) kicks in, and Pacemaker demotes
the master PostgreSQL resource so that it can then promote it on Node2.
4. The primary PostgreSQL instance on Node1 attempts to shut down in
response to the demotion, but it can't connect to the standby, so it is
unable to stop cleanly. The PostgreSQL resource shows as demoting for 60
seconds, as below:
 Clone Set: pgsqld-clone [pgsqld] (promotable)
     pgsqld (ocf::heartbeat:pgsqlms): Demoting node1.local
     Slaves: [ node2.local ]
5. After a minute, the demotion finishes and Pacemaker attempts to
promote the PostgreSQL resource on Node2. This action fails with the
"Switchover has been canceled from pre-promote action" error, because
the standby didn't receive the final WAL activity from the primary.
6. Due to the failed promotion on Node2, PAF/Pacemaker promotes the
PostgreSQL resource on Node1 again. However, because the public network
interface on Node1 is down, the PostgreSQL and virtual IP resources
provided by the HA cluster are now completely inaccessible, even though
Node2 is perfectly capable of hosting them.
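
For anyone reproducing this: the attribute flip in step 2 and the stuck demote in step 4 can be watched from either node with something like the commands below (just a sketch; option spellings may vary slightly between Pacemaker versions):

  # transient node attribute maintained by ethmonitor (1 = link up, 0 = link down)
  attrd_updater --query --name ethmonitor-public --node node1.local

  # one-shot cluster status including node attributes; shows the resource stuck in Demoting
  crm_mon --one-shot --show-node-attributes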
I believe the 60-second wait during demotion is due to the default value
of '60s' for wal_sender_timeout
(https://www.postgresql.org/docs/12/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER).
After 60 seconds of trying to reach the standby node, PostgreSQL
terminates the replication connection, at which point the shutdown and
demotion complete. If I set wal_sender_timeout to a value higher than
the pgsqld resource demote timeout (e.g. demote timeout=120s,
wal_sender_timeout=150s), then the demote action times out and the node
is fenced, at which point the PostgreSQL resource is promoted
successfully on the standby node.

This is almost what I want, but it means it can take over 2 minutes just
for the failover to initiate (plus the additional time to start the
resources on the standby node, etc.), which is not an acceptable
timeframe for us, given that ethmonitor detects the problem within 10
seconds. I could reduce the pgsqld demote timeout to get a quicker
failed demotion, but that would go against the values officially
suggested by the PAF team, so I don't really want to do that.
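
For reference, the two timeouts discussed above can be adjusted roughly as follows (a sketch only; the psql invocation assumes a local postgres OS superuser as on the Debian-style PostgreSQL 12 layout shown in the pgsqld attributes, and pcs op syntax varies slightly by version):

  # raise wal_sender_timeout above the demote timeout (reloadable, no restart required)
  sudo -u postgres psql -c "ALTER SYSTEM SET wal_sender_timeout = '150s';"
  sudo -u postgres psql -c "SELECT pg_reload_conf();"

  # the pgsqld demote timeout (120s here, which is the PAF-suggested value already in use)
  pcs resource update pgsqld op demote interval=0s timeout=120s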
Pacemaker logs can be found here:
Node1: https://pastebin.com/iT6GgWTe
Node2: https://pastebin.com/Yj8Xjxe7