[ClusterLabs] DRBD ms resource keeps getting demoted
Stuart Massey
djangoschef at gmail.com
Mon Jan 18 22:46:10 EST 2021
We have a 2-node cluster with a quorum device. One of the nodes (node1)
is having some trouble, so we have added constraints to prevent any
resources from migrating to it. We have not put it in standby, so that drbd
in secondary on that node stays in sync. The problems it is having lead to
OS lockups that eventually resolve themselves, but each lockup causes node1
to be temporarily dropped from the cluster by the current master (node2).
Sometimes, when node1 rejoins, node2 demotes the drbd ms resource. That
causes all resources that depend on it to be stopped, leading to a service
outage. They are then restarted on node2, since they can't run on node1
(due to the constraints).
We are having a hard time understanding why this happens. It seems like
there may be some sort of DC contention happening. Does anyone have any
idea how we might prevent this from happening?
Selected messages (de-identified) from pacemaker.log that illustrate our
suspicion of DC confusion are below. The update_dc and
abort_transition_graph messages about deletion of lrm always seem to
precede the demotion, and a demotion always seems to follow (when the
resource is not already demoted).
Jan 18 16:52:17 [21938] node02.example.com crmd: info: do_dc_takeover: Taking over DC status for this partition
Jan 18 16:52:17 [21938] node02.example.com crmd: info: update_dc: Set DC to node02.example.com (3.0.14)
Jan 18 16:52:17 [21938] node02.example.com crmd: info: abort_transition_graph: Transition aborted by deletion of lrm[@id='1']: Resource state removal | cib=0.89.327 source=abort_unless_down:357 path=/cib/status/node_state[@id='1']/lrm[@id='1'] complete=true
Jan 18 16:52:19 [21937] node02.example.com pengine: info: master_color: ms_drbd_ourApp: Promoted 0 instances of a possible 1 to master
Jan 18 16:52:19 [21937] node02.example.com pengine: notice: LogAction: * Demote drbd_ourApp:1 ( Master -> Slave node02.example.com )
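
To check the "always precede" observation against a full log rather than selected messages, something like the following could be used. It is only a minimal sketch: it assumes the log entries look like the excerpt above (timestamp, then PID in brackets, then host and daemon), and the function names join_entries and demotions_with_context are made up for illustration; they are not part of Pacemaker.

```python
import re

# Start of a log entry as it appears in the excerpt above, e.g.
# "Jan 18 16:52:17 [21938] node02.example.com crmd: info:".
# Assumption: every real entry begins with this timestamp + [pid] prefix.
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \[\d+\] ")

# Function names from the excerpt that we suspect precede a demotion.
MARKERS = ("do_dc_takeover", "update_dc", "abort_transition_graph")


def join_entries(lines):
    """Re-join log entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)          # a new entry begins here
        else:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def demotions_with_context(entries):
    """Return (timestamp, markers seen since the previous Demote) per Demote."""
    seen = []
    results = []
    for entry in entries:
        for marker in MARKERS:
            if marker + ":" in entry:
                seen.append(marker)
        if "LogAction:" in entry and "Demote" in entry:
            results.append((entry[:15], list(seen)))  # first 15 chars = timestamp
            seen.clear()
    return results
```

Passing the lines of pacemaker.log through join_entries and then demotions_with_context gives one tuple per Demote action. A Demote with an empty marker list would be a counterexample to the pattern described above. This only shows ordering in the log, not causation.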