[ClusterLabs] DRBD ms resource keeps getting demoted

Reid Wahl nwahl at redhat.com
Tue Jan 19 02:27:07 EST 2021

Can you share the cluster configuration (e.g., `pcs config` or the CIB)?
And are there any additional LogAction messages after that one (e.g.,
Promote for node01)?
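
For example, something along these lines would gather that information. This is only a sketch: the log path, the output file names, and the helper function names are assumptions and may need adjusting for your system.

```shell
# Sketch only: the log path and output file names are assumptions.
LOG=${LOG:-/var/log/pacemaker.log}

# Save the cluster configuration and the raw CIB (run on a cluster node).
collect_config() {
    pcs config > pcs_config.txt
    pcs cluster cib > cib.xml
}

# Print DC-election and scheduler-action messages in log order, so that
# any Promote/Demote LogAction lines can be seen next to the DC changes.
show_transitions() {
    grep -E 'do_dc_takeover|update_dc|abort_transition_graph|LogAction' "$1"
}

# Usage on a cluster node:
#   collect_config
#   show_transitions "$LOG"
```

Running show_transitions against the log on node02 around 16:52:19 should show whether a Promote action was scheduled alongside the Demote.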

On Mon, Jan 18, 2021 at 7:47 PM Stuart Massey <djangoschef at gmail.com> wrote:

> So, we have a 2-node cluster with a quorum device. One of the nodes
> (node1) is having some trouble, so we have added constraints to prevent any
> resources migrating to it, but have not put it in standby, so that drbd in
> secondary on that node stays in sync. The problems it is having lead to OS
> lockups that eventually resolve themselves - but that causes it to be
> temporarily dropped from the cluster by the current master (node2).
> Sometimes, when node1 rejoins, node2 demotes the drbd ms resource.
> That causes all resources that depend on it to be stopped, leading to a
> service outage. They are then restarted on node2, since they can't run on
> node1 (due to constraints).
> We are having a hard time understanding why this happens. It seems like
> there may be some sort of DC contention happening. Does anyone have any
> idea how we might prevent this from happening?
> Selected (de-identified) messages from pacemaker.log that illustrate our
> suspicion of DC confusion are below. The update_dc and
> abort_transition_graph messages about the deletion of lrm always seem to
> precede the demotion, and a demotion always seems to follow (when the
> resource is not already demoted).
> Jan 18 16:52:17 [21938] node02.example.com       crmd:     info: do_dc_takeover:        Taking over DC status for this partition
> Jan 18 16:52:17 [21938] node02.example.com       crmd:     info: update_dc:     Set DC to node02.example.com (3.0.14)
> Jan 18 16:52:17 [21938] node02.example.com       crmd:     info: abort_transition_graph:        Transition aborted by deletion of lrm[@id='1']: Resource state removal | cib=0.89.327 source=abort_unless_down:357 path=/cib/status/node_state[@id='1']/lrm[@id='1'] complete=true
> Jan 18 16:52:19 [21937] node02.example.com    pengine:     info: master_color:  ms_drbd_ourApp: Promoted 0 instances of a possible 1 to master
> Jan 18 16:52:19 [21937] node02.example.com    pengine:   notice: LogAction:      * Demote     drbd_ourApp:1     (            Master -> Slave node02.example.com )


Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA