[ClusterLabs] Peer (slave) node deleting master's transient_attributes
Ken Gaillot
kgaillot at redhat.com
Mon Feb 1 11:27:48 EST 2021
On Mon, 2021-02-01 at 11:16 -0500, Stuart Massey wrote:
> Andrei,
> You are right, thank you. I have an earlier thread on which I posted
> a pacemaker.log for this issue, and didn't think to point to it here.
> The link is
> http://project.ibss.net/samples/deidPacemakerLog.2021-01-25.txtxt .
> So, node01 is in maintenance mode, and constraints prevent any
> resources from running on it (other than drbd in Secondary). I would
> not want node01 to STONITH node02 after a communications failure,
> especially not if all resources are running fine on node02.
> Also, I had not considered whether node01 could become DC even though
> it is in maintenance mode.
> The logs seem to me to match this hypothesis. The CIB operations happen
> right in the middle of the DC negotiations.
> Is there a way to tell node01 that it cannot be DC? Like a
> constraint?
No, though that's been suggested as a new feature.
As a workaround, you could restart the cluster on the less preferred
node -- the controller with the most CPU time (i.e. up the longest)
will be preferred for DC (if pacemaker versions are equal).
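A toy Python model of the preference described above, assuming the version is compared first and controller CPU time second. The class, function, and numbers are hypothetical, and this is not Pacemaker's actual election code (any tie-breaking beyond these two keys is not modeled):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DCCandidate:
    """A node's controller as seen in a simplified DC election."""
    name: str
    version: Tuple[int, ...]  # hypothetical: Pacemaker version, e.g. (2, 0, 5)
    cpu_seconds: float        # controller CPU time, roughly "how long it has been up"


def preferred_dc(candidates: List[DCCandidate]) -> DCCandidate:
    """Pick the node this toy model would prefer as DC.

    Assumption: a newer version wins first; with equal versions, the
    controller with more CPU time wins.
    """
    return max(candidates, key=lambda c: (c.version, c.cpu_seconds))


# Hypothetical numbers: node01 was just restarted, node02 has been up a while.
node01 = DCCandidate("node01", (2, 0, 5), cpu_seconds=12.0)
node02 = DCCandidate("node02", (2, 0, 5), cpu_seconds=8400.0)
print(preferred_dc([node01, node02]).name)  # node02
```

In this model, restarting the cluster on node01 resets its controller's CPU time, which is why the restart makes node02 the likely winner of the next election.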
> Thanks again.
>
> On Sun, Jan 31, 2021 at 1:55 AM Andrei Borzenkov <arvidjaar at gmail.com> wrote:
> > On 29.01.2021 20:37, Stuart Massey wrote:
> > > Can someone help me with this?
> > > Background:
> > >
> > > "node01" is failing, and has been placed in "maintenance" mode. It
> > > occasionally loses connectivity.
> > >
> > > "node02" is able to run our resources.
> > >
> > > Consider the following messages from pacemaker.log on "node02", just
> > > after "node01" has rejoined the cluster (per "node02"):
> > >
> > > Jan 28 14:48:03 [21933] node02.example.com cib: info: cib_perform_op: -- /cib/status/node_state[@id='2']/transient_attributes[@id='2']
> > > Jan 28 14:48:03 [21933] node02.example.com cib: info: cib_perform_op: + /cib: @num_updates=309
> > > Jan 28 14:48:03 [21933] node02.example.com cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='node02.example.com']/transient_attributes: OK (rc=0, origin=node01.example.com/crmd/3784, version=0.94.309)
> > > Jan 28 14:48:04 [21938] node02.example.com crmd: info: abort_transition_graph: Transition aborted by deletion of transient_attributes[@id='2']: Transient attribute change | cib=0.94.309 source=abort_unless_down:357 path=/cib/status/node_state[@id='2']/transient_attributes[@id='2'] complete=true
> > > Jan 28 14:48:05 [21937] node02.example.com pengine: info: master_color: ms_drbd_ourApp: Promoted 0 instances of a possible 1 to master
> > >
> > > The implication, it seems to me, is that "node01" has asked "node02"
> > > to delete the transient attributes for "node02". The transient
> > > attributes should normally be:
> > > <transient_attributes id="2">
> > >   <instance_attributes id="status-2">
> > >     <nvpair id="status-2-master-drbd_ourApp" name="master-drbd_ourApp" value="10000"/>
> > >     <nvpair id="status-2-pingd" name="pingd" value="100"/>
> > >   </instance_attributes>
> > > </transient_attributes>
> > >
> > > These attributes are necessary for "node02" to be Master/Primary,
> > > correct?
> > >
> > > Why might this be happening and how do we prevent it?
> > >
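A rough way to see why that deletion matters, as a simplified model rather than the scheduler's actual logic: the master-drbd_ourApp nvpair is the promotion score (typically set by the resource agent) for node02, and once it is gone there is no positive score to promote on, which fits the "Promoted 0 instances of a possible 1" line. The helper names are hypothetical, and real promotion also weighs constraints and other scores:

```python
import xml.etree.ElementTree as ET
from typing import Dict

# node02's transient attributes as quoted above.
TRANSIENT_XML = """
<transient_attributes id="2">
  <instance_attributes id="status-2">
    <nvpair id="status-2-master-drbd_ourApp" name="master-drbd_ourApp" value="10000"/>
    <nvpair id="status-2-pingd" name="pingd" value="100"/>
  </instance_attributes>
</transient_attributes>
"""


def node_attributes(xml_text: str) -> Dict[str, str]:
    """Collect name -> value for every nvpair under a transient_attributes element."""
    root = ET.fromstring(xml_text.strip())
    return {nv.get("name"): nv.get("value") for nv in root.iter("nvpair")}


def has_promotion_score(attrs: Dict[str, str], rsc: str = "drbd_ourApp") -> bool:
    """Hypothetical check: is there a positive master-<rsc> promotion score?"""
    value = attrs.get(f"master-{rsc}")
    if value is None:
        return False  # attribute deleted: nothing to promote on
    v = value.strip().upper()
    if v in ("INFINITY", "+INFINITY"):
        return True
    if v == "-INFINITY":
        return False
    return int(v) > 0


before = node_attributes(TRANSIENT_XML)
after = node_attributes('<transient_attributes id="2"/>')  # state after the cib_delete
print(has_promotion_score(before), has_promotion_score(after))  # True False
```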
> >
> > You do not provide enough information to answer. At the very least
> > you need to show full logs from both nodes around the time it happens
> > (starting with both nodes losing connectivity).
> >
> > But as a wild guess: you do not use stonith, so node01 becomes DC and
> > clears the other node's state.
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot <kgaillot at redhat.com>