[ClusterLabs] Peer (slave) node deleting master's transient_attributes

Mon Feb 1 11:16:01 EST 2021

Andrei,
You are right, thank you. I have an earlier thread on which I posted a
pacemaker.log for this issue, and didn't think to point to it here.
The link is http://project.ibss.net/samples/deidPacemakerLog.2021-01-25.txt
So, node01 is in maintenance mode, and constraints prevent any resources
from running on it (other than drbd in Secondary). I would not want node01
to stonith node02 after a communications failure, especially not if all
resources are running fine on node02.
Also, it had not occurred to me that node01 could become DC even though it
is in maintenance mode.
The logs seem to me to support this explanation: the cib operations happen
right in the middle of the DC negotiations.
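
For anyone who wants to check a pacemaker.log for the same pattern, here is
a minimal Python sketch. It assumes the log format matches the excerpt quoted
below and that each log entry sits on a single line (the excerpt in this mail
is wrapped by the mail client). The function name and regular expression are
my own illustration, not part of Pacemaker. It reports any completed
cib_delete of a node's transient_attributes whose origin is a different node.

```python
import re

# Pattern for the "Completed cib_delete ..." request line, as seen in the
# quoted excerpt. The field layout is an assumption based on that excerpt.
CIB_DELETE = re.compile(
    r"cib_process_request:\s+Completed cib_delete operation for section\s+"
    r"//node_state\[@uname='(?P<target>[^']+)'\]/transient_attributes:"
    r".*?origin=(?P<origin>[^/\s,]+)/"
)


def foreign_attribute_deletes(lines):
    """Yield (target, origin, line) for each delete of one node's
    transient attributes that was requested by a different node."""
    for line in lines:
        match = CIB_DELETE.search(line)
        if match and match.group("target") != match.group("origin"):
            yield match.group("target"), match.group("origin"), line.rstrip("\n")
```

It can be run over an open log file, for example
`for hit in foreign_attribute_deletes(open(path)): print(hit)`, where `path`
is wherever your pacemaker.log lives.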
Is there a way to tell node01 that it cannot be DC? Like a constraint?
Thanks again.

On Sun, Jan 31, 2021 at 1:55 AM Andrei Borzenkov <arvidjaar at gmail.com>
wrote:

> 29.01.2021 20:37, Stuart Massey wrote:
> > Can someone help me with this?
> > Background:
> >
> > "node01" is failing, and has been placed in "maintenance" mode. It
> > occasionally loses connectivity.
> >
> > "node02" is able to run our resources
> >
> > Consider the following messages from pacemaker.log on "node02", just
> > after "node01" has rejoined the cluster (per "node02"):
> >
> > Jan 28 14:48:03 [21933] node02.example.com        cib:     info: cib_perform_op:       -- /cib/status/node_state[@id='2']/transient_attributes[@id='2']
> > Jan 28 14:48:03 [21933] node02.example.com        cib:     info: cib_perform_op:       +  /cib:  @num_updates=309
> > Jan 28 14:48:03 [21933] node02.example.com        cib:     info: cib_process_request:  Completed cib_delete operation for section //node_state[@uname='node02.example.com']/transient_attributes: OK (rc=0, origin=node01.example.com/crmd/3784, version=0.94.309)
> > Jan 28 14:48:04 [21938] node02.example.com       crmd:     info: abort_transition_graph:       Transition aborted by deletion of transient_attributes[@id='2']: Transient attribute change | cib=0.94.309 source=abort_unless_down:357 path=/cib/status/node_state[@id='2']/transient_attributes[@id='2'] complete=true
> > Jan 28 14:48:05 [21937] node02.example.com    pengine:     info: master_color: ms_drbd_ourApp: Promoted 0 instances of a possible 1 to master
> >
> > The implication, it seems to me, is that "node01" has asked "node02" to
> > delete the transient-attributes for "node02". The transient-attributes
> > should normally be:
> >       <transient_attributes id="2">
> >         <instance_attributes id="status-2">
> >           <nvpair id="status-2-master-drbd_ourApp"
> > name="master-drbd_ourApp" value="10000"/>
> >           <nvpair id="status-2-pingd" name="pingd" value="100"/>
> >         </instance_attributes>
> >       </transient_attributes>
> >
> > These attributes are necessary for "node02" to be Master/Primary,
> > correct?
> >
> > Why might this be happening and how do we prevent it?
> >
>
> You have not provided enough information to answer. At the very least you
> need to show full logs from both nodes around the time it happens (starting
> with both nodes losing connectivity).
>
> But as a wild guess: you do not use stonith, node01 becomes DC and
> clears the other node's state.