<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 2021-07-26 12:50 p.m.,
<a class="moz-txt-link-abbreviated" href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a> wrote:<br>
</div>
<blockquote type="cite"
cite="mid:1a18b3a5c6730f97ddc6c91fb946f509831fa56d.camel@redhat.com">
<pre class="moz-quote-pre" wrap="">On Mon, 2021-07-26 at 12:25 -0400, Digimer wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 2021-07-26 9:54 a.m., <a class="moz-txt-link-abbreviated" href="mailto:kgaillot@redhat.com">kgaillot@redhat.com</a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Fri, 2021-07-23 at 21:46 -0400, Digimer wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">After a LOT of hassle, I finally got it updated, but OMG it was painful.
I degraded the cluster (unsure if needed), set maintenance mode, deleted
the stonith levels, deleted the stonith devices, recreated them with the
updated values, recreated the stonith levels, and finally disabled
maintenance mode.
It should not have been this hard, right? Why the heck would it be that
pacemaker kept "rolling back" to old configs? I'd delete the stonith
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
That is bizarre. It sounds like the CIB changes were taking effect
locally, then being rejected by the rest of the cluster, which would
send the "correct" CIB back to the originator.
The logs of interest would be pacemaker.log from both nodes at the time
you made the first configuration change that failed. I'm guessing the
logs you posted were from after that point?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Below are the logs. The first attempted change appears at 'Jul 23
16:22:27', made on an-a02n01. I included logs from a few minutes before
in case they are relevant.
* an-a02n01:
<a class="moz-txt-link-freetext" href="https://www.alteeve.com/an-repo/files/an-a02n01.pacemaker.log">https://www.alteeve.com/an-repo/files/an-a02n01.pacemaker.log</a>
* an-a02n02:
<a class="moz-txt-link-freetext" href="https://www.alteeve.com/an-repo/files/an-a02n02.pacemaker.log">https://www.alteeve.com/an-repo/files/an-a02n02.pacemaker.log</a>
Note that the PDUs as originally configured (10.201.2.1/2) were not
available, so I had to disable and clean up the stonith resources.
They seemed to keep getting re-enabled, so I got into the habit of
doing this cycle of disable -> cleanup -> disable -> cleanup before I
could reliably get the resources to be 'stopped (disabled)' in 'pcs
stonith status'.
digimer
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
The initial change happened here:
Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: Diff: --- 0.337.112 2
Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: Diff: +++ 0.338.0 6a24af66df3d9f825cc2681222f8f5d6
Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: + /cib: @epoch=338, @num_updates=0
Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: + /cib/configuration/resources/primitive[@id='apc_snmp_node1_an-pdu03']/instance_attributes[@id='apc_snmp_node1_an-pdu03-instance_attributes']/nvpair[@id='apc_snmp_node1_an-pdu03-instance_attributes-ip']: @value=10.201.2.3
Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_replace_notify) info: Replaced: 0.337.112 -> 0.338.0 from an-a02n02
Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] (cib_process_request) info: Completed cib_replace operation for section configuration: OK (rc=0, origin=an-a02n02/cibadmin/2, version=0.338.0)
origin=an-a02n02/cibadmin/2 means that someone or something ran the
cibadmin tool on an-a02n02. Presumably this was your interactive pcs
command.
It was then reverted by:
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: Diff: --- 0.343.3 2
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: Diff: +++ 0.344.0 (null)
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: + /cib: @epoch=344, @num_updates=0
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ /cib/configuration/resources: <primitive class="stonith" id="apc_snmp_node1_an-pdu03" type="fence_apc_snmp"/>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <instance_attributes id="apc_snmp_node1_an-pdu03-instance_attributes">
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <nvpair id="apc_snmp_node1_an-pdu03-instance_attributes-ip" name="ip" value="10.201.2.1"/>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <nvpair id="apc_snmp_node1_an-pdu03-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a02n01"/>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <nvpair id="apc_snmp_node1_an-pdu03-instance_attributes-pcmk_off_action" name="pcmk_off_action" value="reboot"/>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <nvpair id="apc_snmp_node1_an-pdu03-instance_attributes-port" name="port" value="5"/>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ </instance_attributes>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <operations>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ <op id="apc_snmp_node1_an-pdu03-monitor-interval-60" interval="60" name="monitor"/>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ </operations>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_perform_op) info: ++ </primitive>
Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] (cib_process_request) info: Completed cib_apply_diff operation for section 'all': OK (rc=0, origin=an-a02n02/cibadmin/2, version=0.344.0)
Notice the origin is still cibadmin on an-a02n02. So this was either
you, or a script or cron on that node. I don't see any additional
details on that node.
</pre>
</blockquote>
<p>I have no idea what would have / could have done that. I had
ScanCore disabled, so my software wasn't doing anything. These are
stock CentOS Stream 8 installs, so there wouldn't be anything in
cron that should do this. <br>
</p>
<p>I am very confused... =/<br>
</p>
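<p>In case it helps narrow this down, here is a rough sketch (not
something I have run against the full logs) that pulls the origin and
CIB version out of the "Completed cib_* operation" lines quoted above,
so every request attributed to cibadmin can be listed in order. The
regular expression is an assumption based only on the line format in
the excerpts in this thread, and the function names are made up for
illustration.</p>
<pre>
```python
import re

# Pattern for the "Completed cib_* operation" lines quoted in this thread.
# Assumption: other pacemaker-based lines of this kind share the same layout.
# Groups: timestamp, logging host, operation, section, origin, CIB version.
COMPLETED = re.compile(
    r"^(\w{3} +\d+ [\d:]+) (\S+) pacemaker-based .*"
    r"Completed (\w+) operation for section '?(\w+)'?: .*"
    r"origin=([^,]+), version=([\d.]+)\)"
)


def cib_requests(lines):
    """Yield (timestamp, operation, section, origin, version) tuples."""
    for line in lines:
        match = COMPLETED.search(line)
        if match:
            ts, _host, op, section, origin, version = match.groups()
            yield ts, op, section, origin, version


def from_cibadmin(requests):
    """Keep only requests whose origin names the cibadmin client."""
    return [r for r in requests if "/cibadmin/" in r[3]]


# Two lines copied from the an-a02n01 log excerpt in this thread.
SAMPLE = [
    "Jul 23 16:22:27 an-a02n01.alteeve.com pacemaker-based [121628] "
    "(cib_process_request) info: Completed cib_replace operation for "
    "section configuration: OK (rc=0, origin=an-a02n02/cibadmin/2, "
    "version=0.338.0)",
    "Jul 23 16:22:50 an-a02n01.alteeve.com pacemaker-based [121628] "
    "(cib_process_request) info: Completed cib_apply_diff operation for "
    "section 'all': OK (rc=0, origin=an-a02n02/cibadmin/2, "
    "version=0.344.0)",
]

if __name__ == "__main__":
    for row in from_cibadmin(cib_requests(SAMPLE)):
        print(*row)
```
</pre>
<p>Feeding it the lines of the real pacemaker.log instead of SAMPLE
should give a timeline of cibadmin-originated changes between 16:22:27
and 16:22:50, which could then be lined up against shell history or
anything else running on an-a02n02.</p>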
<pre class="moz-signature" cols="72">--
Digimer
Papers and Projects: <a class="moz-txt-link-freetext" href="https://alteeve.com/w/">https://alteeve.com/w/</a>
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould</pre>
</body>
</html>