[ClusterLabs] Antw: [EXT] Re: 'pcs stonith update' takes, then reverts

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Jul 27 02:50:14 EDT 2021


>>> <kgaillot at redhat.com> wrote on 26.07.2021 at 18:50 in message
<1a18b3a5c6730f97ddc6c91fb946f509831fa56d.camel at redhat.com>:
> On Mon, 2021‑07‑26 at 12:25 ‑0400, Digimer wrote:
>> On 2021‑07‑26 9:54 a.m., kgaillot at redhat.com wrote:
>> > On Fri, 2021‑07‑23 at 21:46 ‑0400, Digimer wrote:
>> > > After a LOT of hassle, I finally got it updated, but OMG it was
>> > > painful.
>> > > 
>> > > I degraded the cluster (unsure if needed), set maintenance mode,
>> > > deleted the stonith levels, deleted the stonith devices, recreated
>> > > them with the updated values, recreated the stonith levels, and
>> > > finally disabled maintenance mode.
>> > > 
>> > > It should not have been this hard, right? Why in heck would it be
>> > > that pacemaker kept "rolling back" to old configs? I'd delete the
>> > > stonith
>> > 
>> > That is bizarre. It sounds like the CIB changes were taking effect
>> > locally, then being rejected by the rest of the cluster, which would
>> > send the "correct" CIB back to the originator.
>> > 
>> > The logs of interest would be pacemaker.log from both nodes at the
>> > time you made the first configuration change that failed. I'm
>> > guessing the logs you posted were from after that point?
>> 
>> Below are the logs. The change appears to have first been attempted
>> at 'Jul 23 16:22:27', made on an‑a02n01. I included logs for a few
>> minutes before in case they are relevant. 
>> * an‑a02n01: 
>> https://www.alteeve.com/an‑repo/files/an‑a02n01.pacemaker.log 
>> * an‑a02n02: 
>> https://www.alteeve.com/an‑repo/files/an‑a02n02.pacemaker.log 
>> Note that the PDUs as originally configured (10.201.2.1/2) were not
>> available, so I had to disable and cleanup the stonith resources.
>> They seemed to keep getting re‑enabled, so I got into the habit of
>> doing this cycle of disable ‑> cleanup ‑> disable ‑> cleanup before I
>> could reliably get the resources to be 'stopped (disabled)' in 'pcs
>> stonith status'.
>> digimer
> 
> The initial change happened here:
> 
> Jul 23 16:22:27 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: Diff: ‑‑‑ 0.337.112 2
> Jul 23 16:22:27 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: Diff: +++ 0.338.0 6a24af66df3d9f825cc2681222f8f5d6
> Jul 23 16:22:27 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: +  /cib:  @epoch=338, @num_updates=0
> Jul 23 16:22:27 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: +  /cib/configuration/resources/primitive[@id='apc_snmp_node1_an‑pdu03']/instance_attributes[@id='apc_snmp_node1_an‑pdu03‑instance_attributes']/nvpair[@id='apc_snmp_node1_an‑pdu03‑instance_attributes‑ip']:  @value=10.201.2.3
> Jul 23 16:22:27 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_replace_notify) 	info: Replaced: 0.337.112 ‑> 0.338.0 from an‑a02n02
> Jul 23 16:22:27 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_process_request) 	info: Completed cib_replace operation for section configuration: OK (rc=0, origin=an‑a02n02/cibadmin/2, version=0.338.0)
> 
> origin=an‑a02n02/cibadmin/2 means that someone or something ran the
> cibadmin tool on an‑a02n02. Presumably this was your interactive pcs
> command.
> 
> It was then reverted by:

I wonder about the gap between 338 and 343...
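
A minimal sketch of how one could look into that gap, assuming the full
pacemaker.log uses the same "Completed ... operation" line format as the
excerpts quoted here. The function name completed_ops and the regular
expression are my own illustration, not part of Pacemaker; the script just
lists which operation, from which origin, produced each CIB version in an
epoch range.

```python
import re
import sys

# Pattern for the "Completed <op> operation for section <section>: ...
# (rc=..., origin=..., version=admin_epoch.epoch.num_updates)" lines.
# Assumption: the log format matches the excerpts quoted in this thread.
COMPLETED = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+.*?"
    r"Completed (?P<op>\w+) operation for section (?P<section>\S+): "
    r".*?origin=(?P<origin>[^,)]+), version=(?P<version>\d+\.\d+\.\d+)"
)


def completed_ops(lines, first_epoch, last_epoch):
    """Yield (timestamp, operation, origin, version) for each completed
    CIB operation whose epoch lies in [first_epoch, last_epoch]."""
    for line in lines:
        m = COMPLETED.search(line)
        if m is None:
            continue
        # version is admin_epoch.epoch.num_updates; filter on the epoch.
        _admin_epoch, epoch, _num_updates = map(int, m["version"].split("."))
        if first_epoch <= epoch <= last_epoch:
            yield m["stamp"], m["op"], m["origin"], m["version"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 cib_ops.py /path/to/pacemaker.log
    with open(sys.argv[1], errors="replace") as log:
        for stamp, op, origin, version in completed_ops(log, 338, 344):
            print(f"{stamp}  {version:>10}  {op:<16} origin={origin}")
```

Run against the log from each node, this would show whether epochs 339
through 343 were also created by cibadmin on an‑a02n02 or by something else.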


> 
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: Diff: ‑‑‑ 0.343.3 2
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: Diff: +++ 0.344.0 (null)
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: +  /cib:  @epoch=344, @num_updates=0
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++ /cib/configuration/resources:  <primitive class="stonith" id="apc_snmp_node1_an‑pdu03" type="fence_apc_snmp"/>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                  <instance_attributes id="apc_snmp_node1_an‑pdu03‑instance_attributes">
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                    <nvpair id="apc_snmp_node1_an‑pdu03‑instance_attributes‑ip" name="ip" value="10.201.2.1"/>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                    <nvpair id="apc_snmp_node1_an‑pdu03‑instance_attributes‑pcmk_host_list" name="pcmk_host_list" value="an‑a02n01"/>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                    <nvpair id="apc_snmp_node1_an‑pdu03‑instance_attributes‑pcmk_off_action" name="pcmk_off_action" value="reboot"/>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                    <nvpair id="apc_snmp_node1_an‑pdu03‑instance_attributes‑port" name="port" value="5"/>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                  </instance_attributes>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                  <operations>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                    <op id="apc_snmp_node1_an‑pdu03‑monitor‑interval‑60" interval="60" name="monitor"/>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                  </operations>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_perform_op) 	info: ++                                </primitive>
> Jul 23 16:22:50 an‑a02n01.alteeve.com pacemaker‑based     [121628] (cib_process_request) 	info: Completed cib_apply_diff operation for section 'all': OK (rc=0, origin=an‑a02n02/cibadmin/2, version=0.344.0)
> 
> Notice the origin is still cibadmin on an‑a02n02. So this was either
> you, or a script or cron on that node. I don't see any additional
> details on that node.
> ‑‑ 
> Ken Gaillot <kgaillot at redhat.com>
> 