[ClusterLabs] 2-Node cluster - both nodes unclean - can't start cluster

Lentes, Bernd bernd.lentes at helmholtz-muenchen.de
Mon Mar 13 13:18:11 EDT 2023


> -----Original Message-----
> From: Reid Wahl <nwahl at redhat.com>
> Sent: Friday, March 10, 2023 10:30 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users at clusterlabs.org>; Lentes, Bernd <bernd.lentes at helmholtz-muenchen.de>
> Subject: Re: [ClusterLabs] 2-Node cluster - both nodes unclean - can't start cluster
>
> On Fri, Mar 10, 2023 at 10:49 AM Lentes, Bernd <bernd.lentes at helmholtz-muenchen.de> wrote:
> > (192.168.100.10:2340) was formed. Members joined: 1084777482
>
> Is this really the corosync node ID of one of your nodes? If not, what's your
> corosync version? Is the number the same every time the issue happens?
> The number is so large and seemingly random that I wonder if there's some
> kind of memory corruption.
>
Yes, it's correct.

ha-idg-1:~ # crm node show
ha-idg-1(1084777482): member
        maintenance=off standby=off
ha-idg-2(1084777492): member(offline)
        maintenance=off standby=off
ha-idg-1:~ #
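For what it's worth, the number isn't random: when no explicit `nodeid` is set in corosync.conf, corosync derives the node ID from the ring0 IPv4 address. The ID shown above matches 192.168.100.10 as a 32-bit integer with the high bit cleared. A minimal sketch of that derivation (the bitmask interpretation is inferred from the values above, not from the corosync source):

```python
import ipaddress

def corosync_auto_nodeid(ip: str) -> int:
    """Derive an auto-generated corosync node ID from a ring0 IPv4 address.

    Assumption: with no explicit nodeid in corosync.conf, corosync uses the
    32-bit IPv4 address with the top bit cleared, keeping the ID positive.
    """
    return int(ipaddress.IPv4Address(ip)) & 0x7FFFFFFF

# 192.168.100.10 -> 1084777482, which matches ha-idg-1 above
print(corosync_auto_nodeid("192.168.100.10"))
```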

> > Cluster node ha-idg-1 is now in unknown state      ⇐===== is that the
> > problem ?
>
> Probably a normal part of the startup process but I haven't tested it yet.
>
> > Mar 10 19:36:34 [31046] ha-idg-1 stonith-ng:   notice: handle_request:
> > Received manual confirmation that ha-idg-1 is fenced
>
> Yes
>
> > tengine_stonith_notify:  We were allegedly just fenced by a human for
> > ha-idg-1!      <=====================  what does that mean? I didn't
> > fence it
>
> It means you ran `stonith_admin -C`
>
> https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-
> 1.1.24/fencing/remote.c#L945-L961
>
> > Mar 10 19:36:34 [31050] ha-idg-1       crmd:     info: crm_xml_cleanup:
> > Cleaning up memory from libxml2
> > Mar 10 19:36:34 [31044] ha-idg-1 pacemakerd:  warning: pcmk_child_exit:
> > Shutting cluster down because crmd[31050] had fatal failure
> > <=======================  ???
>
> Pacemaker is shutting down on the local node because it just received
> confirmation that it was fenced (because you ran `stonith_admin -C`).
> This is expected behavior.

OK. If it is expected, then it's fine.

>
> Can you help me understand the issue here? You started the cluster on this
> node at 19:36:24. 10 seconds later, you ran `stonith_admin -C`, and the
> local node shut down Pacemaker, as expected. It doesn't look like
> Pacemaker stopped until you ran that command.

I didn't know that this was expected.

Bernd
Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH), Ingolstadter Landstr. 1, 85764 Neuherberg, www.helmholtz-munich.de. Geschaeftsfuehrung:  Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther, Daniela Sommer (kom.) | Aufsichtsratsvorsitzende: Prof. Dr. Veronika von Messling | Registergericht: Amtsgericht Muenchen  HRB 6466 | USt-IdNr. DE 129521671
