[ClusterLabs] Antw: Re: Antw: [EXT] Node fenced for unknown reason

Klaus Wenninger kwenning at redhat.com
Thu Apr 15 09:39:03 EDT 2021


On 4/15/21 3:26 PM, Ulrich Windl wrote:
>>>> Steffen Vinther Sørensen <svinther at gmail.com> wrote on 15.04.2021 at 14:56
> in message
> <CALhdMBiXZoYF-Gxg82oNT4MGFm6Q-_imCeUVHyPgWKy41JjFSg at mail.gmail.com>:
>> On Thu, Apr 15, 2021 at 2:29 PM Ulrich Windl
>> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>>>> Steffen Vinther Sørensen <svinther at gmail.com> wrote on 15.04.2021 at 13:10
>>> in message
>>> <CALhdMBhMQRwmgoWEWuiGMDr7HfVOTTKvW8=NQMs2P2e9p8y9Jw at mail.gmail.com>:
>>>> Hi there,
>>>>
>>>> In this 3-node cluster, node03 had been offline for a while and was being
>>>> brought back into service. Then a migration of a VirtualDomain was
>>>> attempted, and node02 was then fenced.
>>>>
>>>> Provided are logs from all 2 nodes, the 'pcs config', and a
>>>> bzcatted pe-warn. Does anyone have an idea why the node was fenced? Is
>>>> it because of the failed ipmi monitor warning?
>>> After a short glance it looks as if the network traffic used for VM
>>> migration killed the corosync (or other) communication.
>>>
>> May I ask which part makes you think so?
> The fact that I saw no reason for an intended fencing.
And it looks like node02 is being cut off from all
network communication - both corosync & ipmi.
It may really be the network load, although I would
rather bet on something more systematic, like a
MAC/IP conflict with the VM or something.
I see you have libvirtd under cluster control.
Maybe bringing up the network topology destroys the
connection between the nodes.
Has the cluster worked with the 3 nodes before?
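
One way to rule out the MAC-conflict theory is to compare the MAC
addresses of the host NICs with those of the VM interfaces. Below is a
minimal sketch in Python, assuming the (owner, MAC) pairs have already
been collected by hand from the nodes and the VM definitions. The
function names, owner labels, and addresses are made up for
illustration and are not taken from the logs in this thread.

```python
from collections import defaultdict


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and use ':' separators so equal addresses compare equal."""
    return mac.strip().lower().replace("-", ":")


def find_mac_conflicts(interfaces):
    """Return {mac: sorted owners} for every MAC claimed by more than one owner.

    `interfaces` is an iterable of (owner, mac) pairs, e.g. gathered
    manually from each host and each VM definition (hypothetical input).
    """
    owners = defaultdict(set)
    for owner, mac in interfaces:
        owners[normalize_mac(mac)].add(owner)
    return {mac: sorted(names) for mac, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    # Illustrative data only: a VM interface reusing a host NIC's MAC.
    sample = [
        ("node02:eth0", "52:54:00:AA:BB:01"),
        ("vm:vnet0", "52-54-00-aa-bb-01"),
        ("node03:eth0", "52:54:00:aa:bb:03"),
    ]
    for mac, names in find_mac_conflicts(sample).items():
        print(mac, "is claimed by", ", ".join(names))
```

An empty result only says the collected addresses are unique. It does
not rule out an IP conflict or a bridge/topology change when libvirtd
starts.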


Klaus
>
>>>>
>>>> Here is the outline:
>>>>
>>>> At 06:58:27 node03 is being activated with 'pcs start node03', nothing
>>>> suspicious in the logs
>>>>
>>>> At 06:59:17 a resource migration is attempted from node02 to node03
>>>> with 'pcs resource move sikkermail30 kvm03-node02.logiva-gcs.dk'
>>>>
>>>>
>>>> On node01 this happens:
>>>>
>>>> Apr 15 06:59:17 kvm03-node01 pengine[29024]:  warning: Processing
>>>> failed monitor of ipmi-fencing-node01 on kvm03-node02.logiva-gcs.dk:
>>>> unknown error
>>>>
>>>> And node02 is fenced?
>>>>
>>>> /Steffen
>>>


