[ClusterLabs] Antw: Re: Antw: [EXT] Node fenced for unknown reason

Andrei Borzenkov arvidjaar at gmail.com
Fri Apr 16 00:51:45 EDT 2021


On 15.04.2021 16:39, Klaus Wenninger wrote:
> On 4/15/21 3:26 PM, Ulrich Windl wrote:
>>>>> Steffen Vinther Sørensen <svinther at gmail.com> wrote on 15.04.2021
>> at 14:56 in message
>> <CALhdMBiXZoYF-Gxg82oNT4MGFm6Q-_imCeUVHyPgWKy41JjFSg at mail.gmail.com>:
>>> On Thu, Apr 15, 2021 at 2:29 PM Ulrich Windl
>>> <Ulrich.Windl at rz.uni-regensburg.de> wrote:
>>>>>>> Steffen Vinther Sørensen <svinther at gmail.com> wrote on
>>>>>>> 15.04.2021 at 13:10 in message
>>>> <CALhdMBhMQRwmgoWEWuiGMDr7HfVOTTKvW8=NQMs2P2e9p8y9Jw at mail.gmail.com>:
>>>>> Hi there,
>>>>>
>>>>> In this 3 node cluster, node03 had been offline for a while and was
>>>>> being brought back into service. Then a migration of a VirtualDomain
>>>>> was attempted, and node02 was then fenced.
>>>>>
>>>>> Provided are logs from all 2 nodes, the 'pcs config', and a
>>>>> bzcatted pe-warn. Does anyone have an idea why the node was fenced? Is
>>>>> it because of the failed ipmi monitor warning?
>>>> After a short glance it looks as if the network traffic used for VM
>>>> migration killed the corosync (or other) communication.
>>>>
>>> May I ask which part makes you think so?
>> The fact that I saw no reason for an intended fencing.
> And it looks like node02 is being cut off from all
> networking-communication - both corosync & ipmi.

Well, IPMI fencing was (claimed to be) successful, so the monitoring errors
could be a false positive. Still, it is something that needs investigation.

... judging by

Apr 15 06:59:26 kvm03-node02 systemd-logind[4179]: Power key pressed.

IPMI fencing *was* successful.
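
To check the same thing across the other provided logs, here is a minimal
sketch that picks out such systemd-logind lines. It assumes the traditional
syslog layout of the line quoted above; the pattern and the function name
power_key_events are illustrative only, not part of any cluster tool.

```python
import re

# Matches the systemd-logind message quoted above. The timestamp/host layout
# is an assumption based on the syslog lines shown in this thread.
POWER_KEY = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"systemd-logind\[\d+\]:\s+Power key pressed"
)

def power_key_events(lines):
    """Return (timestamp, host) for every 'Power key pressed' log line."""
    events = []
    for line in lines:
        m = POWER_KEY.match(line)
        if m:
            events.append((m.group("stamp"), m.group("host")))
    return events

# Sample input: the two log lines quoted in this thread.
sample = [
    "Apr 15 06:59:26 kvm03-node02 systemd-logind[4179]: Power key pressed.",
    "Apr 15 06:59:17 kvm03-node01 pengine[29024]:  warning: Processing "
    "failed monitor of ipmi-fencing-node01 on kvm03-node02.logiva-gcs.dk: "
    "unknown error",
]
print(power_key_events(sample))
```

A hit on the fenced node shortly after the fencing request is consistent
with the power-off having reached the host, though it does not by itself
say who requested it.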

> It may really be the network load, although I would
> rather bet on something more systematic, like a
> MAC/IP conflict with the VM or something.
> I see you are having libvirtd under cluster-control.
> Maybe bringing up the network-topology destroys the
> connection between the nodes.
> Has the cluster been working with the 3 nodes before?
> 
> 
> Klaus
>>
>>>>>
>>>>> Here is the outline:
>>>>>
>>>>> At 06:58:27 node03 is being activated with 'pcs start node03', nothing
>>>>> suspicious in the logs
>>>>>
>>>>> At 06:59:17 a resource migration is attempted from node02 to node03
>>>>> with 'pcs resource move sikkermail30 kvm03-node02.logiva-gcs.dk'
>>>>>
>>>>>
>>>>> on node01 this happens:
>>>>>
>>>>> Apr 15 06:59:17 kvm03-node01 pengine[29024]:  warning: Processing
>>>>> failed monitor of ipmi-fencing-node01 on kvm03-node02.logiva-gcs.dk:
>>>>> unknown error
>>>>>
>>>>> And node02 is fenced?
>>>>>
>>>>> /Steffen
>>>>