[ClusterLabs] Antw: Re: why is node fenced ?
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Tue Aug 13 09:14:51 EDT 2019
You said you booted the hosts sequentially. According to the logs, they were
starting in parallel.
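
One way to check the boot order against the kernel lines quoted below is to subtract the kernel uptime (the number in square brackets) from the syslog timestamp on the same line. The following is only a rough sketch: the function name `estimate_boot_time` and the constant `SAMPLE_LINES` are made up for illustration, the two sample lines are copied from the quoted logs with plain ASCII hyphens, and the result is approximate because the kernel's uptime counter and the wall clock can drift apart.

```python
import re
from datetime import datetime, timedelta

# Matches "<ISO timestamp> <host> kernel: [<uptime in seconds>] ..."
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+kernel:\s+\[\s*(\d+\.\d+)\]")

# Two lines taken from the quoted logs (hypothetical constant name).
SAMPLE_LINES = [
    "2019-08-09T17:42:16.427947+02:00 ha-idg-2 kernel: [ 1229.245533] "
    "bond1: now running without any active interface!",
    "2019-08-09T17:42:19.168035+02:00 ha-idg-1 kernel: [ 110.164250] "
    "tg3 0000:02:00.3 eth3: Link is up at 1000 Mbps, full duplex",
]


def estimate_boot_time(line):
    """Return (host, approximate boot time) for one syslog kernel line.

    Assumption: the bracketed value is seconds since boot, so
    boot time is roughly wall-clock timestamp minus that value.
    """
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("not a kernel log line: %r" % line)
    stamp, host, uptime = match.groups()
    wall_clock = datetime.fromisoformat(stamp)
    return host, wall_clock - timedelta(seconds=float(uptime))


if __name__ == "__main__":
    for sample in SAMPLE_LINES:
        host, boot = estimate_boot_time(sample)
        print(host, "booted at approximately", boot.isoformat())
```

Running it on one line per node gives an estimated boot time for each host, which can then be compared with the time Pacemaker was started on each of them.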
>>> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> wrote on 13.08.2019 at 13:53
in message <767205671.1953556.1565697218136.JavaMail.zimbra at helmholtz-muenchen.de>:
> ‑‑‑‑‑ On Aug 12, 2019, at 7:47 PM, Chris Walker cwalker at cray.com wrote:
>
>> When ha‑idg‑1 started Pacemaker around 17:43, it did not see ha‑idg‑2, for
>> example,
>>
>> Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd: info: pcmk_quorum_notification: Quorum retained | membership=1320 members=1
>>
>> After ~20s (dc-deadtime parameter), ha-idg-2 is marked 'unclean' and STONITHed
>> as part of startup fencing.
>>
>> There is nothing in ha-idg-2's HA logs around 17:43 indicating that it saw
>> ha-idg-1 either, so it appears that there was no communication at all between
>> the two nodes.
>>
>> I'm not sure exactly why the nodes did not see one another, but there are
>> indications of network issues around this time
>>
>> 2019-08-09T17:42:16.427947+02:00 ha-idg-2 kernel: [ 1229.245533] bond1: now running without any active interface!
>>
>> so perhaps that's related.
>
> This is the initialization of the bond1 on ha‑idg‑1 during boot.
> 3 seconds later bond1 is fine:
>
> 2019-08-09T17:42:19.299886+02:00 ha-idg-2 kernel: [ 1232.117470] tg3 0000:03:04.0 eth2: Link is up at 1000 Mbps, full duplex
> 2019-08-09T17:42:19.299908+02:00 ha-idg-2 kernel: [ 1232.117482] tg3 0000:03:04.0 eth2: Flow control is on for TX and on for RX
> 2019-08-09T17:42:19.315756+02:00 ha-idg-2 kernel: [ 1232.131565] tg3 0000:03:04.1 eth3: Link is up at 1000 Mbps, full duplex
> 2019-08-09T17:42:19.315767+02:00 ha-idg-2 kernel: [ 1232.131568] tg3 0000:03:04.1 eth3: Flow control is on for TX and on for RX
> 2019-08-09T17:42:19.351781+02:00 ha-idg-2 kernel: [ 1232.169386] bond1: link status definitely up for interface eth2, 1000 Mbps full duplex
> 2019-08-09T17:42:19.351792+02:00 ha-idg-2 kernel: [ 1232.169390] bond1: making interface eth2 the new active one
> 2019-08-09T17:42:19.352521+02:00 ha-idg-2 kernel: [ 1232.169473] bond1: first active interface up!
> 2019-08-09T17:42:19.352532+02:00 ha-idg-2 kernel: [ 1232.169480] bond1: link status definitely up for interface eth3, 1000 Mbps full duplex
>
> also on ha‑idg‑1:
>
> 2019-08-09T17:42:19.168035+02:00 ha-idg-1 kernel: [ 110.164250] tg3 0000:02:00.3 eth3: Link is up at 1000 Mbps, full duplex
> 2019-08-09T17:42:19.168050+02:00 ha-idg-1 kernel: [ 110.164252] tg3 0000:02:00.3 eth3: Flow control is on for TX and on for RX
> 2019-08-09T17:42:19.168052+02:00 ha-idg-1 kernel: [ 110.164254] tg3 0000:02:00.3 eth3: EEE is disabled
> 2019-08-09T17:42:19.172020+02:00 ha-idg-1 kernel: [ 110.171378] tg3 0000:02:00.2 eth2: Link is up at 1000 Mbps, full duplex
> 2019-08-09T17:42:19.172028+02:00 ha-idg-1 kernel: [ 110.171380] tg3 0000:02:00.2 eth2: Flow control is on for TX and on for RX
> 2019-08-09T17:42:19.172029+02:00 ha-idg-1 kernel: [ 110.171382] tg3 0000:02:00.2 eth2: EEE is disabled
> ...
> 2019-08-09T17:42:19.244066+02:00 ha-idg-1 kernel: [ 110.240310] bond1: link status definitely up for interface eth2, 1000 Mbps full duplex
> 2019-08-09T17:42:19.244083+02:00 ha-idg-1 kernel: [ 110.240311] bond1: making interface eth2 the new active one
> 2019-08-09T17:42:19.244085+02:00 ha-idg-1 kernel: [ 110.240353] bond1: first active interface up!
> 2019-08-09T17:42:19.244087+02:00 ha-idg-1 kernel: [ 110.240356] bond1: link status definitely up for interface eth3, 1000 Mbps full duplex
>
> The cluster was started afterwards on ha-idg-1, at 17:43:04. I can't find any
> further entries indicating problems with bond1, so I think it's not related.
> Time is synchronized by ntp.
>
>
> Bernd