[ClusterLabs] Antw: Re: why is node fenced ?

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Tue Aug 13 09:14:51 EDT 2019


You said you booted the hosts sequentially, but according to the logs they were
starting in parallel.

>>> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> wrote on 13.08.2019 at
13:53 in message
<767205671.1953556.1565697218136.JavaMail.zimbra at helmholtz-muenchen.de>:
> ----- On Aug 12, 2019, at 7:47 PM, Chris Walker cwalker at cray.com wrote:
>
>> When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2, for
>> example,
>>
>> Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd:     info: pcmk_quorum_notification: Quorum retained | membership=1320 members=1
>>
>> after ~20s (dc-deadtime parameter), ha-idg-2 is marked 'unclean' and STONITHed
>> as part of startup fencing.
>> 
>> There is nothing in ha-idg-2's HA logs around 17:43 indicating that it saw
>> ha-idg-1 either, so it appears that there was no communication at all between
>> the two nodes.
>>
>> I'm not sure exactly why the nodes did not see one another, but there are
>> indications of network issues around this time:
>>
>> 2019-08-09T17:42:16.427947+02:00 ha-idg-2 kernel: [ 1229.245533] bond1: now running without any active interface!
>>
>> so perhaps that's related.
> 
> This is the initialization of bond1 on ha-idg-1 during boot.
> Three seconds later bond1 is fine:
> 
> 2019-08-09T17:42:19.299886+02:00 ha-idg-2 kernel: [ 1232.117470] tg3 0000:03:04.0 eth2: Link is up at 1000 Mbps, full duplex
> 2019-08-09T17:42:19.299908+02:00 ha-idg-2 kernel: [ 1232.117482] tg3 0000:03:04.0 eth2: Flow control is on for TX and on for RX
> 2019-08-09T17:42:19.315756+02:00 ha-idg-2 kernel: [ 1232.131565] tg3 0000:03:04.1 eth3: Link is up at 1000 Mbps, full duplex
> 2019-08-09T17:42:19.315767+02:00 ha-idg-2 kernel: [ 1232.131568] tg3 0000:03:04.1 eth3: Flow control is on for TX and on for RX
> 2019-08-09T17:42:19.351781+02:00 ha-idg-2 kernel: [ 1232.169386] bond1: link status definitely up for interface eth2, 1000 Mbps full duplex
> 2019-08-09T17:42:19.351792+02:00 ha-idg-2 kernel: [ 1232.169390] bond1: making interface eth2 the new active one
> 2019-08-09T17:42:19.352521+02:00 ha-idg-2 kernel: [ 1232.169473] bond1: first active interface up!
> 2019-08-09T17:42:19.352532+02:00 ha-idg-2 kernel: [ 1232.169480] bond1: link status definitely up for interface eth3, 1000 Mbps full duplex
> 
> Also on ha-idg-1:
>
> 2019-08-09T17:42:19.168035+02:00 ha-idg-1 kernel: [  110.164250] tg3 0000:02:00.3 eth3: Link is up at 1000 Mbps, full duplex
> 2019-08-09T17:42:19.168050+02:00 ha-idg-1 kernel: [  110.164252] tg3 0000:02:00.3 eth3: Flow control is on for TX and on for RX
> 2019-08-09T17:42:19.168052+02:00 ha-idg-1 kernel: [  110.164254] tg3 0000:02:00.3 eth3: EEE is disabled
> 2019-08-09T17:42:19.172020+02:00 ha-idg-1 kernel: [  110.171378] tg3 0000:02:00.2 eth2: Link is up at 1000 Mbps, full duplex
> 2019-08-09T17:42:19.172028+02:00 ha-idg-1 kernel: [  110.171380] tg3 0000:02:00.2 eth2: Flow control is on for TX and on for RX
> 2019-08-09T17:42:19.172029+02:00 ha-idg-1 kernel: [  110.171382] tg3 0000:02:00.2 eth2: EEE is disabled
>  ...
> 2019-08-09T17:42:19.244066+02:00 ha-idg-1 kernel: [  110.240310] bond1: link status definitely up for interface eth2, 1000 Mbps full duplex
> 2019-08-09T17:42:19.244083+02:00 ha-idg-1 kernel: [  110.240311] bond1: making interface eth2 the new active one
> 2019-08-09T17:42:19.244085+02:00 ha-idg-1 kernel: [  110.240353] bond1: first active interface up!
> 2019-08-09T17:42:19.244087+02:00 ha-idg-1 kernel: [  110.240356] bond1: link status definitely up for interface eth3, 1000 Mbps full duplex
> 
> The cluster is started afterwards on ha-idg-1, at 17:43:04. I don't find any
> further entries about problems with bond1, so I think it's not related.
> Time is synchronized by NTP.
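
A rough way to cross-check timings like these is to subtract the kernel uptime
(the number in square brackets) from the syslog timestamp, which gives an
approximate boot time for each host. Below is a minimal Python sketch, assuming
kernel lines in the format quoted above, unwrapped onto one line and with plain
ASCII hyphens. The function names are made up for illustration, and the result
is only an estimate, since the kernel uptime counter and the wall clock can
drift apart slightly.

```python
import re
from datetime import datetime, timedelta

# Matches kernel lines as quoted in this thread, for example:
# 2019-08-09T17:42:19.168035+02:00 ha-idg-1 kernel: [  110.164250] tg3 ...
KERNEL_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+kernel:\s+\[\s*(?P<uptime>\d+\.\d+)\]"
)


def parse_kernel_line(line):
    """Return (timestamp, host, uptime_seconds), or None if the line does not match."""
    m = KERNEL_LINE.match(line)
    if m is None:
        return None
    ts = datetime.fromisoformat(m.group("ts"))  # needs Python 3.7 or later
    return ts, m.group("host"), float(m.group("uptime"))


def estimated_boot_time(line):
    """Estimate the boot time of the host: syslog timestamp minus kernel uptime."""
    parsed = parse_kernel_line(line)
    if parsed is None:
        raise ValueError("not a kernel log line in the expected format")
    ts, host, uptime = parsed
    return host, ts - timedelta(seconds=uptime)


# Two lines taken from the excerpts quoted above, one per host.
sample_lines = [
    "2019-08-09T17:42:19.244087+02:00 ha-idg-1 kernel: [  110.240356] bond1: link status definitely up for interface eth3, 1000 Mbps full duplex",
    "2019-08-09T17:42:19.352532+02:00 ha-idg-2 kernel: [ 1232.169480] bond1: link status definitely up for interface eth3, 1000 Mbps full duplex",
]

for line in sample_lines:
    host, boot = estimated_boot_time(line)
    print(host, "booted around", boot.isoformat(timespec="seconds"))
```

Lines that are not kernel messages (for example the pacemakerd line further up)
are simply not matched, so the same helper can be run over a whole syslog file
and the non-matching lines skipped.
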
> 
> 
> Bernd
>  
> 
> 




