[ClusterLabs] Re: why is node fenced?

Thu May 23 09:01:12 EDT 2019

----- On May 20, 2019, at 8:28 AM, Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de wrote:

>>>> "Lentes, Bernd" <bernd.lentes at helmholtz-muenchen.de> wrote on 16.05.2019 at 17:10
> in message <1151882511.6631123.1558019430655.JavaMail.zimbra at helmholtz-muenchen.de>:
>> Hi,
>> 
>> my two-node HA cluster fenced one of its nodes on May 14th.
>> ha-idg-1 was the DC; ha-idg-2 was fenced.
>> It happened around 11:30 am.
>> The log from the fenced node isn't really informative:
>> 
>> ==================================
>> 2019-05-14T11:22:09.948980+02:00 ha-idg-2 liblogging-stdlog: -- MARK --
>> 2019-05-14T11:28:21.548898+02:00 ha-idg-2 sshd[14269]: Accepted keyboard-interactive/pam for root from 10.35.34.70 port 59449 ssh2
>> 2019-05-14T11:28:21.550602+02:00 ha-idg-2 sshd[14269]: pam_unix(sshd:session): session opened for user root by (uid=0)
>> 2019-05-14T11:28:21.554640+02:00 ha-idg-2 systemd-logind[2798]: New session 15385 of user root.
>> 2019-05-14T11:28:21.555067+02:00 ha-idg-2 systemd[1]: Started Session 15385 of user root.
>> 
>> 2019-05-14T11:44:07.664785+02:00 ha-idg-2 systemd[1]: systemd 228 running in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN)   <-- reboot !!!
>> 2019-05-14T11:44:07.664902+02:00 ha-idg-2 kernel: [    0.000000] Linux version 4.12.14-95.13-default (geeko at buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Fri Mar 22 06:04:58 UTC 2019 (c01bf34)
>> 2019-05-14T11:44:07.665492+02:00 ha-idg-2 systemd[1]: Detected architecture x86-64.
>> 2019-05-14T11:44:07.665510+02:00 ha-idg-2 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.12.14-95.13-default root=/dev/mapper/vg_local-lv_root resume=/dev/disk/by-uuid/2849c504-2e45-4ec8-bbf8-724cf358ee25 splash=verbose showopts
>> 2019-05-14T11:44:07.665510+02:00 ha-idg-2 systemd[1]: Set hostname to <ha-idg-2>.
>> =================================
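
For reference, the quoted excerpt goes silent between the last session entry and the first boot message, and the reported fencing time (around 11:30) falls inside that window. A minimal Python sketch, using only the two timestamps shown in the excerpt, measures the gap. The function name silent_gap_seconds is purely illustrative.

```python
from datetime import datetime

# Timestamps copied from the quoted /var/log/messages excerpt:
# the last entry before the silence and the first entry after the reboot.
last_before = datetime.fromisoformat("2019-05-14T11:28:21.555067+02:00")
first_after = datetime.fromisoformat("2019-05-14T11:44:07.664785+02:00")


def silent_gap_seconds(before, after):
    """Return the number of seconds between two timezone-aware timestamps."""
    return (after - before).total_seconds()


gap = silent_gap_seconds(last_before, first_after)
print(f"no log entries for {gap:.0f} s (about {gap / 60:.1f} min)")
```

A gap like this only shows that the fenced node logged nothing about the event itself, so the cause has to be looked for in the logs of the surviving node.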

>> 
>> One network interface was gone for a short period. But it is part of a
>> bonding device (round-robin),
>> so the connection shouldn't have been lost. Both nodes are connected directly;
>> there is no switch in between.
> 
> I think you misunderstood: IMHO a round-robin bonding device is not
> fault-tolerant, though that depends a lot on your cabling details. Also, you
> did not show the logs from the other node.

/usr/src/linux/Documentation/networking/bonding.txt says:
"balance-rr or 0: Round-robin policy: Transmit packets in sequential
                order from the first available slave through the
                last.  This mode provides load balancing >> and fault
                tolerance. <<"
Nevertheless, I think I will switch to active-backup.
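
Before and after changing the mode, it can help to confirm what the bonding driver itself reports. The following is a minimal Python sketch, assuming the usual "Key: value" layout of /proc/net/bonding/<bond>. The device name bond0, the function name parse_bonding_status and the sample text are illustrative assumptions, not output from the cluster discussed here.

```python
def parse_bonding_status(text):
    """Return (mode, {slave_name: mii_status}) from bonding status text."""
    mode = None
    slaves = {}
    current = None  # name of the slave section currently being read
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # An "MII Status" line before the first slave section belongs
            # to the bond itself and is skipped here.
            slaves[current] = value
    return mode, slaves


# Illustrative sample only; on a real node one would read the file instead,
# e.g. open("/proc/net/bonding/bond0").read() (bond0 is an assumed name).
SAMPLE = """\
Bonding Mode: load balancing (round-robin)
MII Status: up

Slave Interface: eth2
MII Status: up

Slave Interface: eth3
MII Status: down
"""

mode, slaves = parse_bonding_status(SAMPLE)
print(mode, slaves)
```

With active-backup the mode line would read differently, so the same check can be used to verify that the switch took effect.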

I showed /var/log/messages from the fenced node above.
Or do you mean /var/log/pacemaker.log from the fenced one?
Isn't that the same as the one on the DC?

>> Afterwards I manually stopped the interface (ifconfig eth3 down) several
>> times ... nothing happened.
>> The same with the second interface (eth2).

That means I also stopped the second interface several times, and the node was not fenced then either.

Bernd
