[ClusterLabs] Why is node fenced?

Mon Aug 12 13:47:02 EDT 2019

When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2. For example:

Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd:     info: pcmk_quorum_notification: Quorum retained | membership=1320 members=1

After ~20s (the dc-deadtime parameter), ha-idg-2 was marked 'unclean' and STONITHed as part of startup fencing.
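To make the sequence concrete, here is a deliberately simplified Python model of that decision. It is only a sketch, not Pacemaker's actual code: the names PeerState and peers_to_fence are made up for illustration, and the defaults (20 seconds for dc-deadtime, startup fencing enabled) are assumptions based on the behaviour seen in these logs.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    """Hypothetical record of what a starting node knows about a peer."""
    name: str
    seen_since_start: bool  # has membership ever reported this peer since startup?


def peers_to_fence(peers, seconds_since_start, dc_deadtime=20.0, startup_fencing=True):
    """Return the names of peers a starting node would treat as unclean.

    Simplified model: once dc_deadtime has elapsed, any peer that was never
    seen is in an unknown state, so with startup fencing enabled it is
    scheduled for fencing.
    """
    if not startup_fencing or seconds_since_start < dc_deadtime:
        return []
    return [p.name for p in peers if not p.seen_since_start]


# ha-idg-1 never saw ha-idg-2 after starting Pacemaker.
peers = [PeerState("ha-idg-2", seen_since_start=False)]
print(peers_to_fence(peers, seconds_since_start=5))   # still waiting
print(peers_to_fence(peers, seconds_since_start=21))  # deadtime elapsed
```

The point of the model is that ha-idg-2 did not have to do anything wrong to be fenced; simply being invisible to ha-idg-1 for the whole wait period is enough.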

There is nothing in ha-idg-2's HA logs around 17:43 indicating that it saw ha-idg-1 either, so it appears that there was no communication at all between the two nodes.

I'm not sure exactly why the nodes did not see one another, but there are indications of network issues around this time:

2019-08-09T17:42:16.427947+02:00 ha-idg-2 kernel: [ 1229.245533] bond1: now running without any active interface!

so perhaps that's related.
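For a rough sense of how close the two events are, the timestamps from the two log lines quoted above can be compared with a few lines of Python. This is only a sketch: the Pacemaker log line carries no year or UTC offset, so the code assumes it is from 2019 and in the same local zone (+02:00) as the kernel message. The helper name seconds_between is made up for illustration.

```python
from datetime import datetime


def seconds_between(kernel_iso, pacemaker_stamp, year):
    """Seconds from the kernel log timestamp to the Pacemaker log timestamp.

    Assumption: the Pacemaker stamp (e.g. "Aug 09 17:43:05") is in the same
    time zone as the kernel timestamp, which includes an explicit UTC offset.
    """
    kernel_ts = datetime.fromisoformat(kernel_iso)
    pacemaker_ts = datetime.strptime(
        f"{year} {pacemaker_stamp}", "%Y %b %d %H:%M:%S"
    ).replace(tzinfo=kernel_ts.tzinfo)
    return (pacemaker_ts - kernel_ts).total_seconds()


gap = seconds_between("2019-08-09T17:42:16.427947+02:00", "Aug 09 17:43:05", 2019)
print(f"bond1 lost its last active interface {gap:.1f}s before the quorum message")
```

Under those assumptions the bond1 message comes a little under 49 seconds before ha-idg-1 reported a membership of one.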

HTH,
Chris

On 8/12/19, 12:09 PM, "Users on behalf of Lentes, Bernd" <users-bounces at clusterlabs.org on behalf of bernd.lentes at helmholtz-muenchen.de> wrote:

    Hi,

    Last Friday (9 August) I had to install patches on my two-node cluster.
    I put one of the nodes (ha-idg-2) into standby (crm node standby ha-idg-2), patched it, rebooted,
    started the cluster again (systemctl start pacemaker), and brought the node back online. Everything was fine.

    Then I wanted to repeat the procedure on the other node (ha-idg-1).
    I put it into standby, patched it, rebooted, and started pacemaker again.
    But then ha-idg-1 fenced ha-idg-2, saying the node was unclean.
    I know that unclean nodes need to be shut down; that's logical.

    But I don't know where the conclusion that the node is unclean comes from, or why it is unclean.
    I searched the logs and didn't find any hint.

    I put the syslog and the pacemaker log on a Seafile share; I'd be very thankful if you would have a look.
    https://hmgubox.helmholtz-muenchen.de/d/53a10960932445fb9cfe/

    Here is the CLI history of the commands:

    17:03:04  crm node standby ha-idg-2
    17:07:15  zypper up (install updates on ha-idg-2)
    17:17:30  systemctl reboot
    17:25:21  systemctl start pacemaker.service
    17:25:47  crm node online ha-idg-2
    17:26:35  crm node standby ha-idg-1
    17:30:21  zypper up (install updates on ha-idg-1)
    17:37:32  systemctl reboot
    17:43:04  systemctl start pacemaker.service
    17:44:00  ha-idg-2 is fenced

    Thanks.

    Bernd

    OS is SLES 12 SP4, pacemaker 1.1.19, corosync 2.3.6-9.13.1
