[ClusterLabs] Why is the node fenced?

Lentes, Bernd bernd.lentes at helmholtz-muenchen.de
Fri Aug 14 14:37:43 EDT 2020


----- On Aug 9, 2020, at 10:17 PM, Bernd Lentes bernd.lentes at helmholtz-muenchen.de wrote:


>> So this appears to be the problem. From these logs I would guess the
>> successful stop on ha-idg-1 did not get written to the CIB for some
>> reason. I'd look at the pe input from this transition on ha-idg-2 to
>> confirm that.
>> 
>> Without the DC knowing about the stop, it tries to schedule a new one,
>> but the node is shutting down so it can't do it, which means it has to
>> be fenced.

I checked all relevant PE files from this time period.
This is what I found (I have included only the important entries):

ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3116 -G transition-3116.xml -D transition-3116.dot
Current cluster status:
 ...
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Started ha-idg-1
Transition Summary:
 ...
* Migrate    vm_nextcloud           ( ha-idg-1 -> ha-idg-2 )
Executing cluster transition:
 * Resource action: vm_nextcloud    migrate_from on ha-idg-2 <======= migrate vm_nextcloud
 * Resource action: vm_nextcloud    stop on ha-idg-1 
 * Pseudo action:   vm_nextcloud_start_0
Revised cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
vm_nextcloud   (ocf::heartbeat:VirtualDomain): Started ha-idg-2


ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-error-48 -G transition-4514.xml -D transition-4514.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
...
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): FAILED[ ha-idg-2 ha-idg-1 ] <====== migration failed
Transition Summary:
...
 * Recover    vm_nextcloud            (             ha-idg-2 )
Executing cluster transition:
 * Resource action: vm_nextcloud    stop on ha-idg-2
 * Resource action: vm_nextcloud    stop on ha-idg-1 
 * Resource action: vm_nextcloud    start on ha-idg-2
 * Resource action: vm_nextcloud    monitor=30000 on ha-idg-2
Revised cluster status:
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Started ha-idg-2

ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3117 -G transition-3117.xml -D transition-3117.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): FAILED ha-idg-2 <====== start on ha-idg-2 failed
Transition Summary:
 * Stop       vm_nextcloud     ( ha-idg-2 )   due to node availability <==== stop vm_nextcloud (what does "due to node availability" mean?)
Executing cluster transition:
 * Resource action: vm_nextcloud    stop on ha-idg-2
Revised cluster status:
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped

ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3118 -G transition-4516.xml -D transition-4516.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped <============== vm_nextcloud is stopped
Transition Summary:
 * Shutdown ha-idg-1
Executing cluster transition:
 * Resource action: vm_nextcloud    stop on ha-idg-1 <==== why stop? It is already stopped.
Revised cluster status:
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped

ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-input-3545 -G transition-0.xml -D transition-0.dot
Current cluster status:
Node ha-idg-1 (1084777482): pending
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped <====== vm_nextcloud is stopped
Transition Summary:

Executing cluster transition:
Using the original execution date of: 2020-07-20 15:05:33Z
Revised cluster status:
vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped

ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-warn-749 -G transition-1.xml -D transition-1.dot
Current cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped <======= vm_nextcloud is stopped
Transition Summary:
 * Fence (Off) ha-idg-1 'resource actions are unrunnable'
Executing cluster transition:
 * Fencing ha-idg-1 (Off)
 * Pseudo action:   vm_nextcloud_stop_0 <======= why stop? It is already stopped.
Revised cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped

I don't understand why the cluster tries to stop a resource that is already stopped.

Bernd
Helmholtz Zentrum München
