[ClusterLabs] why is node fenced ?
Lentes, Bernd
bernd.lentes at helmholtz-muenchen.de
Fri Aug 14 14:37:43 EDT 2020
----- On Aug 9, 2020, at 10:17 PM, Bernd Lentes bernd.lentes at helmholtz-muenchen.de wrote:
>> So this appears to be the problem. From these logs I would guess the
>> successful stop on ha-idg-1 did not get written to the CIB for some
>> reason. I'd look at the pe input from this transition on ha-idg-2 to
>> confirm that.
>>
>> Without the DC knowing about the stop, it tries to schedule a new one,
>> but the node is shutting down so it can't do it, which means it has to
>> be fenced.
I checked all relevant pe files from this time period.
This is what I found (I only list the important entries):
ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3116 -G transition-3116.xml -D transition-3116.dot
Current cluster status:
...
vm_nextcloud (ocf::heartbeat:VirtualDomain): Started ha-idg-1
Transition Summary:
...
* Migrate vm_nextcloud ( ha-idg-1 -> ha-idg-2 )
Executing cluster transition:
* Resource action: vm_nextcloud migrate_from on ha-idg-2 <======= migrate vm_nextcloud
* Resource action: vm_nextcloud stop on ha-idg-1
* Pseudo action: vm_nextcloud_start_0
Revised cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
vm_nextcloud (ocf::heartbeat:VirtualDomain): Started ha-idg-2
ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-error-48 -G transition-4514.xml -D transition-4514.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
...
vm_nextcloud (ocf::heartbeat:VirtualDomain): FAILED[ ha-idg-2 ha-idg-1 ] <====== migration failed
Transition Summary:
...
* Recover vm_nextcloud ( ha-idg-2 )
Executing cluster transition:
* Resource action: vm_nextcloud stop on ha-idg-2
* Resource action: vm_nextcloud stop on ha-idg-1
* Resource action: vm_nextcloud start on ha-idg-2
* Resource action: vm_nextcloud monitor=30000 on ha-idg-2
Revised cluster status:
vm_nextcloud (ocf::heartbeat:VirtualDomain): Started ha-idg-2
ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3117 -G transition-3117.xml -D transition-3117.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
vm_nextcloud (ocf::heartbeat:VirtualDomain): FAILED ha-idg-2 <====== start on ha-idg-2 failed
Transition Summary:
* Stop vm_nextcloud ( ha-idg-2 ) due to node availability <==== stop vm_nextcloud (what does "due to node availability" mean?)
Executing cluster transition:
* Resource action: vm_nextcloud stop on ha-idg-2
Revised cluster status:
vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3118 -G transition-4516.xml -D transition-4516.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped <============== vm_nextcloud is stopped
Transition Summary:
* Shutdown ha-idg-1
Executing cluster transition:
* Resource action: vm_nextcloud stop on ha-idg-1 <==== why stop? It is already stopped
Revised cluster status:
vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-input-3545 -G transition-0.xml -D transition-0.dot
Current cluster status:
Node ha-idg-1 (1084777482): pending
Online: [ ha-idg-2 ]
vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped <====== vm_nextcloud is stopped
Transition Summary:
Executing cluster transition:
Using the original execution date of: 2020-07-20 15:05:33Z
Revised cluster status:
vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-warn-749 -G transition-1.xml -D transition-1.dot
Current cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]
vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped <======= vm_nextcloud is stopped
Transition Summary:
* Fence (Off) ha-idg-1 'resource actions are unrunnable'
Executing cluster transition:
* Fencing ha-idg-1 (Off)
* Pseudo action: vm_nextcloud_stop_0 <======= why stop? It is already stopped
Revised cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]
vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
I don't understand why the cluster tries to stop a resource which is already stopped.
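For anyone who wants to repeat this comparison across many pe files, here is a rough Python sketch that pulls one resource's status and scheduled actions out of saved crm_simulate text output. It assumes the output layout shown in this post (the four section headers and the "* ... action:" lines); other Pacemaker versions may print things differently. The names summarize, stop_scheduled_while_stopped and SAMPLE are made up for illustration and are not part of any Pacemaker tool.

```python
import re

# Section headers as they appear in the crm_simulate output quoted above.
HEADERS = {
    "Current cluster status:": "current",
    "Transition Summary:": "summary",
    "Executing cluster transition:": "actions",
    "Revised cluster status:": "revised",
}


def summarize(output: str, resource: str) -> dict:
    """Collect status lines and scheduled actions for one resource."""
    status_re = re.compile(rf"^{re.escape(resource)}\s+\([^)]*\):\s*(.+)$")
    result = {"current": [], "revised": [], "actions": []}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        if line in HEADERS:
            section = HEADERS[line]
            continue
        match = status_re.match(line)
        if match and section in ("current", "revised"):
            result[section].append(match.group(1))
        elif section == "actions" and line.startswith("*"):
            text = line.lstrip("* ").strip()
            if resource in text:
                result["actions"].append(text)
    return result


def stop_scheduled_while_stopped(summary: dict) -> bool:
    """True if the resource is shown as Stopped but a stop is still scheduled."""
    all_stopped = bool(summary["current"]) and all(
        state == "Stopped" for state in summary["current"]
    )
    return all_stopped and any("stop" in action for action in summary["actions"])


# Excerpt of the pe-warn-749 output from this post, without my annotations.
SAMPLE = """Current cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]
vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
Transition Summary:
* Fence (Off) ha-idg-1 'resource actions are unrunnable'
Executing cluster transition:
* Fencing ha-idg-1 (Off)
* Pseudo action: vm_nextcloud_stop_0
Revised cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]
vm_nextcloud (ocf::heartbeat:VirtualDomain): Stopped
"""

if __name__ == "__main__":
    info = summarize(SAMPLE, "vm_nextcloud")
    print(info)
    print("stop scheduled while stopped:", stop_scheduled_while_stopped(info))
```

This only reads the summary text, so it can flag the transitions where a stop is scheduled for a resource displayed as Stopped, but it cannot say why. For that the operation history in the pe input itself has to be inspected.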
Bernd
Helmholtz Zentrum München
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir.in Prof. Dr. Veronika von Messling
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671