[ClusterLabs] corosync/dlm fencing?

Philipp Achmüller philipp.achmueller at arz.at
Sun Jul 15 09:39:04 EDT 2018


Hi!

I have a 4-node cluster running on SLES12 SP3:
- pacemaker-1.1.16-4.8.x86_64
- corosync-2.3.6-9.5.1.x86_64

Current configuration and status:

Stack: corosync
Current DC: sitea-2 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Sun Jul 15 15:00:55 2018
Last change: Sat Jul 14 18:54:50 2018 by root via crm_resource on sitea-1

4 nodes configured
23 resources configured

Node sitea-1: online
        1       (ocf::pacemaker:controld):      Active 
        1       (ocf::lvm2:clvmd):      Active 
        1       (ocf::pacemaker:SysInfo):       Active 
        5       (ocf::heartbeat:VirtualDomain): Active 
        1       (ocf::heartbeat:LVM):   Active 
Node siteb-1: online
        1       (ocf::pacemaker:controld):      Active 
        1       (ocf::lvm2:clvmd):      Active 
        1       (ocf::pacemaker:SysInfo):       Active 
        1       (ocf::heartbeat:VirtualDomain): Active 
        1       (ocf::heartbeat:LVM):   Active 
Node sitea-2: online
        1       (ocf::pacemaker:controld):      Active 
        1       (ocf::lvm2:clvmd):      Active 
        1       (ocf::pacemaker:SysInfo):       Active 
        3       (ocf::heartbeat:VirtualDomain): Active 
        1       (ocf::heartbeat:LVM):   Active 
Node siteb-2: online
        1       (ocf::pacemaker:ClusterMon):    Active 
        3       (ocf::heartbeat:VirtualDomain): Active 
        1       (ocf::pacemaker:SysInfo):       Active 
        1       (stonith:external/sbd): Active 
        1       (ocf::lvm2:clvmd):      Active 
        1       (ocf::heartbeat:LVM):   Active 
        1       (ocf::pacemaker:controld):      Active 
----
and these ordering/colocation constraints:
...
group base-group dlm clvm vg1
clone base-clone base-group \
        meta interleave=true target-role=Started ordered=true
colocation colocation-VM-base-clone-INFINITY inf: VM base-clone
order order-base-clone-VM-mandatory base-clone:start VM:start
...

For maintenance I would like to put 1 or 2 nodes from "sitea" into standby so that 
all resources move off these two nodes.
Everything works fine until dlm stops as the last resource on those nodes; 
then dlm_controld sends a fence request - sometimes targeting one of the remaining 
online nodes, so that in the end only 1 node is left online in the cluster...
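
For reference, putting the nodes into standby is nothing special; it is roughly the 
following (crm shell, node names as above), so the fence request is triggered purely 
by the resources stopping:

 # crm node standby sitea-1
 # crm node standby sitea-2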

Log messages from siteb-1:

....
2018-07-14T14:38:56.441157+02:00 siteb-1 dlm_controld[39725]: 678 fence request 3 pid 54428 startup time 1531571371 fence_all dlm_stonith
2018-07-14T14:38:56.445284+02:00 siteb-1 dlm_stonith: stonith_api_time: Found 0 entries for 3/(null): 0 in progress, 0 completed
2018-07-14T14:38:56.446033+02:00 siteb-1 stonith-ng[8085]:   notice: Client stonith-api.54428.ee6a7e02 wants to fence (reboot) '3' with device '(any)'
2018-07-14T14:38:56.446294+02:00 siteb-1 stonith-ng[8085]:   notice: Requesting peer fencing (reboot) of sitea-2
...
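
In the excerpt above stonith-ng resolves the node id '3' from the fence request to 
sitea-2; the nodeid-to-name mapping can also be checked with the corosync tools, 
e.g. (exact output varies with the corosync.conf nodelist):

 # corosync-quorumtool -l
 # corosync-cmapctl | grep nodelist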

 # dlm_tool dump_config
daemon_debug=0
foreground=0
log_debug=0
timewarn=0
protocol=detect
debug_logfile=0
enable_fscontrol=0
enable_plock=1
plock_debug=0
plock_rate_limit=0
plock_ownership=0
drop_resources_time=10000
drop_resources_count=10
drop_resources_age=10000
post_join_delay=30
enable_fencing=1
enable_concurrent_fencing=0
enable_startup_fencing=0
repeat_failed_fencing=1
enable_quorum_fencing=1
enable_quorum_lockspace=1
help=-1
version=-1
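
These are all defaults; nothing is overridden on my side. As far as I understand, 
these keys could be set in /etc/dlm/dlm.conf using the same key=value names as in 
the dump above, e.g. (hypothetical, untested):

 # /etc/dlm/dlm.conf
 post_join_delay=60
 enable_quorum_fencing=1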

How can I find out what is happening here?
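
So far the only things I know to look at are roughly these (dlm and pacemaker 
command line tools; output omitted):

 # dlm_tool ls
 # dlm_tool dump
 # stonith_admin --history sitea-2
 # crm_mon -1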