[ClusterLabs] HALVM monitor action fail on slave node. Possible bug?
Marco Marino
marino.mrc at gmail.com
Fri Apr 13 09:29:37 EDT 2018
Hello, I'm trying to configure a simple 2-node cluster with drbd and HALVM
(ocf:heartbeat:LVM), but I have a problem that I'm not able to solve, so I
decided to write this long post. I really need to understand what I'm doing
and where I'm going wrong.
More precisely, I'm configuring a pacemaker cluster with 2 nodes and only
one drbd resource. Here are all the operations:
- System configuration
hostnamectl set-hostname pcmk[12]
yum update -y
yum install vim wget git -y
vim /etc/sysconfig/selinux -> permissive mode
systemctl disable firewalld
reboot
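For reference, the SELinux change above amounts to a single line edit (a minimal sketch; on CentOS 7 /etc/sysconfig/selinux is a symlink to /etc/selinux/config):
# switch from enforcing to permissive; takes effect after the reboot below
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
setenforce 0    # optional: apply immediately without waiting for the reboot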
- Network configuration
[pcmk1]
nmcli connection modify corosync ipv4.method manual ipv4.addresses
192.168.198.201/24 ipv6.method ignore connection.autoconnect yes
nmcli connection modify replication ipv4.method manual ipv4.addresses
192.168.199.201/24 ipv6.method ignore connection.autoconnect yes
[pcmk2]
nmcli connection modify corosync ipv4.method manual ipv4.addresses
192.168.198.202/24 ipv6.method ignore connection.autoconnect yes
nmcli connection modify replication ipv4.method manual ipv4.addresses
192.168.199.202/24 ipv6.method ignore connection.autoconnect yes
ssh-keygen -t rsa
ssh-copy-id root@pcmk[12]
scp /etc/hosts root@pcmk2:/etc/hosts
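The content of /etc/hosts is not shown above; what I copy to pcmk2 is something like the following (hypothetical entries derived from the addresses used here, so that the node names resolve on the corosync network):
# /etc/hosts (assumed content)
192.168.198.201   pcmk1
192.168.198.202   pcmk2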
- Drbd Repo configuration and drbd installation
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh
http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum update -y
yum install drbd84-utils kmod-drbd84 -y
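Before configuring anything, it's worth checking that the out-of-tree module actually loads on the running kernel (a sanity check, not part of my original notes):
modprobe drbd
modinfo -F version drbd     # should match the kmod-drbd84 version installed above
head -1 /proc/drbd          # version line of the module actually loaded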
- Drbd Configuration:
Creating a new partition on top of /dev/vdb -> /dev/vdb1 of type
"Linux" (83)
[/etc/drbd.d/global_common.conf]
usage-count no;
[/etc/drbd.d/myres.res]
resource myres {
    on pcmk1 {
        device    /dev/drbd0;
        disk      /dev/vdb1;
        address   192.168.199.201:7789;
        meta-disk internal;
    }
    on pcmk2 {
        device    /dev/drbd0;
        disk      /dev/vdb1;
        address   192.168.199.202:7789;
        meta-disk internal;
    }
}
scp /etc/drbd.d/myres.res root@pcmk2:/etc/drbd.d/myres.res
systemctl start drbd <-- only for test. The service is disabled at boot!
drbdadm create-md myres
drbdadm up myres
drbdadm primary --force myres
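A quick sanity check before layering LVM on top (not in my original notes, but useful to confirm the resource state):
cat /proc/drbd          # connection and disk state as reported by the 8.4 kernel module
drbdadm role myres      # should report Primary/Secondary on pcmk1
drbdadm dstate myres    # should end up UpToDate/UpToDate once the initial sync finishes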
- LVM Configuration
[root@pcmk1 ~]# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0            11:0    1 1024M  0 rom
vda           252:0    0   20G  0 disk
├─vda1        252:1    0    1G  0 part /boot
└─vda2        252:2    0   19G  0 part
  ├─cl-root   253:0    0   17G  0 lvm  /
  └─cl-swap   253:1    0    2G  0 lvm  [SWAP]
vdb           252:16   0    8G  0 disk
└─vdb1        252:17   0    8G  0 part   <--- /dev/vdb1 is the partition I'd like to use as backing device for drbd
  └─drbd0     147:0    0    8G  0 disk
[/etc/lvm/lvm.conf]
write_cache_state = 0
use_lvmetad = 0
filter = [ "a|drbd.*|", "a|vda.*|", "r|.*|" ]
Disabling lvmetad service
systemctl disable lvm2-lvmetad.service
systemctl disable lvm2-lvmetad.socket
reboot
- Creating volume group and logical volume
systemctl start drbd (both nodes)
drbdadm primary myres
pvcreate /dev/drbd0
vgcreate havolumegroup /dev/drbd0
lvcreate -n c-vol1 -L1G havolumegroup
[root@pcmk1 ~]# lvs
  LV     VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root   cl            -wi-ao---- <17.00g
  swap   cl            -wi-ao----   2.00g
  c-vol1 havolumegroup -wi-a-----   1.00g
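A quick way to confirm the lvm.conf filter is doing its job: the physical volume should be visible only through /dev/drbd0, never through the backing /dev/vdb1 (a check I'd suggest at this point, not part of my original steps):
pvs
# expected, roughly:
#   PV          VG             Fmt  Attr PSize   PFree
#   /dev/drbd0  havolumegroup  lvm2 a--  ...     ...
#   /dev/vda2   cl             lvm2 a--  ...     ...
# if /dev/vdb1 shows up here, the filter in /etc/lvm/lvm.conf is not being applied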
- Cluster Configuration
yum install pcs fence-agents-all -y
systemctl enable pcsd
systemctl start pcsd
echo redhat | passwd --stdin hacluster
pcs cluster auth pcmk1 pcmk2
pcs cluster setup --name ha_cluster pcmk1 pcmk2
pcs cluster start --all
pcs cluster enable --all
pcs property set stonith-enabled=false <--- Just for test!!!
pcs property set no-quorum-policy=ignore
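At this point a quick membership check doesn't hurt (again, not in my original notes):
pcs status corosync     # both nodes should be listed as members
corosync-cfgtool -s     # ring status on the corosync network, "no faults" expected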
- Drbd resource configuration
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create DrbdRes ocf:linbit:drbd
drbd_resource=myres op monitor interval=60s
pcs -f drbd_cfg resource master DrbdResClone DrbdRes master-max=1
master-node-max=1 clone-max=2 clone-node-max=1 notify=true
[root@pcmk1 ~]# pcs -f drbd_cfg resource show
 Master/Slave Set: DrbdResClone [DrbdRes]
     Stopped: [ pcmk1 pcmk2 ]
[root@pcmk1 ~]#
Testing failover with a forced power-off of pcmk1: when pcmk1 comes back up,
drbd on it is slave, but the logical volume is not active on pcmk2. So I need HALVM:
[root@pcmk2 ~]# lvs
  LV     VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root   cl            -wi-ao---- <17.00g
  swap   cl            -wi-ao----   2.00g
  c-vol1 havolumegroup -wi-------   1.00g
[root@pcmk2 ~]#
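This missing activation is exactly what the LVM resource agent should automate: on the node that becomes drbd master, the volume group has to be activated again. Done by hand it would look roughly like this (illustrative only; once the agent is configured, this must be left to the cluster):
vgchange -ay havolumegroup    # after this, the 'a' flag reappears in the lvs Attr column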
- LVM resource and constraints
pcs cluster cib lvm_cfg
pcs -f lvm_cfg resource create HALVM ocf:heartbeat:LVM
volgrpname=havolumegroup
pcs -f lvm_cfg constraint colocation add HALVM with master DrbdResClone
INFINITY
pcs -f lvm_cfg constraint order promote DrbdResClone then start HALVM
[root@pcmk1 ~]# pcs -f lvm_cfg constraint
Location Constraints:
Ordering Constraints:
  promote DrbdResClone then start HALVM (kind:Mandatory)
Colocation Constraints:
  HALVM with DrbdResClone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
Ticket Constraints:
[root@pcmk1 ~]#
[root@pcmk1 ~]# pcs status
Cluster name: ha_cluster
Stack: corosync
Current DC: pcmk2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri Apr 13 15:12:49 2018
Last change: Fri Apr 13 15:05:18 2018 by root via cibadmin on pcmk1

2 nodes configured
2 resources configured

Online: [ pcmk1 pcmk2 ]

Full list of resources:

 Master/Slave Set: DrbdResClone [DrbdRes]
     Masters: [ pcmk2 ]
     Slaves: [ pcmk1 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
#########[PUSHING NEW CONFIGURATION]#########
[root@pcmk1 ~]# pcs cluster cib-push lvm_cfg
CIB updated
[root@pcmk1 ~]# pcs status
Cluster name: ha_cluster
Stack: corosync
Current DC: pcmk2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri Apr 13 15:12:57 2018
Last change: Fri Apr 13 15:12:55 2018 by root via cibadmin on pcmk1

2 nodes configured
3 resources configured

Online: [ pcmk1 pcmk2 ]

Full list of resources:

 Master/Slave Set: DrbdResClone [DrbdRes]
     Masters: [ pcmk2 ]
     Slaves: [ pcmk1 ]
 HALVM  (ocf::heartbeat:LVM):   Started pcmk2

Failed Actions:
* HALVM_monitor_0 on pcmk1 'unknown error' (1): call=13, status=complete, exitreason='LVM Volume havolumegroup is not available',
    last-rc-change='Fri Apr 13 15:12:56 2018', queued=0ms, exec=52ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@pcmk1 ~]#
##########[TRYING TO CLEANUP RESOURCE CONFIGURATION]##################
[root@pcmk1 ~]# pcs resource cleanup
Waiting for 1 replies from the CRMd. OK
[root@pcmk1 ~]# pcs status
Cluster name: ha_cluster
Stack: corosync
Current DC: pcmk2 (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Fri Apr 13 15:13:18 2018
Last change: Fri Apr 13 15:12:55 2018 by root via cibadmin on pcmk1

2 nodes configured
3 resources configured

Online: [ pcmk1 pcmk2 ]

Full list of resources:

 Master/Slave Set: DrbdResClone [DrbdRes]
     Masters: [ pcmk2 ]
     Slaves: [ pcmk1 ]
 HALVM  (ocf::heartbeat:LVM):   Started pcmk2

Failed Actions:
* HALVM_monitor_0 on pcmk1 'unknown error' (1): call=26, status=complete, exitreason='LVM Volume havolumegroup is not available',
    last-rc-change='Fri Apr 13 15:13:17 2018', queued=0ms, exec=113ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@pcmk1 ~]#
#########################################################
some details about packages and versions:
[root@pcmk1 ~]# rpm -qa | grep pacem
pacemaker-cluster-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-libs-1.1.16-12.el7_4.8.x86_64
pacemaker-1.1.16-12.el7_4.8.x86_64
pacemaker-cli-1.1.16-12.el7_4.8.x86_64
[root@pcmk1 ~]# rpm -qa | grep coro
corosynclib-2.4.0-9.el7_4.2.x86_64
corosync-2.4.0-9.el7_4.2.x86_64
[root@pcmk1 ~]# rpm -qa | grep drbd
drbd84-utils-9.1.0-1.el7.elrepo.x86_64
kmod-drbd84-8.4.10-1_2.el7_4.elrepo.x86_64
[root@pcmk1 ~]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
[root@pcmk1 ~]# uname -r
3.10.0-693.21.1.el7.x86_64
[root@pcmk1 ~]#
##############################################################
So it seems to me that the problem is that the "monitor" action of the
ocf:heartbeat:LVM resource is executed on both nodes, even though I configured
specific colocation and ordering constraints. I don't know where the problem
is, but I really need to understand how to solve this issue. If possible, I
invite someone to reproduce the configuration and, hopefully, the issue. It
looks like a bug, but obviously I'm not sure. What worries me is that it
should be pacemaker that decides where and when a resource is started, so
probably there is something wrong in my constraint configuration.
I'm sorry for this long post.
Thank you,
Marco