[ClusterLabs] pacemaker and cluster hostname reconfiguration
Riccardo Manfrin
riccardo.manfrin at athonet.com
Thu Oct 1 04:51:36 EDT 2020
Ciao,
[TXT version]
I'm among the people that have to deal with with the in-famous two nodes
problem (http://www.beekhof.net/blog/2018/two-node-problems).
I am not sure if to open a bug for this.. so I'm first off reporting on
the list.. in the hope to get fast feedback.
Problem statement
I have a cluster made by two nodes with a DRBD shared partition which
some resources (systemd services) have to stick to.
Software versions
corosync -v
Corosync Cluster Engine, version '2.4.5'
Copyright (c) 2006-2009 Red Hat, Inc.
pacemakerd --version
Pacemaker 1.1.21-4.el7
drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\
fb98589a8e76783d2c56155c645dbaf02ac7ece7\ build\ by\ mockbuild@\,\
2020-04-05\ 03:21:05
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090010
DRBD_KERNEL_VERSION=9.0.16
DRBDADM_VERSION_CODE=0x090c02
DRBDADM_VERSION=9.12.2
corosync.conf nodes:
nodelist {
node {
ring0_addr: 10.1.3.1
nodeid: 1
}
node {
ring0_addr: 10.1.3.2
nodeid: 2
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
drbd nodes config:
resource myresource {
volume 0 {
device /dev/drbd0;
disk /dev/mapper/vg0-res--etc;
meta-disk internal;
}
on 123z555666y0 {
node-id 0;
address 10.1.3.1:7789;
}
on 123z555666y1 {
node-id 1;
address 10.1.3.2:7789;
}
connection {
host 123z555666y0;
host 123z555666y1;
}
handlers {
before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
}
}
I need to reconfigure the hostname of both the nodes of the cluster.
I've gathered some literature around
https://pacemaker.oss.clusterlabs.narkive.com/csHZkR5R/change-hostname
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-name.html
https://www.suse.com/support/kb/doc/?id=000018878 <- DIDN'T WORK
https://bugs.clusterlabs.org/show_bug.cgi?id=5265 <- DIDN'T WORK
but have not yet found a way to address this (unless with simultaneous
reboot of both nodes).
The procedure:
Update the hostname on both Master and Slave nodes
update /etc/hostname
update /etc/hosts
update system with hostname -F /etc/hostname
Reconfigure drbd on Master and Slave nodes
modify drbd.01.conf (attached) to reflect new hostname
invoke drbdadm adjust all
Update pacemaker config on Master node only
crm configure property maintenance-mode=true
crm configure delete --force 1
crm configure delete --force 2
crm configure xml '<node id="1" uname="newhostname0">
<instance_attributes id="node-1">
<nvpair id="node-1-standby" name="standby" value="off"/>
</instance_attributes>
</node>'
crm configure xml '<node id="2" uname="newhostname1">
<instance_attributes id="node-2">
<nvpair id="node-2-standby" name="standby" value="off"/>
</instance_attributes>
</node>'
crm resource reprobe
crm configure refresh
crm configure property maintenance-mode=false
Let's say for example that I migrate the hostnames like this
hostname10 -> hostname20
hostname11 -> hostname21
After the above procedure is concluded the cluster is correctly
reconfigured and when I check with crm_mon or crm status or crm
configure show xml or even by inspecting the cib.xml I find the proper
new hostnames fetched by pacemaker/corosync (hostname20 and hostname21).
The documentation reports that pacemaker node name is taken from
corosync.conf nodelist->ring0_addr if not an ip address: NOT MY
CASE => skip
corosync.conf nodelist->name if available: NOT MY CASE => skip
uname -n [SHOULD BE IN HERE]
Apparently case number 3 does not apply:
[root at hostname20 ~]# crm_node -n
hostname10
[root at hostname20 ~]# uname -n
hostname20
This becomes evident as soon as I reboot/poweroff one of the two nodes:
crm_mon which after the reconfiguration was correctly showing
Online: [ hostname21 hostname20 ]
"rolls back" the configuration without any notice and starts showing the
old one
Online: [ hostname10 ]
OFFLINE: [ hostname11 ]
Do you have any idea of where on heath pacemaker is recovering the old
hostnames ?
I've even checked the code and see that there are cmaps involved so I
suspect there's some caching issues involved in this.
It looks like it is retaining the old hostnames in memory and when
something .. "fails" it restores them.
Besides don't blame me for this use case (reconfigure hostnames in a
two-nodes cluster), as I didn't make it up. I just carry the pain.
R
________________________________
Riccardo Manfrin
R&D DEPARTMENT
Web<https://www.athonet.com/> | LinkedIn<https://www.linkedin.com/company/athonet/> t +39 (0)444 750045
e riccardo.manfrin at athonet.com<mailto:riccardo.manfrin at athonet.com>
[https://www.athonet.com/signature/logo_athonet.png]<https://www.athonet.com/>
ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy
This email and any attachments are confidential and intended solely for the use of the intended recipient. If you are not the named addressee, please be aware that you shall not distribute, copy, use or disclose this email. If you have received this email by error, please notify us immediately and delete this email from your system. Email transmission cannot be guaranteed to be secured or error-free or not to contain viruses. Athonet S.r.l. processes any personal data exchanged in email correspondence in accordance with EU Reg. 679/2016 (GDPR) - you may find here the privacy policy with information on such processing and your rights. Any views or opinions presented in this email are solely those of the sender and do not necessarily represent those of Athonet S.r.l.
More information about the Users
mailing list