[ClusterLabs] fence_mpath in latest fence-agents: single reservation after fence
Rafael David Tinoco
rafaeldtinoco at ubuntu.com
Mon Jun 1 09:04:32 EDT 2020
Hello again,
It's been a while since I last showed up... I was finishing up details of the
Ubuntu 20.04 HA packages (along with lots of other things), so sorry for not
being active until now (that's about to change). While preparing my regression
lab, as I mentioned at the latest HA conference, I'm facing a situation I'd
like some input on, if anyone has any...
I'm cleaning up the fence_mpath/fence_iscsi setup needed for all Ubuntu
versions:
https://bugs.launchpad.net/ubuntu/+source/fence-agents/+bug/1864404
and I just faced this:
- 3 x node cluster setup
- 3 x nodes share 4 paths to /dev/mapper/volume{00..10}
- Using /dev/mapper/volume01 for fencing tests
- softdog configured for /dev/watchdog
- fence_mpath_check installed in /etc/watchdog.d/
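(For reference, a quick way to sanity-check that part of the setup, using the
paths mentioned above, is:)
----
$ lsmod | grep softdog                            # softdog module loaded?
$ ls -l /etc/watchdog.d/fence_mpath_check         # watchdog helper in place?
$ sudo mpathpersist --in -k /dev/mapper/volume01  # registrations visible?
----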
----
(k)rafaeldtinoco at clusterg01:~$ crm configure show
node 1: clusterg01
node 2: clusterg02
node 3: clusterg03
primitive fence-mpath-clusterg01 stonith:fence_mpath \
        params pcmk_on_timeout=70 pcmk_off_timeout=70 \
          pcmk_host_list=clusterg01 pcmk_monitor_action=metadata \
          pcmk_reboot_action=off key=59450000 devices="/dev/mapper/volume01" \
          power_wait=65 \
        meta provides=unfencing target-role=Started
primitive fence-mpath-clusterg02 stonith:fence_mpath \
        params pcmk_on_timeout=70 pcmk_off_timeout=70 \
          pcmk_host_list=clusterg02 pcmk_monitor_action=metadata \
          pcmk_reboot_action=off key=59450001 devices="/dev/mapper/volume01" \
          power_wait=65 \
        meta provides=unfencing target-role=Started
primitive fence-mpath-clusterg03 stonith:fence_mpath \
        params pcmk_on_timeout=70 pcmk_off_timeout=70 \
          pcmk_host_list=clusterg03 pcmk_monitor_action=metadata \
          pcmk_reboot_action=off key=59450002 devices="/dev/mapper/volume01" \
          power_wait=65 \
        meta provides=unfencing target-role=Started
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.3-4b1f869f0f \
        cluster-infrastructure=corosync \
        cluster-name=clusterg \
        stonith-enabled=true \
        no-quorum-policy=stop \
        last-lrm-refresh=1590773755
----
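(To double-check that the fencer actually picked these devices up, something
like the following can be used, assuming the pacemaker 2.0 stonith_admin:)
----
$ sudo stonith_admin --list-registered   # should show the three fence-mpath-* devices
$ sudo crm_mon --one-shot                # same cluster view as "crm status" below
----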
(k)rafaeldtinoco at clusterg03:~$ crm status
Cluster Summary:
* Stack: corosync
* Current DC: clusterg02 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Mon Jun 1 12:55:13 2020
* Last change: Mon Jun 1 04:35:07 2020 by root via cibadmin on clusterg03
* 3 nodes configured
* 3 resource instances configured
Node List:
* Online: [ clusterg01 clusterg02 clusterg03 ]
Full List of Resources:
* fence-mpath-clusterg01 (stonith:fence_mpath): Started clusterg02
* fence-mpath-clusterg02 (stonith:fence_mpath): Started clusterg03
* fence-mpath-clusterg03 (stonith:fence_mpath): Started clusterg01
----
(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -r
/dev/mapper/volume01
PR generation=0x2d, Reservation follows:
Key = 0x59450001
scope = LU_SCOPE, type = Write Exclusive, registrants only
(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k
/dev/mapper/volume01
PR generation=0x2d, 12 registered reservation keys follow:
0x59450001
0x59450001
0x59450001
0x59450001
0x59450002
0x59450002
0x59450002
0x59450002
0x59450000
0x59450000
0x59450000
0x59450000
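(The 12 entries above are simply the 3 node keys, each registered once per
path: 3 keys x 4 paths. The registrations can also be checked on the
underlying paths directly, e.g. with sg_persist from sg3-utils; /dev/sdc
below is just a placeholder for one of the paths listed by multipath -ll:)
----
$ sudo multipath -ll volume01        # shows the 4 underlying sdX paths
$ sudo sg_persist --in -k /dev/sdc   # registrations as seen through one path
----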
----
You can see that everything looks fine. If I disable the 2 interconnects
I have for corosync:
(k)rafaeldtinoco at clusterg01:~$ sudo corosync-quorumtool -a
Quorum information
------------------
Date: Mon Jun 1 12:56:00 2020
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 1
Ring ID: 1.120
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
1 1 clusterg01, clusterg01bkp (local)
2 1 clusterg02, clusterg02bkp
3 1 clusterg03, clusterg03bkp
then node clusterg01 gets fenced correctly:
Pending Fencing Actions:
* reboot of clusterg01 pending: client=pacemaker-controld.906, origin=clusterg02
(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -r
/dev/mapper/volume01
PR generation=0x2e, Reservation follows:
Key = 0x59450001
scope = LU_SCOPE, type = Write Exclusive, registrants only
(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k
/dev/mapper/volume01
PR generation=0x2e, 8 registered reservation keys follow:
0x59450001
0x59450001
0x59450001
0x59450001
0x59450002
0x59450002
0x59450002
0x59450002
and the watchdog reboots it... but it turns out the node comes back with its
key registered for just 1 path (instead of 4). I was wondering if that was
because of the asynchronous nature of the systemd + open-iscsi +
multipath-tools + pacemaker service startup combination.
Check:
(k)rafaeldtinoco at clusterg01:~$ uptime
12:58:22 up 0 min, 0 users, load average: 0.31, 0.09, 0.03
(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -r
/dev/mapper/volume01
PR generation=0x2f, Reservation follows:
Key = 0x59450001
scope = LU_SCOPE, type = Write Exclusive, registrants only
(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k
/dev/mapper/volume01
PR generation=0x2f, 9 registered reservation keys follow:
0x59450001
0x59450001
0x59450001
0x59450001
0x59450002
0x59450002
0x59450002
0x59450002
0x59450000
After this ^ I have to run:
(k)rafaeldtinoco at clusterg01:~$ sudo mpathpersist --out --register
--param-rk=0x59450000 /dev/mapper/volume01
persistent reserve out: scsi status: Reservation Conflict
PR out: command failed
(k)rafaeldtinoco at clusterg01:~$ sudo fence_mpath -v -d
/dev/mapper/volume01 -n 59450000 -o on
2020-06-01 12:59:46,388 INFO: Executing: /usr/sbin/mpathpersist -i -k -d
/dev/mapper/volume01
This is what it takes to get all registrations correctly in place again after
the fence:
(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k
/dev/mapper/volume01
PR generation=0x33, 12 registered reservation keys follow:
0x59450001
0x59450001
0x59450001
0x59450001
0x59450002
0x59450002
0x59450002
0x59450002
0x59450000
0x59450000
0x59450000
0x59450000
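(Side note: since the devices carry "meta provides=unfencing", pacemaker is
supposed to run this "on" action by itself when the node rejoins, so calling
the agent by hand is only my manual workaround; presumably the same can be
requested through the fencer with something like:)
----
$ sudo stonith_admin --unfence clusterg01
----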
I was wondering if "resource-agents-deps.target" being RequiredBy in
[Install] systemd section for open-iscsi.service and
multipath-tools.service, together with
"Before=resource-agents-deps.target" in [Unit] section, would be enough
but in this case I think it is not enough.
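(For the record, what I describe above should be roughly equivalent to a
drop-in on the target itself; the file name below is arbitrary, and the
Requires=/After= pair just mirrors the RequiredBy=/Before= I mentioned:)
----
$ sudo mkdir -p /etc/systemd/system/resource-agents-deps.target.d
$ cat <<'EOF' | sudo tee /etc/systemd/system/resource-agents-deps.target.d/storage.conf
[Unit]
# pull in and order the storage stack before resource-agents-deps.target,
# which pacemaker.service itself is ordered after
Requires=open-iscsi.service multipath-tools.service
After=open-iscsi.service multipath-tools.service
EOF
$ sudo systemctl daemon-reload
----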
Any idea why this happens? Did the agent run while there was only a single
path available to the disk, i.e. while the iSCSI session was still being
established and multipath-tools had scanned only one path? I tend to think
that, if that were the case, I would sometimes see 1 registered path,
sometimes 2, and so on... and not a single registered path every time (with
the other 3 registrations missing).
Or is there something else about PERSISTENT RESERVATIONS that I'm missing
from SBC-3/4?
Any thoughts?