<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<tt>Hello again,<br>
<br>
It has been a while since I last showed up... I was finishing up details of
the Ubuntu 20.04 HA packages (along with lots of other stuff), so sorry for
not being active until now (that's about to change). While preparing my
regression lab, which I mentioned at the latest HA conference, I ran into a
situation I'd like some input on, if anyone has any...<br>
<br>
I'm clearing up the fence_mpath/fence_iscsi setup needed for all
Ubuntu versions:<br>
<br>
<a class="moz-txt-link-freetext" href="https://bugs.launchpad.net/ubuntu/+source/fence-agents/+bug/1864404">https://bugs.launchpad.net/ubuntu/+source/fence-agents/+bug/1864404</a><br>
<br>
and I just ran into the following:<br>
<br>
- 3 x node cluster setup<br>
- 3 x nodes share 4 paths to /dev/mapper/volume{00..10}<br>
- Using /dev/mapper/volume01 for fencing tests<br>
- softdog configured for /dev/watchdog<br>
- fence_mpath_check installed in /etc/watchdog.d/ (hooked up roughly as sketched below)<br>
<br>
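(For reference, the softdog + watchdog part of the setup amounts to roughly
the following; the path the check script ships in comes from the fence-agents
package and may differ per release, so treat this as a sketch:)<br>
<br>
# load softdog at boot and right now (assuming no hardware watchdog owns /dev/watchdog)<br>
echo softdog | sudo tee /etc/modules-load.d/softdog.conf<br>
sudo modprobe softdog<br>
# let watchdog(8) run the fence_mpath health check shipped by fence-agents<br>
sudo cp /usr/share/cluster/fence_mpath_check /etc/watchdog.d/<br>
sudo systemctl enable --now watchdog<br>
<br>
The idea being that fence_mpath_check notices the local key is gone from the
device and lets watchdog reboot the node.<br>
<br>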
----<br>
<br>
(k)rafaeldtinoco@clusterg01:~$ crm configure show<br>
node 1: clusterg01<br>
node 2: clusterg02<br>
node 3: clusterg03<br>
primitive fence-mpath-clusterg01 stonith:fence_mpath \<br>
params pcmk_on_timeout=70 pcmk_off_timeout=70
pcmk_host_list=clusterg01 pcmk_monitor_action=metadata
pcmk_reboot_action=off key=59450000 devices="/dev/mapper/volume01"
power_wait=65 \<br>
meta provides=unfencing target-role=Started<br>
primitive fence-mpath-clusterg02 stonith:fence_mpath \<br>
params pcmk_on_timeout=70 pcmk_off_timeout=70
pcmk_host_list=clusterg02 pcmk_monitor_action=metadata
pcmk_reboot_action=off key=59450001 devices="/dev/mapper/volume01"
power_wait=65 \<br>
meta provides=unfencing target-role=Started<br>
primitive fence-mpath-clusterg03 stonith:fence_mpath \<br>
params pcmk_on_timeout=70 pcmk_off_timeout=70
pcmk_host_list=clusterg03 pcmk_monitor_action=metadata
pcmk_reboot_action=off key=59450002 devices="/dev/mapper/volume01"
power_wait=65 \<br>
meta provides=unfencing target-role=Started<br>
property cib-bootstrap-options: \<br>
have-watchdog=false \<br>
dc-version=2.0.3-4b1f869f0f \<br>
cluster-infrastructure=corosync \<br>
cluster-name=clusterg \<br>
stonith-enabled=true \<br>
no-quorum-policy=stop \<br>
last-lrm-refresh=1590773755<br>
<br>
----<br>
<br>
(k)rafaeldtinoco@clusterg03:~$ crm status<br>
Cluster Summary:<br>
* Stack: corosync<br>
* Current DC: clusterg02 (version 2.0.3-4b1f869f0f) - partition
with quorum<br>
* Last updated: Mon Jun 1 12:55:13 2020<br>
* Last change: Mon Jun 1 04:35:07 2020 by root via cibadmin on
clusterg03<br>
* 3 nodes configured<br>
* 3 resource instances configured<br>
<br>
Node List:<br>
* Online: [ clusterg01 clusterg02 clusterg03 ]<br>
<br>
Full List of Resources:<br>
* fence-mpath-clusterg01 (stonith:fence_mpath): Started
clusterg02<br>
* fence-mpath-clusterg02 (stonith:fence_mpath): Started
clusterg03<br>
* fence-mpath-clusterg03 (stonith:fence_mpath): Started
clusterg01<br>
<br>
----<br>
<br>
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -r
/dev/mapper/volume01<br>
PR generation=0x2d, Reservation follows:<br>
Key = 0x59450001<br>
scope = LU_SCOPE, type = Write Exclusive, registrants only<br>
<br>
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k
/dev/mapper/volume01<br>
PR generation=0x2d, 12 registered reservation keys follow:<br>
0x59450001<br>
0x59450001<br>
0x59450001<br>
0x59450001<br>
0x59450002<br>
0x59450002<br>
0x59450002<br>
0x59450002<br>
0x59450000<br>
0x59450000<br>
0x59450000<br>
0x59450000<br>
<br>
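(A quick way I use to eyeball that each key shows up once per path; just a
convenience one-liner on my side, not something the agent does:)<br>
<br>
# count registrations per key; with 4 paths each of the 3 keys should appear 4 times<br>
sudo mpathpersist --in -k /dev/mapper/volume01 | grep 0x5945 | sort | uniq -c<br>
<br>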
----<br>
<br>
You can see that everything looks fine: each of the three node keys is
registered once per path, 12 registrations in total. The two corosync
interconnects I have show up as the two ring addresses per node:<br>
<br>
(k)rafaeldtinoco@clusterg01:~$ sudo corosync-quorumtool -a<br>
Quorum information<br>
------------------<br>
Date: Mon Jun 1 12:56:00 2020<br>
Quorum provider: corosync_votequorum<br>
Nodes: 3<br>
Node ID: 1<br>
Ring ID: 1.120<br>
Quorate: Yes<br>
<br>
Votequorum information<br>
----------------------<br>
Expected votes: 3<br>
Highest expected: 3<br>
Total votes: 3<br>
Quorum: 2 <br>
Flags: Quorate <br>
<br>
Membership information<br>
----------------------<br>
Nodeid Votes Name<br>
1 1 clusterg01, clusterg01bkp (local)<br>
2 1 clusterg02, clusterg02bkp<br>
3 1 clusterg03, clusterg03bkp<br>
<br>
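(Disabling both interconnects on a node, for the test below, amounts to
something like this; the interface names are just placeholders for whatever
the two rings run on:)<br>
<br>
# take both corosync rings down on the node under test (placeholder NIC names)<br>
sudo ip link set dev eth1 down<br>
sudo ip link set dev eth2 down<br>
<br>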
If I disable both corosync interconnects for clusterg01, the node is fenced correctly:<br>
<br>
Pending Fencing Actions:<br>
* reboot of clusterg01 pending: client=pacemaker-controld.906,
origin=clusterg02<br>
<br>
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -r
/dev/mapper/volume01<br>
PR generation=0x2e, Reservation follows:<br>
Key = 0x59450001<br>
scope = LU_SCOPE, type = Write Exclusive, registrants only<br>
<br>
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k
/dev/mapper/volume01<br>
PR generation=0x2e, 8 registered reservation keys follow:<br>
0x59450001<br>
0x59450001<br>
0x59450001<br>
0x59450001<br>
0x59450002<br>
0x59450002<br>
0x59450002<br>
0x59450002<br>
<br>
and watchdog reboots it. But it turns out the node comes back with its
reservation key registered on just 1 path (instead of 4). I was wondering
whether that is due to the asynchronous nature of the systemd + open-iscsi +
multipath-tools + pacemaker startup sequence.<br>
<br>
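(What I plan to use to check that ordering theory on the rebooted node; the
unit names are the Ubuntu ones, the interpretation of the timestamps is on
me:)<br>
<br>
# relative startup times of the pieces involved on the node that just came back<br>
sudo journalctl -b -o short-monotonic -u open-iscsi.service -u multipath-tools.service -u pacemaker.service<br>
# and how many paths the multipath map actually has at this point<br>
sudo multipath -ll volume01<br>
<br>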
Here is the actual state right after the reboot:<br>
<br>
(k)rafaeldtinoco@clusterg01:~$ uptime<br>
12:58:22 up 0 min, 0 users, load average: 0.31, 0.09, 0.03<br>
<br>
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -r
/dev/mapper/volume01<br>
PR generation=0x2f, Reservation follows:<br>
Key = 0x59450001<br>
scope = LU_SCOPE, type = Write Exclusive, registrants only<br>
<br>
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k
/dev/mapper/volume01<br>
PR generation=0x2f, 9 registered reservation keys follow:<br>
0x59450001<br>
0x59450001<br>
0x59450001<br>
0x59450001<br>
0x59450002<br>
0x59450002<br>
0x59450002<br>
0x59450002<br>
0x59450000<br>
<br>
After this ^ I have to run:<br>
<br>
(k)rafaeldtinoco@clusterg01:~$ sudo mpathpersist --out --register
--param-rk=0x59450000 /dev/mapper/volume01<br>
persistent reserve out: scsi status: Reservation Conflict<br>
PR out: command failed<br>
<br>
(k)rafaeldtinoco@clusterg01:~$ sudo fence_mpath -v -d
/dev/mapper/volume01 -n 59450000 -o on<br>
2020-06-01 12:59:46,388 INFO: Executing: /usr/sbin/mpathpersist -i
-k -d /dev/mapper/volume01<br>
<br>
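(My reading of SPC here, so take it as an assumption: a plain REGISTER has to
carry the key the target already holds for that I_T nexus, so it conflicts on
the paths that lost the registration, while REGISTER AND IGNORE EXISTING KEY
does not care, i.e. something like the following works on all paths, and I
assume that is more or less what the agent's "on" action boils down to:)<br>
<br>
# re-register the local key on every path, ignoring whatever is currently registered there<br>
sudo mpathpersist --out --register-ignore --param-sark=0x59450000 /dev/mapper/volume01<br>
<br>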
With that done, all registrations are correctly in place again after the
fence was done:<br>
<br>
(k)rafaeldtinoco@clusterg03:~$ sudo mpathpersist --in -k
/dev/mapper/volume01<br>
PR generation=0x33, 12 registered reservation keys follow:<br>
0x59450001<br>
0x59450001<br>
0x59450001<br>
0x59450001<br>
0x59450002<br>
0x59450002<br>
0x59450002<br>
0x59450002<br>
0x59450000<br>
0x59450000<br>
0x59450000<br>
0x59450000<br>
<br>
I was wondering whether having "RequiredBy=resource-agents-deps.target" in
the [Install] section of open-iscsi.service and multipath-tools.service,
together with "Before=resource-agents-deps.target" in their [Unit] sections,
would be enough, but in this case it seems it is not.<br>
<br>
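(What I mean is something along these lines, whether as drop-ins or edited
into the units directly; the drop-in file name is mine, and I'm assuming
systemctl picks up an [Install] section from a drop-in here:)<br>
<br>
sudo mkdir -p /etc/systemd/system/open-iscsi.service.d<br>
sudo tee /etc/systemd/system/open-iscsi.service.d/10-ha.conf &lt;&lt;'EOF'<br>
[Unit]<br>
Before=resource-agents-deps.target<br>
<br>
[Install]<br>
RequiredBy=resource-agents-deps.target<br>
EOF<br>
# same drop-in for multipath-tools.service, then:<br>
sudo systemctl daemon-reload<br>
sudo systemctl reenable open-iscsi.service multipath-tools.service<br>
<br>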
Any idea why this happens? Did the agent start while only a single path to
the disk was available, i.e. while the iSCSI sessions were still being
established and multipath-tools had scanned just one path? I tend to think
that, if that were the case, I would sometimes end up with 1 path registered,
sometimes 2, and so on, and not always exactly one registered path (3
registrations missing).<br>
<br>
Or there is something else about PERSISTENT RESERVATIONS that I am missing
from SPC-3/4.<br>
<br>
Any thoughts?<br>
</tt>
</body>
</html>