[ClusterLabs] fence_mpath in latest fence-agents: single reservation after fence
Strahil Nikolov
hunter86_bg at yahoo.com
Mon Jun 1 15:54:41 EDT 2020
I don't see the reservation key in multipath.conf.
Have you set it up in a unique way (each host has its own key)?
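What I mean is a reservation_key entry in multipath.conf, with a different
value on every host, something along these lines (the value here is just
your clusterg01 key, as an example):

defaults {
        # must match this host's fence_mpath key and be unique per host
        reservation_key 0x59450000
}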
Best Regards,
Strahil Nikolov
On June 1, 2020, 16:04:32 GMT+03:00, Rafael David Tinoco <rafaeldtinoco at ubuntu.com> wrote:
>Hello again,
>
>It's been a long time since I last showed up... I was finishing up details
>of the Ubuntu 20.04 HA packages (along with lots of other stuff), so sorry
>for not being active until now (about to change). While preparing my
>regression lab, as I mentioned at the latest HA conference, I'm facing a
>situation I'd like some input on, if anyone has any...
>
>I'm sorting out the fence_mpath/fence_iscsi setup needed for all Ubuntu
>versions:
>
>https://bugs.launchpad.net/ubuntu/+source/fence-agents/+bug/1864404
>
>and I just faced this:
>
>- 3 x node cluster setup
>- 3 x nodes share 4 paths to /dev/mapper/volume{00..10}
>- Using /dev/mapper/volume01 for fencing tests
>- softdog configured for /dev/watchdog
>- fence_mpath_check installed in /etc/watchdog.d/ (watchdog config sketched below)
>
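>The watchdog side of that is roughly the following (a sketch of my lab
>config, not the packaged defaults):
>
># /etc/modules-load.d/softdog.conf
>softdog
>
># /etc/watchdog.conf (relevant lines only)
>watchdog-device = /dev/watchdog
># watchdog(8) runs every executable in this directory as a test command;
># fence_mpath_check (shipped by fence-agents) goes in here, so the node
># reboots itself once its key is removed from the device
>test-directory = /etc/watchdog.d
>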
>----
>
>(k)rafaeldtinoco at clusterg01:~$ crm configure show
>node 1: clusterg01
>node 2: clusterg02
>node 3: clusterg03
>primitive fence-mpath-clusterg01 stonith:fence_mpath \
>        params pcmk_on_timeout=70 pcmk_off_timeout=70 pcmk_host_list=clusterg01 pcmk_monitor_action=metadata pcmk_reboot_action=off key=59450000 devices="/dev/mapper/volume01" power_wait=65 \
>        meta provides=unfencing target-role=Started
>primitive fence-mpath-clusterg02 stonith:fence_mpath \
>        params pcmk_on_timeout=70 pcmk_off_timeout=70 pcmk_host_list=clusterg02 pcmk_monitor_action=metadata pcmk_reboot_action=off key=59450001 devices="/dev/mapper/volume01" power_wait=65 \
>        meta provides=unfencing target-role=Started
>primitive fence-mpath-clusterg03 stonith:fence_mpath \
>        params pcmk_on_timeout=70 pcmk_off_timeout=70 pcmk_host_list=clusterg03 pcmk_monitor_action=metadata pcmk_reboot_action=off key=59450002 devices="/dev/mapper/volume01" power_wait=65 \
>        meta provides=unfencing target-role=Started
>property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=2.0.3-4b1f869f0f \
> cluster-infrastructure=corosync \
> cluster-name=clusterg \
> stonith-enabled=true \
> no-quorum-policy=stop \
> last-lrm-refresh=1590773755
>
>----
>
>(k)rafaeldtinoco at clusterg03:~$ crm status
>Cluster Summary:
> * Stack: corosync
> * Current DC: clusterg02 (version 2.0.3-4b1f869f0f) - partition with quorum
> * Last updated: Mon Jun 1 12:55:13 2020
> * Last change: Mon Jun 1 04:35:07 2020 by root via cibadmin on clusterg03
> * 3 nodes configured
> * 3 resource instances configured
>
>Node List:
> * Online: [ clusterg01 clusterg02 clusterg03 ]
>
>Full List of Resources:
> * fence-mpath-clusterg01 (stonith:fence_mpath): Started clusterg02
> * fence-mpath-clusterg02 (stonith:fence_mpath): Started clusterg03
> * fence-mpath-clusterg03 (stonith:fence_mpath): Started clusterg01
>
>----
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
> PR generation=0x2d, Reservation follows:
> Key = 0x59450001
> scope = LU_SCOPE, type = Write Exclusive, registrants only
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
> PR generation=0x2d, 12 registered reservation keys follow:
> 0x59450001
> 0x59450001
> 0x59450001
> 0x59450001
> 0x59450002
> 0x59450002
> 0x59450002
> 0x59450002
> 0x59450000
> 0x59450000
> 0x59450000
> 0x59450000
>
>----
>
>You can see that everything looks fine: 12 registrations means each of the
>3 keys is registered through all 4 paths, and one of them holds the "Write
>Exclusive, registrants only" reservation. Now, if I disable the 2
>interconnects I have for corosync:
>
>(k)rafaeldtinoco at clusterg01:~$ sudo corosync-quorumtool -a
>Quorum information
>------------------
>Date: Mon Jun 1 12:56:00 2020
>Quorum provider: corosync_votequorum
>Nodes: 3
>Node ID: 1
>Ring ID: 1.120
>Quorate: Yes
>
>Votequorum information
>----------------------
>Expected votes: 3
>Highest expected: 3
>Total votes: 3
>Quorum: 2
>Flags: Quorate
>
>Membership information
>----------------------
> Nodeid Votes Name
> 1 1 clusterg01, clusterg01bkp (local)
> 2 1 clusterg02, clusterg02bkp
> 3 1 clusterg03, clusterg03bkp
>
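>(To "disable" the interconnects I just take both corosync links down on
>the node under test, along these lines - the interface names here are only
>placeholders for my lab:)
>
>(k)rafaeldtinoco at clusterg01:~$ sudo ip link set ens8 down
>(k)rafaeldtinoco at clusterg01:~$ sudo ip link set ens9 down
>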
>the node (clusterg01) gets fenced correctly:
>
>Pending Fencing Actions:
> * reboot of clusterg01 pending: client=pacemaker-controld.906, origin=clusterg02
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
> PR generation=0x2e, Reservation follows:
> Key = 0x59450001
> scope = LU_SCOPE, type = Write Exclusive, registrants only
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
> PR generation=0x2e, 8 registered reservation keys follow:
> 0x59450001
> 0x59450001
> 0x59450001
> 0x59450001
> 0x59450002
> 0x59450002
> 0x59450002
> 0x59450002
>
>and the watchdog reboots it... but it turns out the node comes back with
>its reservation key registered on just 1 path (instead of 4). I was
>wondering whether that is caused by the async nature of the combination:
>systemd + open-iscsi + multipath-tools + pacemaker service startup.
>
>Check:
>
>(k)rafaeldtinoco at clusterg01:~$ uptime
> 12:58:22 up 0 min, 0 users, load average: 0.31, 0.09, 0.03
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
> PR generation=0x2f, Reservation follows:
> Key = 0x59450001
> scope = LU_SCOPE, type = Write Exclusive, registrants only
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
> PR generation=0x2f, 9 registered reservation keys follow:
> 0x59450001
> 0x59450001
> 0x59450001
> 0x59450001
> 0x59450002
> 0x59450002
> 0x59450002
> 0x59450002
> 0x59450000
>
>After this ^ I have to run:
>
>(k)rafaeldtinoco at clusterg01:~$ sudo mpathpersist --out --register --param-rk=0x59450000 /dev/mapper/volume01
>persistent reserve out: scsi status: Reservation Conflict
>PR out: command failed
>
>(k)rafaeldtinoco at clusterg01:~$ sudo fence_mpath -v -d /dev/mapper/volume01 -n 59450000 -o on
>2020-06-01 12:59:46,388 INFO: Executing: /usr/sbin/mpathpersist -i -k -d /dev/mapper/volume01
>
>That is what it takes to get all registrations correctly placed again
>after the fence was done:
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
> PR generation=0x33, 12 registered reservation keys follow:
> 0x59450001
> 0x59450001
> 0x59450001
> 0x59450001
> 0x59450002
> 0x59450002
> 0x59450002
> 0x59450002
> 0x59450000
> 0x59450000
> 0x59450000
> 0x59450000
>
>I was wondering whether having "RequiredBy=resource-agents-deps.target" in
>the [Install] section of open-iscsi.service and multipath-tools.service,
>together with "Before=resource-agents-deps.target" in their [Unit]
>sections, would be enough, but in this case it apparently is not.
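>(In unit terms, that is essentially this drop-in, the same for both
>services - shown here just as a sketch:)
>
># /etc/systemd/system/open-iscsi.service.d/ha-ordering.conf (illustration)
>[Unit]
># order this service before resource-agents-deps.target
># (pacemaker.service runs After= that target)
>Before=resource-agents-deps.target
>
>[Install]
># "systemctl enable open-iscsi" then links the unit into
># resource-agents-deps.target.requires/
>RequiredBy=resource-agents-deps.target
>
>Of course, Before= only orders against the service having *started*, not
>against every iSCSI session and multipath path actually being up, which is
>probably where the gap is.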
>
>Any idea why this happens? Did the agent run its "on" action while only a
>single path to the disk was available, i.e. while the iSCSI sessions were
>still being established and multipath-tools had scanned just one path so
>far? I tend to think that, if that were the case, I would sometimes end up
>with 1 registered path, sometimes 2, etc., and not a single registered
>path (3 registrations missing) every single time.
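>
>(Since registrations are per I_T nexus - i.e. per path - one way to check
>that theory would be to query each path device right after boot instead of
>the multipath map; the sd names below are just placeholders for volume01's
>four paths:)
>
>(k)rafaeldtinoco at clusterg01:~$ for p in sdb sdf sdj sdn; do echo "== /dev/$p"; sudo sg_persist --in -k /dev/$p; done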
>
>OR there is something else about PERSISTENT RESERVATIONS that I'm missing
>from SPC-3/4.
>
>Any thoughts?