[ClusterLabs] fence_mpath in latest fence-agents: single reservation after fence

Strahil Nikolov hunter86_bg at yahoo.com
Mon Jun 1 15:54:41 EDT 2020


I don't see the reservation key in multipath.conf.
Have you set it up in a unique way (each host has its own key)?
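
Something like this in /etc/multipath.conf on each host is what I mean
(just a sketch, reusing the keys from your cluster config):

defaults {
    # clusterg01's key; 0x59450001 on clusterg02, 0x59450002 on clusterg03
    reservation_key 0x59450000
}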

Best Regards,
Strahil Nikolov

On 1 June 2020 16:04:32 GMT+03:00, Rafael David Tinoco <rafaeldtinoco at ubuntu.com> wrote:
>Hello again,
>
>It has been a while since I last showed up... I was finishing up
>details of the Ubuntu 20.04 HA packages (along with lots of other
>stuff), so sorry for not being active until now (about to change).
>While preparing my regression lab, which I mentioned at the latest HA
>conference, I'm facing a situation I'd like some input on, if anyone
>has ideas...
>
>I'm sorting out the needed fence_mpath/fence_iscsi setup for all
>Ubuntu versions:
>
>https://bugs.launchpad.net/ubuntu/+source/fence-agents/+bug/1864404
>
>and I just faced this:
>
>- 3 x node cluster setup
>- 3 x nodes share 4 paths to /dev/mapper/volume{00..10}
>- Using /dev/mapper/volume01 for fencing tests
>- softdog configured for /dev/watchdog
>- fence_mpath_check installed in /etc/watchdog.d/ (wiring sketched below)
>
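>For the record, the watchdog wiring behind the last two items is
>roughly the following on each node (a sketch of my setup, not the
>complete files):
>
># /etc/modules-load.d/softdog.conf  (so the softdog module is loaded at boot)
>softdog
>
># /etc/watchdog.conf  (relevant line only)
>watchdog-device = /dev/watchdog
>
># fence-agents' check script goes into the directory that the watchdog
># daemon scans for test scripts:
># /etc/watchdog.d/fence_mpath_check
>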
>----
>
>(k)rafaeldtinoco at clusterg01:~$ crm configure show
>node 1: clusterg01
>node 2: clusterg02
>node 3: clusterg03
>primitive fence-mpath-clusterg01 stonith:fence_mpath \
>    params pcmk_on_timeout=70 pcmk_off_timeout=70 \
>        pcmk_host_list=clusterg01 pcmk_monitor_action=metadata \
>        pcmk_reboot_action=off key=59450000 devices="/dev/mapper/volume01" \
>        power_wait=65 \
>    meta provides=unfencing target-role=Started
>primitive fence-mpath-clusterg02 stonith:fence_mpath \
>    params pcmk_on_timeout=70 pcmk_off_timeout=70 \
>        pcmk_host_list=clusterg02 pcmk_monitor_action=metadata \
>        pcmk_reboot_action=off key=59450001 devices="/dev/mapper/volume01" \
>        power_wait=65 \
>    meta provides=unfencing target-role=Started
>primitive fence-mpath-clusterg03 stonith:fence_mpath \
>    params pcmk_on_timeout=70 pcmk_off_timeout=70 \
>        pcmk_host_list=clusterg03 pcmk_monitor_action=metadata \
>        pcmk_reboot_action=off key=59450002 devices="/dev/mapper/volume01" \
>        power_wait=65 \
>    meta provides=unfencing target-role=Started
>property cib-bootstrap-options: \
>    have-watchdog=false \
>    dc-version=2.0.3-4b1f869f0f \
>    cluster-infrastructure=corosync \
>    cluster-name=clusterg \
>    stonith-enabled=true \
>    no-quorum-policy=stop \
>    last-lrm-refresh=1590773755
>
>----
>
>(k)rafaeldtinoco at clusterg03:~$ crm status
>Cluster Summary:
>  * Stack: corosync
>  * Current DC: clusterg02 (version 2.0.3-4b1f869f0f) - partition with quorum
>  * Last updated: Mon Jun  1 12:55:13 2020
>  * Last change:  Mon Jun  1 04:35:07 2020 by root via cibadmin on clusterg03
>  * 3 nodes configured
>  * 3 resource instances configured
>
>Node List:
>  * Online: [ clusterg01 clusterg02 clusterg03 ]
>
>Full List of Resources:
>  * fence-mpath-clusterg01    (stonith:fence_mpath):     Started clusterg02
>  * fence-mpath-clusterg02    (stonith:fence_mpath):     Started clusterg03
>  * fence-mpath-clusterg03    (stonith:fence_mpath):     Started clusterg01
>
>----
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
>  PR generation=0x2d, Reservation follows:
>   Key = 0x59450001
>  scope = LU_SCOPE, type = Write Exclusive, registrants only
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
>  PR generation=0x2d,     12 registered reservation keys follow:
>    0x59450001
>    0x59450001
>    0x59450001
>    0x59450001
>    0x59450002
>    0x59450002
>    0x59450002
>    0x59450002
>    0x59450000
>    0x59450000
>    0x59450000
>    0x59450000
>
>----
>
>You can see that everything looks fine: 3 nodes x 4 paths = 12
>registered keys, with the reservation held by 0x59450001. If I then
>disable the 2 corosync interconnects on clusterg01 (both rings show up
>in the membership below):
>
>(k)rafaeldtinoco at clusterg01:~$ sudo corosync-quorumtool -a
>Quorum information
>------------------
>Date:             Mon Jun  1 12:56:00 2020
>Quorum provider:  corosync_votequorum
>Nodes:            3
>Node ID:          1
>Ring ID:          1.120
>Quorate:          Yes
>
>Votequorum information
>----------------------
>Expected votes:   3
>Highest expected: 3
>Total votes:      3
>Quorum:           2 
>Flags:            Quorate
>
>Membership information
>----------------------
>    Nodeid      Votes Name
>         1          1 clusterg01, clusterg01bkp (local)
>         2          1 clusterg02, clusterg02bkp
>         3          1 clusterg03, clusterg03bkp
>
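>(Disabling the interconnects here just means cutting both corosync
>rings on clusterg01, e.g. by bringing the two ring interfaces down; a
>sketch, with placeholder interface names:)
>
>sudo ip link set ens4 down
>sudo ip link set ens5 down
>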
>With both interconnects down, clusterg01 gets fenced correctly:
>
>Pending Fencing Actions:
>  * reboot of clusterg01 pending: client=pacemaker-controld.906, origin=clusterg02
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
>  PR generation=0x2e, Reservation follows:
>   Key = 0x59450001
>  scope = LU_SCOPE, type = Write Exclusive, registrants only
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
>  PR generation=0x2e,     8 registered reservation keys follow:
>    0x59450001
>    0x59450001
>    0x59450001
>    0x59450001
>    0x59450002
>    0x59450002
>    0x59450002
>    0x59450002
>
>and watchdog reboots it. But it turns out that the node comes back
>with just 1 reservation key registered, on 1 path (instead of 4). I
>was wondering if that is because of the async nature of the
>combination: systemd + open-iscsi + multipath-tools + pacemaker
>service startup.
>
>Check:
>
>(k)rafaeldtinoco at clusterg01:~$ uptime
> 12:58:22 up 0 min,  0 users,  load average: 0.31, 0.09, 0.03
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -r /dev/mapper/volume01
>  PR generation=0x2f, Reservation follows:
>   Key = 0x59450001
>  scope = LU_SCOPE, type = Write Exclusive, registrants only
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
>  PR generation=0x2f,     9 registered reservation keys follow:
>    0x59450001
>    0x59450001
>    0x59450001
>    0x59450001
>    0x59450002
>    0x59450002
>    0x59450002
>    0x59450002
>    0x59450000
>
>After this ^, to guarantee all reservations are correctly placed again
>after the fence was done, I have to manually run:
>
>(k)rafaeldtinoco at clusterg01:~$ sudo mpathpersist --out --register --param-rk=0x59450000 /dev/mapper/volume01
>persistent reserve out: scsi status: Reservation Conflict
>PR out: command failed
>
>(k)rafaeldtinoco at clusterg01:~$ sudo fence_mpath -v -d /dev/mapper/volume01 -n 59450000 -o on
>2020-06-01 12:59:46,388 INFO: Executing: /usr/sbin/mpathpersist -i -k -d /dev/mapper/volume01
>
>which brings all the registered keys back:
>
>(k)rafaeldtinoco at clusterg03:~$ sudo mpathpersist --in -k /dev/mapper/volume01
>  PR generation=0x33,     12 registered reservation keys follow:
>    0x59450001
>    0x59450001
>    0x59450001
>    0x59450001
>    0x59450002
>    0x59450002
>    0x59450002
>    0x59450002
>    0x59450000
>    0x59450000
>    0x59450000
>    0x59450000
>
>I was wondering whether having open-iscsi.service and
>multipath-tools.service declare "RequiredBy=resource-agents-deps.target"
>in their [Install] section, together with
>"Before=resource-agents-deps.target" in their [Unit] section, would be
>enough, but in this case it does not seem to be.
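>
>For reference, the wiring I'm describing would look roughly like this
>when expressed as a drop-in on the target side (just a sketch,
>equivalent to the RequiredBy/Before approach above; the file name is
>illustrative):
>
># /etc/systemd/system/resource-agents-deps.target.d/storage.conf
>[Unit]
># pull in and order the storage stack before the target that
># pacemaker.service is itself ordered after
>Requires=open-iscsi.service multipath-tools.service
>After=open-iscsi.service multipath-tools.service
>
>and the resulting boot ordering can be inspected with something like:
>
>systemd-analyze critical-chain pacemaker.service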
>
>Any idea why this happens? Did the agent start while only a single
>path to the disk was available, i.e. while the iSCSI sessions were
>still being established and multipath-tools had scanned only one path?
>I tend to think that, if that were the case, I would sometimes see 1
>path registered, sometimes 2, and so on, and not a single registered
>path every time (with the other 3 registrations missing).
>
>Or is there something else about PERSISTENT RESERVATIONS that I'm
>missing from SPC-3/SPC-4?
>
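>One way to double-check the single-path theory right after the fenced
>node comes back would be to query each path device directly (a sketch;
>the sdX names below are placeholders for the 4 paths behind volume01):
>
>sudo multipath -ll /dev/mapper/volume01
># then, for each path device listed above:
>for p in sdb sdc sdd sde; do
>    echo "== /dev/$p =="
>    sudo sg_persist --in --read-keys /dev/$p
>done
>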
>Any thoughts ?

