[ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help
Ken Gaillot
kgaillot at redhat.com
Mon Dec 18 16:13:04 EST 2023
On Mon, 2023-12-18 at 23:39 +0300, Artem wrote:
> Hello experts.
>
> I previously played with a dummy resource and it worked as expected.
> Now I'm switching to a Lustre OST resource and cannot make it work,
> nor can I understand why.
>
>
> ### Initial setup:
> # pcs resource defaults update resource-stickiness=110
> # for i in {1..4}; do pcs cluster node add-remote lustre$i
> reconnect_interval=60; done
> # for i in {1..4}; do pcs constraint location lustre$i prefers
> lustre-mgs lustre-mds1 lustre-mds2; done
> # pcs resource create OST3 ocf:lustre:Lustre target=/dev/disk/by-
> id/wwn-0x6000c291b7f7147f826bb95153e2eaca mountpoint=/lustre/oss3
> # pcs resource create OST4 ocf:lustre:Lustre target=/dev/disk/by-
> id/wwn-0x6000c292c41eaae60bccdd3a752913b3 mountpoint=/lustre/oss4
> (I also tried ocf:heartbeat:Filesystem device=... directory=...
> fstype=lustre force_unmount=safe --> same behavior)
>
> # pcs constraint location OST3 prefers lustre3=100
> # pcs constraint location OST3 prefers lustre4=100
> # pcs constraint location OST4 prefers lustre3=100
> # pcs constraint location OST4 prefers lustre4=100
> # for i in lustre-mgs lustre-mds1 lustre-mds2 lustre{1..2}; do pcs
> constraint location OST3 avoids $i; done
> # for i in lustre-mgs lustre-mds1 lustre-mds2 lustre{1..2}; do pcs
> constraint location OST4 avoids $i; done
>
> ### Checking all is good
> # crm_simulate --simulate --live-check --show-scores
> pcmk__primitive_assign: OST4 allocation score on lustre3: 100
> pcmk__primitive_assign: OST4 allocation score on lustre4: 210
> # pcs status
> * OST3 (ocf::lustre:Lustre): Started lustre3
> * OST4 (ocf::lustre:Lustre): Started lustre4
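(The 210 for lustre4 here is presumably the location preference of 100
plus the resource-stickiness of 110, since OST4 is already running
there.)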
>
> ### VM with lustre4 (OST4) is OFF
>
> # crm_simulate --simulate --live-check --show-scores
> pcmk__primitive_assign: OST4 allocation score on lustre3: 100
> pcmk__primitive_assign: OST4 allocation score on lustre4: 100
> Start OST4 ( lustre3 )
> Resource action: OST4 start on lustre3
> Resource action: OST4 monitor=20000 on lustre3
> # pcs status
> * OST3 (ocf::lustre:Lustre): Started lustre3
> * OST4 (ocf::lustre:Lustre): Stopped
>
> 1) I see crm_simulate guessed that it has to restart the failed OST4 on
> lustre3. But I suspect that, after making this decision, it evaluates
> the 100:100 scores of lustre3 and lustre4, and since lustre3 is already
> running a service, it decides to run OST4 on lustre4 again, which has
> failed. Thus it cannot restart on the surviving node. Right?
No. I'd start with figuring out this case. There's no reason, given the
configuration above, why OST4 would be stopped. In fact the simulation
shows it should be started, so that suggests that maybe the actual
start failed.
Do the logs show any errors around this time?
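For example, something along these lines (assuming the default log
location; adjust for your setup) should show whether the start on
lustre3 actually failed:

# pcs resource failcount show OST4
# crm_mon -1 -f
# grep -i OST4 /var/log/pacemaker/pacemaker.log | grep -iE "error|failed"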
> 2) OK, let's try not giving a specific score; nothing changed, see
> below:
> ### did remove old constraints; clear all resources; cleanup all
> resources; cluster stop; cluster start
>
> # pcs constraint location OST3 prefers lustre3 lustre4
> # pcs constraint location OST4 prefers lustre3 lustre4
> # for i in lustre-mgs lustre-mds1 lustre-mds2 lustre{1..2}; do pcs
> constraint location OST3 avoids $i; done
> # for i in lustre-mgs lustre-mds1 lustre-mds2 lustre{1..2}; do pcs
> constraint location OST4 avoids $i; done
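(Note: "prefers" without an explicit score defaults to INFINITY, which
is why the allocation scores below show INFINITY.)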
> # crm_simulate --simulate --live-check --show-scores
> pcmk__primitive_assign: OST4 allocation score on lustre3: INFINITY
> pcmk__primitive_assign: OST4 allocation score on lustre4: INFINITY
> # pcs status
> * OST3 (ocf::lustre:Lustre): Started lustre3
> * OST4 (ocf::lustre:Lustre): Started lustre4
>
> ### VM with lustre4 (OST4) is OFF
>
> # crm_simulate --simulate --live-check --show-scores
> pcmk__primitive_assign: OST4 allocation score on lustre3: INFINITY
> pcmk__primitive_assign: OST4 allocation score on lustre4: INFINITY
> Start OST4 ( lustre3 )
> Resource action: OST4 start on lustre3
> Resource action: OST4 monitor=20000 on lustre3
> # pcs status
> * OST3 (ocf::lustre:Lustre): Started lustre3
> * OST4 (ocf::lustre:Lustre): Stopped
>
> 3) OK, let's try setting different scores to give the nodes preferences,
> and influence the decision with pingd:
> ### did remove old constraints; clear all resources; cleanup all
> resources; cluster stop; cluster start
>
> # pcs constraint location OST3 prefers lustre3=100
> # pcs constraint location OST3 prefers lustre4=90
> # pcs constraint location OST4 prefers lustre3=90
> # pcs constraint location OST4 prefers lustre4=100
> # for i in lustre-mgs lustre-mds1 lustre-mds2 lustre{1..2}; do pcs
> constraint location OST3 avoids $i; done
> # for i in lustre-mgs lustre-mds1 lustre-mds2 lustre{1..2}; do pcs
> constraint location OST4 avoids $i; done
> # pcs resource create ping ocf:pacemaker:ping dampen=5s
> host_list=192.168.34.250 op monitor interval=3s timeout=7s meta
> target-role="started" globally-unique="false" clone
> # for i in lustre-mgs lustre-mds{1..2} lustre{1..4}; do pcs
> constraint location ping-clone prefers $i; done
> # pcs constraint location OST3 rule score=0 pingd lt 1 or not_defined
> pingd
> # pcs constraint location OST4 rule score=0 pingd lt 1 or not_defined
> pingd
> # pcs constraint location OST3 rule score=125 defined pingd
> # pcs constraint location OST4 rule score=125 defined pingd
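(A quick way to confirm the ping attribute is actually being set on the
OST nodes -- assuming the default attribute name "pingd" -- is to check
the transient node attributes, e.g.:

# crm_mon -1 -A
# attrd_updater -Q -n pingd -N lustre3

If "pingd" never shows up there, the score=125 rule will never apply.)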
>
> ### same home base:
> # crm_simulate --simulate --live-check --show-scores
> pcmk__primitive_assign: OST4 allocation score on lustre3: 90
> pcmk__primitive_assign: OST4 allocation score on lustre4: 210
> # pcs status
> * OST3 (ocf::lustre:Lustre): Started lustre3
> * OST4 (ocf::lustre:Lustre): Started lustre4
>
> ### VM with lustre4 (OST4) is OFF.
>
> # crm_simulate --simulate --live-check --show-scores
> pcmk__primitive_assign: OST4 allocation score on lustre3: 90
> pcmk__primitive_assign: OST4 allocation score on lustre4: 100
> Start OST4 ( lustre3 )
> Resource action: OST4 start on lustre3
> Resource action: OST4 monitor=20000 on lustre3
> # pcs status
> * OST3 (ocf::lustre:Lustre): Started lustre3
> * OST4 (ocf::lustre:Lustre): Stopped
>
> Again lustre3 seems unable to take over due to its lower score, and
> pingd DOESN'T help at all!
>
>
> 4) Can I make reliable HA failover work without pingd, to keep things
> as simple as possible?
> 5) Pings might help influence cluster decisions in case the GW is lost,
> but it's not working the way all the guides say it should. Why?
>
>
> Thanks in advance,
> Artem
--
Ken Gaillot <kgaillot at redhat.com>