[ClusterLabs] ocf:pacemaker:ping works strange

Andrei Borzenkov arvidjaar at gmail.com
Tue Dec 12 08:17:40 EST 2023


On Fri, Dec 8, 2023 at 5:44 PM Artem <tyomikh at gmail.com> wrote:
>
> Hello experts.
>
> I use Pacemaker for a Lustre cluster, but for simplicity and exploration I am testing with Dummy resources. I didn't like how the resource performed failover and failback. When I shut down a VM running the remote agent, Pacemaker tries to restart it. According to pcs status, it marks the resource (not the RA) Online for some time while the VM stays down.
>
> OK, I wanted to improve its behavior and set up a ping monitor. I tuned the scores like this:
> pcs resource create FAKE3 ocf:pacemaker:Dummy
> pcs resource create FAKE4 ocf:pacemaker:Dummy
> pcs constraint location FAKE3 prefers lustre3=100
> pcs constraint location FAKE3 prefers lustre4=90
> pcs constraint location FAKE4 prefers lustre3=90
> pcs constraint location FAKE4 prefers lustre4=100
> pcs resource defaults update resource-stickiness=110
> pcs resource create ping ocf:pacemaker:ping dampen=5s host_list=local op monitor interval=3s timeout=7s clone meta target-role="started"
> for i in lustre{1..4}; do pcs constraint location ping-clone prefers $i; done
> pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined pingd
> pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined pingd
>

These rules contradict each other. You set the score to 125 if pingd is
defined, and at the same time set it to 0 if pingd is less than 1.
To be "less than 1", pingd must be defined to start with, so in that
case both rules apply. I do not know how the rules are ordered. Either
you get random behavior, or one pair of these rules is effectively
ignored.
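
The overlap can be checked with a toy model of the two rule expressions.
This is only a sketch in Python, not Pacemaker's rule engine: the
function names are made up for illustration, and it assumes that a
comparison against an undefined attribute evaluates to false.

```python
from typing import List, Optional


def rule_score_0(pingd: Optional[int]) -> bool:
    """Models: pingd lt 1 or not_defined pingd (assumed semantics)."""
    return (pingd is not None and pingd < 1) or pingd is None


def rule_score_125(pingd: Optional[int]) -> bool:
    """Models: pingd gt 0 or defined pingd (assumed semantics)."""
    return (pingd is not None and pingd > 0) or pingd is not None


def matching_rules(pingd: Optional[int]) -> List[int]:
    """Return the scores of all quoted rules that match this pingd value."""
    scores = []
    if rule_score_0(pingd):
        scores.append(0)
    if rule_score_125(pingd):
        scores.append(125)
    return scores


if __name__ == "__main__":
    # None stands for "attribute not defined on the node".
    for value in (None, 0, 1):
        print(value, matching_rules(value))
```

For pingd = 0 both expressions match, which is the conflict described
above.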

>
> Question #1) Why can't I see the accumulated score from pingd in the crm_simulate output? It shows only the location score and stickiness.
> pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210
> pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210
> Whether everything is OK or the VM is down, the score from pingd is not added to the total score of the RA.
>
>
> Question #2) I shut the lustre3 VM down and left it that way. pcs status:
>   * FAKE3       (ocf::pacemaker:Dummy):  Stopped
>   * FAKE4       (ocf::pacemaker:Dummy):  Started lustre4
>   * Clone Set: ping-clone [ping]:
>     * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ] << lustre3 missing
> OK for now
> VM boots up. pcs status:
>   * FAKE3       (ocf::pacemaker:Dummy):  FAILED (blocked) [ lustre3 lustre4 ]  << what is it?
>   * Clone Set: ping-clone [ping]:
>     * ping      (ocf::pacemaker:ping):   FAILED lustre3 (blocked)    << why not started?
>     * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]

If this is the full pcs status output, I do not see a stonith resource in it.

> I checked the server processes manually and found that lustre4 runs "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3 doesn't.
> Everything is set up according to the documentation, but the results are strange.
> Then I tried adding meta target-role="started" to pcs resource create ping, and this time ping started after the node rebooted. Can I assume that this was simply missing from the official setup documentation, and that everything will now work fine?