[ClusterLabs] ocf:pacemaker:ping works strange
Artem
tyomikh at gmail.com
Fri Dec 8 09:44:16 EST 2023
Hello experts.
I use Pacemaker for a Lustre cluster, but for simplicity and exploration I
am testing with a Dummy resource. I did not like how the resource performed
failover and failback. When I shut down a VM running the remote agent,
Pacemaker tries to restart it. According to pcs status, it marks the resource
(not the RA) Online for some time while the VM stays down.
OK, I wanted to improve its behavior and set up a ping monitor. I tuned the
scores like this:
pcs resource create FAKE3 ocf:pacemaker:Dummy
pcs resource create FAKE4 ocf:pacemaker:Dummy
pcs constraint location FAKE3 prefers lustre3=100
pcs constraint location FAKE3 prefers lustre4=90
pcs constraint location FAKE4 prefers lustre3=90
pcs constraint location FAKE4 prefers lustre4=100
pcs resource defaults update resource-stickiness=110
pcs resource create ping ocf:pacemaker:ping dampen=5s host_list=local op monitor interval=3s timeout=7s clone meta target-role="started"
for i in lustre{1..4}; do pcs constraint location ping-clone prefers $i; done
pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd
pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd
pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined pingd
pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined pingd
Question #1) Why can't I see the accumulated score from pingd in the
crm_simulate output? It shows only the location score and stickiness.
pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210
pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90
pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90
pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210
Whether everything is OK or the VM is down, the score from pingd is not
added to the total score of the RA.
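
To make the numbers in Question #1 concrete, here is a minimal sketch of the arithmetic. It assumes that the location score, the stickiness, and the ping rule score would simply add up on the node where the resource is running. That additivity is my assumption about what the total should look like, not something confirmed by the documentation. The helper name observed_score is made up for this example; the sample lines are the crm_simulate output quoted above.

```shell
#!/bin/sh
# Score lines copied from the crm_simulate output in this message.
sample='pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210
pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90
pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90
pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210'

# observed_score RESOURCE NODE (hypothetical helper):
# reads score lines on stdin and prints the score for that resource/node pair.
observed_score() {
    awk -v res="$1" -v node="$2" \
        '$2 == res && $6 == (node ":") { print $7 }'
}

# Values taken from the pcs commands above.
location=100     # pcs constraint location FAKE3 prefers lustre3=100
stickiness=110   # pcs resource defaults update resource-stickiness=110
ping_rule=125    # rule score=125 pingd gt 0 or defined pingd

observed=$(printf '%s\n' "$sample" | observed_score FAKE3 lustre3)
without_ping=$((location + stickiness))
with_ping=$((location + stickiness + ping_rule))   # assumed additive total

echo "observed=$observed without_ping=$without_ping with_ping=$with_ping"
# prints: observed=210 without_ping=210 with_ping=335
```

The observed 210 matches location plus stickiness exactly, so under the additive assumption the 125 from the ping rule is the part that is missing from the total.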
Question #2) I shut the lustre3 VM down and leave it like that. pcs status:
* FAKE3 (ocf::pacemaker:Dummy): Stopped
* FAKE4 (ocf::pacemaker:Dummy): Started lustre4
* Clone Set: ping-clone [ping]:
* Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ] << lustre3 missing
OK for now
Then the VM boots up. pcs status:
* FAKE3 (ocf::pacemaker:Dummy): FAILED (blocked) [ lustre3 lustre4 ] << what is it?
* Clone Set: ping-clone [ping]:
* ping (ocf::pacemaker:ping): FAILED lustre3 (blocked) << why not started?
* Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]
I checked the server processes manually and found that lustre4 runs
"/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3 does not.
Everything is set up according to the documentation, but the results are
strange.
Then I tried adding meta target-role="started" to the pcs resource create
ping command, and this time ping started after the node rebooted. Can I
assume that it was just missing from the official setup documentation, and
that everything will now work fine?