<div dir="ltr"><div>Hello experts.</div><div><br></div><div>I use pacemaker for a Lustre cluster. But for simplicity and exploration I use a Dummy resource. I didn't like how resource performed failover and failback. When I shut down VM with remote agent, pacemaker tries to restart it. According to pcs status it marks the resource (not RA) Online for some time while VM stays down. </div><div><br></div><div>OK, I wanted to improve its behavior and set up a ping monitor. I tuned the scores like this:</div><div>pcs resource create FAKE3 ocf:pacemaker:Dummy<br>pcs resource create FAKE4 ocf:pacemaker:Dummy<br>pcs constraint location FAKE3 prefers lustre3=100<br>pcs constraint location FAKE3 prefers lustre4=90<br>pcs constraint location FAKE4 prefers lustre3=90<br>pcs constraint location FAKE4 prefers lustre4=100</div><div>pcs resource defaults update resource-stickiness=110</div><div>pcs resource create ping ocf:pacemaker:ping dampen=5s host_list=local op monitor interval=3s timeout=7s clone meta target-role="started"</div><div>for i in lustre{1..4}; do pcs constraint location ping-clone prefers $i; done</div><div>pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd<br>pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd<br>pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined pingd<br>pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined pingd</div><div><br></div><div><br></div><div>Question #1) Why I cannot see accumulated score from pingd in crm_simulate output? Only location score and stickiness. <br></div><div>pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210<br>pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90</div><div>pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90<br>pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210</div><div>
Question #2) I shut the lustre3 VM down and leave it like that. pcs status shows:

 * FAKE3 (ocf::pacemaker:Dummy): Stopped
 * FAKE4 (ocf::pacemaker:Dummy): Started lustre4
 * Clone Set: ping-clone [ping]:
 * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]   << lustre3 missing

OK so far. Then the VM boots up.
pcs status now shows:

 * FAKE3 (ocf::pacemaker:Dummy): FAILED (blocked) [ lustre3 lustre4 ]   << what is this?
 * Clone Set: ping-clone [ping]:
 * ping (ocf::pacemaker:ping): FAILED lustre3 (blocked)   << why not started?
 * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]

I checked the server processes manually and found that lustre4 runs "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3 doesn't. Everything follows the documentation, yet the results look strange.
Then I tried adding meta target-role="started" to pcs resource create ping, and this time ping started after the node rebooted. Can I expect that this was just missing from the official setup documentation, and that now everything will work fine?
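(For completeness, I assume the way to recover from the FAILED (blocked) state, and to set the meta attribute on the existing clone without recreating it, is something like the following; cleanup, failcount and meta are standard pcs subcommands as far as I know, but please correct me if there is a better way:)

# inspect failed actions and failcounts behind the "blocked" state
pcs status --full
pcs resource failcount show ping
# clear the failure history so the clone can be scheduled again
pcs resource cleanup ping-clone
# set target-role on the existing clone rather than recreating it
pcs resource meta ping-clone target-role=Started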