[Pacemaker] 2 node cluster enters deathmatch. Please Help!

Димитър Бойн dboyn at postpath.com
Sun Jul 26 04:20:50 EDT 2009


Hi,
I have a fairly simple two-node cluster with one "lsb" resource and "external/ipmi"-based STONITH resources.
 
In an attempt to test STONITH, I ran "# killall aisexec" on one of the nodes and, as expected, it was shot immediately.
However, when it came back up after the reboot and I started the cluster software again, it shot the surviving node, and this "revenge" game seems to go on forever :-(.
I noticed that the two nodes hold different values of the ring counter, and I was wondering whether there is a command to tell a failed server that it has been dead for a while and should not pull the trigger, but should instead keep trying to synchronize.
 
Please help!
Am I missing something in my cluster config? Is there a proper procedure to bring failed nodes back into the cluster?
 
Thanks!
P.S. Here is my config:
 
node ppst1pru001
node ppst1pru01
primitive pprus lsb:pprus \
        op monitor interval="10s" timeout="90s" \
        op start interval="0" timeout="25s" \
        op stop interval="0" timeout="25s" \
        meta target-role="Started"
primitive ppst1pru001-stonith stonith:external/ipmi \
        params hostname="ppst1pru001" ipaddr="10.252.250.29" userid="root" passwd="1q2w3e$" interface="lan"
primitive ppst1pru01-stonith stonith:external/ipmi \
        params hostname="ppst1pru01" ipaddr="10.252.250.38" userid="root" passwd="1q2w3e$" interface="lan"
location dont-run-ppst1pru001-stonith-on-the-target ppst1pru001-stonith \
        rule $id="dont-run-ppst1pru001-stonith" -inf: #uname eq ppst1pru001
location dont-run-ppst1pru01-stonith-on-the-target ppst1pru01-stonith \
        rule $id="dont-run-ppst1pru01-stonith" -inf: #uname eq ppst1pru01
property $id="cib-bootstrap-options" \
        dc-version="1.0.4-6dede86d6105786af3a5321ccf66b44b6914f0aa" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        symmetric-cluster="true" \
        no-quorum-policy="ignore" \
        stonith-enabled="true" \
        default-resource-stickiness="INFINITY" \
        last-lrm-refresh="1248483902"
rsc_defaults $id="rsc_defaults-options" \
        on-fail="restart"
