[Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

Wed May 15 14:44:50 EDT 2013

Sorry to bring up an old issue, but I am having exactly the same problem as the original poster. A simultaneous loss of connectivity on both nodes of my two-node cluster causes the resources to start moving to the other node, but mid-flight the transition is aborted and the resources are started again on the original node once the cluster realizes that connectivity is the same on both nodes.

I have tried various dampen settings without any luck. It seems the nodes report the outage at slightly different times, which results in a partial transition of resources. I would have expected the cluster to wait until it knows the connectivity of all nodes before taking action, which is what I thought dampen would help with.
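To make the timing issue concrete, here is a rough sketch of what I think is happening. It is a simplified model, not how attrd actually works internally: I am assuming each node's new pingd value (0) only becomes visible to the cluster `dampen` seconds after that node notices the loss. The node names and detection times are made up for illustration; dampen and multiplier are taken from my p_ping resource.

```python
def written_scores(detect_times, dampen, multiplier, t):
    """pingd value the cluster sees for each node at time t (seconds).

    Assumption (simplified model): a node's value drops to 0 exactly
    `dampen` seconds after that node detects the loss of the ping target.
    """
    return {
        node: (0 if t >= detected + dampen else multiplier)
        for node, detected in detect_times.items()
    }


def disagreement_window(detect_times, dampen):
    """Return (start, end) of the interval in which the nodes' scores differ."""
    writes = [detected + dampen for detected in detect_times.values()]
    return min(writes), max(writes)


# Hypothetical detection times: node2 notices the outage 3 seconds after node1.
detect_times = {"node1": 0.0, "node2": 3.0}

start, end = disagreement_window(detect_times, dampen=30.0)
print(start, end)

for t in (29.0, 31.0, 34.0):
    print(t, written_scores(detect_times, dampen=30.0, multiplier=1000, t=t))
```

Under this model, a larger dampen only shifts the window in which the two nodes disagree; its length stays equal to the difference between the detection times. That would match what I am seeing, where changing dampen makes no difference.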

Ideally, the cluster would not start a transition if another cluster node is having a connectivity issue as well, with connectivity status shared between all cluster nodes. My configuration is below. Please let me know if there is something I can change to fix this, or if this behavior is expected.

primitive p_drbd ocf:linbit:drbd \
        params drbd_resource="r1" \
        op monitor interval="30s" role="Slave" \
        op monitor interval="10s" role="Master"
primitive p_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r1" directory="/drbd/r1" fstype="ext4" options="noatime" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="180s" \
        op monitor interval="30s" timeout="40s"
primitive p_mysql ocf:heartbeat:mysql \
        params binary="/usr/libexec/mysqld" config="/drbd/r1/mysql/my.cnf" datadir="/drbd/r1/mysql" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="30s" \
        meta target-role="Started"
primitive p_ping ocf:pacemaker:ping \
        params host_list="192.168.5.1" dampen="30s" multiplier="1000" debug="true" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s" \
        op monitor interval="5s" timeout="10s"
group g_mysql p_fs p_mysql \
        meta target-role="Started"
ms ms_drbd p_drbd \
        meta notify="true" master-max="1" clone-max="2" target-role="Started"
clone cl_ping p_ping
location l_connected g_mysql \
        rule $id="l_connected-rule" pingd: defined pingd
colocation c_mysql_on_drbd inf: g_mysql ms_drbd:Master
order o_drbd_before_mysql inf: ms_drbd:promote g_mysql:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-1.el6-8b6c6b9b6dc2627713f870850d20163fad4cc2a2" \
        cluster-infrastructure="Heartbeat" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        cluster-recheck-interval="5m" \
        last-lrm-refresh="1368632470"
rsc_defaults $id="rsc-options" \
        migration-threshold="5" \
        resource-stickiness="200"