[Pacemaker] understanding resource restarts through pengine

Andrew Beekhof andrew at beekhof.net
Thu Sep 22 22:35:57 EDT 2011


On Tue, Sep 20, 2011 at 10:10 PM, Oualid Nouri <o.nouri at computer-lan.de> wrote:
> Hi,
>
> I'm testing Pacemaker resource failover in a very simple test environment
> with two virtual machines:
>
> 3 cloned resources: DRBD (dual-primary), controld, clvmd.
>
> Fencing with external/ssh; that's it.
>
> I'm having problems understanding why my clvmd resource gets restarted when
> a failed node comes back online.
>
> When one node is powered off (fail test), the remaining node fences the
> "failed" node and the clvmd resource stays online.
>
> But when the failed node is back online, the clvmd resource clone on the
> previously remaining node gets restarted for no visible reason (see logs).

Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op:
Operation res_drbd_1:1_monitor_0 found resource res_drbd_1:1 active on
tnode1

When tnode1 came back online, the cluster found that drbd was already running.
Do you have it configured to start at boot time?
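If it is, the usual fix is to take drbd out of the boot sequence and let the
cluster start it exclusively. A rough sketch for a SysV-init distribution
(the exact commands depend on your distro):

  # is the drbd init script enabled at boot?
  chkconfig --list drbd

  # if so, disable it so only Pacemaker starts DRBD
  chkconfig drbd off

  # on systemd-based systems the equivalent would be:
  #   systemctl disable drbd.service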

>
> I guess I'm doing something wrong, but what? Anyone who can point me in the
> right direction?
>
> Thank you!
>
> Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke: Query 228: Requesting the current CIB: S_POLICY_ENGINE
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation res_drbd_1:1_monitor_0 found resource res_drbd_1:1 active on tnode1
> Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke_callback: Invoking the PE: query=228, ref=pe_calc-dc-1316517521-176, seq=1268, quorate=1
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation res_drbd_1:0_monitor_0 found resource res_drbd_1:0 active on tnode2
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print:  Master/Slave Set: ms_drbd_1 [res_drbd_1]
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Masters: [ tnode2 ]
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Slaves: [ tnode1 ]
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print:  Clone Set: cl_controld_1 [res_controld_dlm]
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Started: [ tnode2 ]
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Stopped: [ res_controld_dlm:1 ]
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print: stonith_external_ssh_1#011(stonith:external/ssh):#011Started tnode1
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print: stonith_external_ssh_2#011(stonith:external/ssh):#011Started tnode2
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print:  Clone Set: cl_clvmd_1 [res_clvmd_clustervg]
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Started: [ tnode2 ]
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print:      Stopped: [ res_clvmd_clustervg:1 ]
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: RecurringOp:  Start recurring monitor (60s) for res_controld_dlm:1 on tnode1
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave res_drbd_1:0#011(Master tnode2)
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Promote res_drbd_1:1#011(Slave -> Master tnode1)
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave res_controld_dlm:0#011(Started tnode2)
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start res_controld_dlm:1#011(tnode1)
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave stonith_external_ssh_1#011(Started tnode1)
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave stonith_external_ssh_2#011(Started tnode2)
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Restart res_clvmd_clustervg:0#011(Started tnode2)
> Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start res_clvmd_clustervg:1#011(tnode1)
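
The "Restart res_clvmd_clustervg:0" / "Start res_clvmd_clustervg:1" decisions
above come from a transition the pengine also writes to disk, so the same
calculation can be replayed offline to see exactly why it chose them. A
minimal sketch, assuming the default pe-input directory for this 1.1.x build
and a made-up file number:

  # find the pe-input file recorded for this transition (log path varies)
  grep pe-input /var/log/messages | tail

  # replay it, showing the scores and actions the policy engine computed
  crm_simulate -S -s -x /var/lib/pengine/pe-input-100.bz2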
>
>
>
> CONFIG
>
> node tnode1 \
>         attributes standby="off"
> node tnode2 \
>         attributes standby="off"
> primitive res_clvmd_clustervg ocf:lvm2:clvmd \
>         params daemon_timeout="30" \
>         operations $id="res_clvmd_clustervg-operations" \
>         op monitor interval="0" timeout="4min" start-delay="5"
> primitive res_controld_dlm ocf:pacemaker:controld \
>         operations $id="res_controld_dlm-operations" \
>         op monitor interval="60" timeout="60" start-delay="0" \
>         meta target-role="started"
> primitive res_drbd_1 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         operations $id="res_drbd_1-operations" \
>         op start interval="0" timeout="240" \
>         op promote interval="0" timeout="90" \
>         op demote interval="0" timeout="90" \
>         op stop interval="0" timeout="100" \
>         op monitor interval="10" timeout="20" start-delay="1min" \
>         op notify interval="0" timeout="90" \
>         meta target-role="started" is-managed="true"
> primitive stonith_external_ssh_1 stonith:external/ssh \
>         params hostlist="tnode2" \
>         operations $id="stonith_external_ssh_1-operations" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60" \
>         op monitor interval="60" timeout="60" start-delay="0" \
>         meta failure-timeout="3"
> primitive stonith_external_ssh_2 stonith:external/ssh \
>         params hostlist="tnode1" \
>         operations $id="stonith_external_ssh_2-operations" \
>         op start interval="0" timeout="60" \
>         op stop interval="0" timeout="60" \
>         op monitor interval="60" timeout="60" start-delay="0" \
>         meta target-role="started" failure-timeout="3"
> ms ms_drbd_1 res_drbd_1 \
>         meta master-max="2" clone-max="2" notify="true" ordered="true" interleave="true"
> clone cl_clvmd_1 res_clvmd_clustervg \
>         meta clone-max="2" notify="true"
> clone cl_controld_1 res_controld_dlm \
>         meta clone-max="2" notify="true" ordered="true" interleave="true"
> location loc_ms_drbd_1-ping-prefer ms_drbd_1 \
>         rule $id="loc_ms_drbd_1-ping-prefer-rule" pingd: defined pingd
> location loc_stonith_external_ssh_1_tnode2 stonith_external_ssh_1 -inf: tnode2
> location loc_stonith_external_ssh_2_tnode1 stonith_external_ssh_2 -inf: tnode1
> colocation col_cl_controld_1_cl_clvmd_1 inf: cl_clvmd_1 cl_controld_1
> colocation col_ms_drbd_1_cl_controld_1 inf: cl_controld_1 ms_drbd_1:Master
> order ord_cl_controld_1_cl_clvmd_1 inf: cl_controld_1 cl_clvmd_1
> order ord_ms_drbd_1_cl_controld_1 inf: ms_drbd_1:promote cl_controld_1:start
> property $id="cib-bootstrap-options" \
>         expected-quorum-votes="2" \
>         stonith-timeout="30" \
>         dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
>         no-quorum-policy="ignore" \
>         cluster-infrastructure="openais" \
>
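One more thing that stands out in the config above: ms_drbd_1 and
cl_controld_1 are interleaved, but cl_clvmd_1 is not, and
ord_cl_controld_1_cl_clvmd_1 orders it after cl_controld_1. With the default
interleave="false" a clone waits for every instance of the clone it is ordered
after, not just the copy on its own node, so starting res_controld_dlm:1 on
the returning node can be enough to make the PE restart the clvmd instance
that never moved off tnode2. If that is what you are seeing, a sketch of the
change (crmsh syntax, e.g. via "crm configure edit") would be:

  clone cl_clvmd_1 res_clvmd_clustervg \
          meta clone-max="2" notify="true" interleave="true"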




More information about the Pacemaker mailing list