[Pacemaker] Monitor of LVM resources problem

Andrew Beekhof andrew at beekhof.net
Thu Aug 26 03:31:38 EDT 2010


On Tue, Aug 17, 2010 at 8:09 PM,  <Claude.Durocher at mcccf.gouv.qc.ca> wrote:
> I have a 3-node cluster running Xen resources on SLES11sp1 with HAE. The
> nodes are connected to a SAN, and Pacemaker controls the start of the shared
> disk. From time to time, the monitor of the LVM volume groups or the ocfs2
> file system fails. This triggers a stop of the shared disk resource, which
> can't complete because Xen resources are still running on the shared disk. (I
> don't know why the monitor fails, as the resource seems to be running fine.)
>
> Log patterns:
> Aug 13 21:27:49 qcpvms09 crmd: [9677]: ERROR: process_lrm_event: LRM operation xen_configstore_volume1:1_monitor_120000 (32) Timed Out (timeout=50000ms)
> Aug 13 21:28:09 qcpvms09 crmd: [9677]: ERROR: process_lrm_event: LRM operation xen_configstore_volume1:1_stop_0 (55) Timed Out (timeout=20000ms)
> Aug 13 21:28:29 qcpvms09 crmd: [9677]: ERROR: process_lrm_event: LRM operation qcdtypo01_monitor_120000 (54) Timed Out (timeout=90000ms)
>
> Is there a way to have the monitor operation retry x times before
> declaring the resource failed?

No

> Or should the monitor part of the LVM
> resource or OCFS2 resource be changed?

I'd start by increasing the timeouts.
If that doesn't work, you'll need to investigate the Filesystem agent
to see what is taking so long.
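
As a rough sketch of what increasing the timeouts could look like for the
Filesystem resource from your config below: the values here are illustrative
guesses, not tested recommendations, and the explicit start and stop ops are
my addition. Your log shows the stop timing out at 20000ms, which looks like
the cluster default being used because the resource defines no stop op, so
giving stop its own timeout is probably worth doing as well.

```
# Illustrative values only; tune them to how long the SAN/OCFS2 actually takes.
primitive xen_configstore_volume1 ocf:heartbeat:Filesystem \
    operations $id="xen_configstore_volume1-operations" \
    op monitor interval="120" timeout="120" start-delay="10" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="120" \
    params device="/dev/xen_volume1_group/xen_configstore_volume1" directory="/etc/xen/vm" fstype="ocfs2"
```

The same kind of change would apply to the monitor ops of the two LVM
primitives if they are the ones timing out.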

>
> My running config :
>
> node qcpvms07 \
> attributes standby="off"
> node qcpvms08 \
> attributes standby="off"
> node qcpvms09 \
> attributes standby="off"
> primitive clvm ocf:lvm2:clvmd \
> operations $id="clvm-operations" \
> op monitor interval="120" timeout="20" start-delay="10" \
> op start interval="0" timeout="30" \
> params daemon_timeout="30" daemon_options="-d0"
> primitive dlm ocf:pacemaker:controld \
> operations $id="dlm-operations" \
> op monitor interval="120" timeout="20" start-delay="10"
> primitive o2cb ocf:ocfs2:o2cb \
> operations $id="o2cb-operations" \
> op monitor interval="120" timeout="20" start-delay="10"
> primitive ping-net1 ocf:pacemaker:ping \
> operations $id="ping-net1-operations" \
> op monitor interval="120" timeout="20" on-fail="restart" start-delay="0" \
> params name="ping-net1" host_list="192.168.88.1 192.168.88.43" interval="15" timeout="5" attempts="5" \
> meta target-role="started"
> primitive qcddom01 ocf:heartbeat:Xen \
> meta target-role="started" \
> operations $id="qcddom01-operations" \
> op monitor interval="120" timeout="30" on-fail="restart" start-delay="60" \
> op start interval="0" timeout="120" start-delay="0" \
> op stop interval="0" timeout="120" \
> op migrate_from interval="0" timeout="240" \
> op migrate_to interval="0" timeout="240" \
> params xmfile="/etc/xen/vm/qcddom01" allow-migrate="true"
> primitive qcdtypo01 ocf:heartbeat:Xen \
> meta target-role="started" \
> operations $id="qcdtypo01-operations" \
> op monitor interval="120" timeout="30" on-fail="restart" start-delay="60" \
> op start interval="0" timeout="120" start-delay="0" \
> op stop interval="0" timeout="120" \
> op migrate_from interval="0" timeout="240" \
> op migrate_to interval="0" timeout="240" \
> params xmfile="/etc/xen/vm/qcdtypo01" allow-migrate="true"
> primitive stonith-sbd stonith:external/sbd \
> meta target-role="started" \
> operations $id="stonith-sbd-operations" \
> op monitor interval="30" timeout="15" start-delay="30" \
> params sbd_device="/dev/mapper/mpathc"
> primitive xen_configstore_volume1 ocf:heartbeat:Filesystem \
> operations $id="xen_configstore_volume1-operations" \
> op monitor interval="120" timeout="40" start-delay="10" \
> params device="/dev/xen_volume1_group/xen_configstore_volume1" directory="/etc/xen/vm" fstype="ocfs2"
> primitive xen_volume1_group ocf:heartbeat:LVM \
> operations $id="xen_volume1_group-operations" \
> op monitor interval="120" timeout="30" start-delay="10" \
> params volgrpname="xen_volume1_group"
> primitive xen_volume2_group ocf:heartbeat:LVM \
> operations $id="xen_volume2_group-operations" \
> op monitor interval="120" timeout="30" start-delay="10" \
> params volgrpname="xen_volume2_group"
> group shared-disk-group dlm clvm o2cb xen_volume1_group xen_volume2_group xen_configstore_volume1 \
> meta target-role="started"
> clone ping-clone ping-net1 \
> meta target-role="started" interleave="true" ordered="true"
> clone shared-disk-clone shared-disk-group \
> meta target-role="stopped"
> location qcddom01-on-ping-net1 qcddom01 \
> rule $id="qcddom01-on-ping-net1-rule" -inf: not_defined ping-net1 or ping-net1 lte 0
> location qcddom01-prefer-qcpvms08 qcddom01 500: qcpvms08
> location qcdtypo01-on-ping-net1 qcdtypo01 \
> rule $id="qcdtypo01-on-ping-net1-rule" -inf: not_defined ping-net1 or ping-net1 lte 0
> location qcdtypo01-prefer-qcpvms07 qcdtypo01 500: qcpvms07
> colocation colocation-qcddom01-shared-disk-clone inf: qcddom01 shared-disk-clone
> colocation colocation-qcdtypo01-shared-disk-clone inf: qcdtypo01 shared-disk-clone
> order order-qcddom01 inf: shared-disk-clone qcddom01
> order order-qcdtypo01 inf: shared-disk-clone qcdtypo01
> property $id="cib-bootstrap-options" \
> dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
> cluster-infrastructure="openais" \
> no-quorum-policy="freeze" \
> default-resource-stickiness="500" \
> last-lrm-refresh="1281552641" \
> expected-quorum-votes="3" \
> stonith-timeout="240s"
> op_defaults $id="op_defaults-options" \
> record-pending="false"
>
> Claude
>