Hi Emmanuel,

Here is the output from crm_mon -of1:

Operations:
* Node quorumnode:
   p_drbd_mount2:0: migration-threshold=1000000
    + (4) probe: rc=5 (not installed)
   p_drbd_mount1:0: migration-threshold=1000000
    + (5) probe: rc=5 (not installed)
   p_drbd_vmstore:0: migration-threshold=1000000
    + (6) probe: rc=5 (not installed)
   p_vm_myvm: migration-threshold=1000000
    + (12) probe: rc=5 (not installed)
* Node vmhost1:
   p_drbd_mount2:0: migration-threshold=1000000
    + (34) promote: rc=0 (ok)
    + (62) monitor: interval=10000ms rc=8 (master)
   p_drbd_vmstore:0: migration-threshold=1000000
    + (26) promote: rc=0 (ok)
    + (64) monitor: interval=10000ms rc=8 (master)
   p_fs_vmstore: migration-threshold=1000000
    + (36) start: rc=0 (ok)
    + (38) monitor: interval=20000ms rc=0 (ok)
   p_ping:0: migration-threshold=1000000
    + (12) start: rc=0 (ok)
    + (22) monitor: interval=20000ms rc=0 (ok)
   p_vm_myvm: migration-threshold=1000000
    + (65) start: rc=0 (ok)
    + (66) monitor: interval=10000ms rc=0 (ok)
   stonithvmhost2: migration-threshold=1000000
    + (17) start: rc=0 (ok)
   p_drbd_mount1:0: migration-threshold=1000000
    + (31) promote: rc=0 (ok)
    + (63) monitor: interval=10000ms rc=8 (master)
   p_sysadmin_notify:0: migration-threshold=1000000
    + (13) start: rc=0 (ok)
    + (18) monitor: interval=10000ms rc=0 (ok)
* Node vmhost2:
   p_drbd_mount2:1: migration-threshold=1000000
    + (14) start: rc=0 (ok)
    + (36) monitor: interval=20000ms rc=0 (ok)
   p_drbd_vmstore:1: migration-threshold=1000000
    + (16) start: rc=0 (ok)
    + (38) monitor: interval=20000ms rc=0 (ok)
   p_ping:1: migration-threshold=1000000
    + (12) start: rc=0 (ok)
    + (20) monitor: interval=20000ms rc=0 (ok)
   stonithquorumnode: migration-threshold=1000000
    + (18) start: rc=0 (ok)
   stonithvmhost1: migration-threshold=1000000
    + (17) start: rc=0 (ok)
   p_sysadmin_notify:1: migration-threshold=1000000
    + (13) start: rc=0 (ok)
    + (19) monitor: interval=10000ms rc=0 (ok)
   p_drbd_mount1:1: migration-threshold=1000000
    + (15) start: rc=0 (ok)
    + (37) monitor: interval=20000ms rc=0 (ok)

Failed actions:
    p_drbd_mount2:0_monitor_0 (node=quorumnode, call=4, rc=5, status=complete): not installed
    p_drbd_mount1:0_monitor_0 (node=quorumnode, call=5, rc=5, status=complete): not installed
    p_drbd_vmstore:0_monitor_0 (node=quorumnode, call=6, rc=5, status=complete): not installed
    p_vm_myvm_monitor_0 (node=quorumnode, call=12, rc=5, status=complete): not installed
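
For reference, my reading of the rc values above, going by the return-codes page you linked earlier (so take this as my interpretation, not gospel):

==========================================================
# rc=0  OCF_SUCCESS          operation succeeded
# rc=5  OCF_ERR_INSTALLED    agent or required software not installed on
#                            that node (expected on the quorum node, which
#                            has no DRBD or libvirt)
# rc=8  OCF_RUNNING_MASTER   resource is running as Master; the normal
#                            monitor result for a promoted ms resource
==========================================================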
What is the number in parentheses before "start" or "monitor"? Is it the number of times this operation has occurred? Does this give any additional clues to what happened? What should I look for specifically in this output?

Thanks,

Andrew

----- Original Message -----
From: "emmanuel segura" <emi2fast@gmail.com>
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
Sent: Tuesday, June 19, 2012 12:12:34 PM
Subject: Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?

Hello Andrew

Use crm_mon -of when your VirtualDomain resource fails, to see which resource operation reported the problem.
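
For example (a minimal sketch; -o shows each resource's operation history, -f shows fail counts, -1 prints the status once and exits instead of running interactively):

==========================================================
# one-shot status including operation history and fail counts
crm_mon -of1

# or leave it running in a terminal while you reproduce the failure
crm_mon -of
==========================================================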
2012/6/19 Andrew Martin <amartin@xes-inc.com>

Hi Emmanuel,

Thanks for the idea. I looked through the rest of the log, and these "return code 8" errors on the ocf:linbit:drbd resources also occur at other times (e.g. today) when the VirtualDomain resource is unaffected. This seems to indicate that these soft errors do not trigger a restart of the VirtualDomain resource. Is there anything else in the log that could indicate what caused this, or is there somewhere else I can look?
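
One check to confirm the cluster never recorded those rc=8 results as failures (assuming I have the crm shell syntax right):

==========================================================
# per-node fail counts; non-zero would mean the monitor results were
# actually counted as failures
crm resource failcount p_drbd_vmstore show vmhost1
crm resource failcount p_vm_myvm show vmhost1
==========================================================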

Thanks,

Andrew

----- Original Message -----
From: "emmanuel segura" <emi2fast@gmail.com>
To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
Sent: Tuesday, June 19, 2012 9:57:19 AM
Subject: Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?

I didn't see any error in your config; the only thing I noticed is this:
==========================================================
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_vmstore:0 monitor[55] (pid 12323)
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount2:0 monitor[53] (pid 12324)
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[55] on p_drbd_vmstore:0 for client 3856: pid 12323 exited with return code 8
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[53] on p_drbd_mount2:0 for client 3856: pid 12324 exited with return code 8
Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount1:0 monitor[54] (pid 12396)
==========================================================
It could be a DRBD problem, but to tell you the truth I'm not sure.

==========================================================
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
==========================================================
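
If you want to rule DRBD in or out, check its state directly on each node; something like this (a sketch - resource names taken from your config below):

==========================================================
cat /proc/drbd          # roles, connection and disk states for all resources
drbdadm role vmstore    # e.g. Primary/Secondary
drbdadm dstate vmstore  # disk state, e.g. UpToDate/UpToDate
drbdadm cstate vmstore  # connection state, e.g. Connected
==========================================================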

2012/6/19 Andrew Martin <amartin@xes-inc.com>
> Hello,
>
> I have a 3 node Pacemaker+Heartbeat cluster (two real nodes and one
> "standby" quorum node) with Ubuntu 10.04 LTS on the nodes and using the
> Pacemaker+Heartbeat packages from the Ubuntu HA Team PPA
> (https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa).
<div class="h5"><br>> I have configured 3 DRBD resources, a filesystem mount, and a KVM-based<br>> virtual machine (using the VirtualDomain resource). I have constraints in<br>> place so that the DRBD devices must become primary and the filesystem must<br>
> be mounted before the VM can start:<br>> node $id="1ab0690c-5aa0-4d9c-ae4e-b662e0ca54e5" vmhost1<br>> node $id="219e9bf6-ea99-41f4-895f-4c2c5c78484a" quorumnode \<br>> attributes standby="on"<br>
> node $id="645e09b4-aee5-4cec-a241-8bd4e03a78c3" vmhost2<br>> primitive p_drbd_mount2 ocf:linbit:drbd \<br>> params drbd_resource="mount2" \<br>> op start interval="0" timeout="240" \<br>
> op stop interval="0" timeout="100" \<br>> op monitor interval="10" role="Master" timeout="30" \<br>> op monitor interval="20" role="Slave" timeout="30"<br>
> primitive p_drbd_mount1 ocf:linbit:drbd \<br>> params drbd_resource="mount1" \<br>> op start interval="0" timeout="240" \<br>> op stop interval="0" timeout="100" \<br>
> op monitor interval="10" role="Master" timeout="30" \<br>> op monitor interval="20" role="Slave" timeout="30"<br>> primitive p_drbd_vmstore ocf:linbit:drbd \<br>
> params drbd_resource="vmstore" \<br>> op start interval="0" timeout="240" \<br>> op stop interval="0" timeout="100" \<br>> op monitor interval="10" role="Master" timeout="30" \<br>
> op monitor interval="20" role="Slave" timeout="30"<br>> primitive p_fs_vmstore ocf:heartbeat:Filesystem \<br>> params device="/dev/drbd0" directory="/mnt/storage/vmstore"<br>
> fstype="ext4" \<br>> op start interval="0" timeout="60" \<br>> op stop interval="0" timeout="60" \<br>> op monitor interval="20" timeout="40"<br>
> primitive p_ping ocf:pacemaker:ping \<br>> params name="p_ping" host_list="192.168.1.25 192.168.1.26"<br>> multiplier="1000" \<br>> op start interval="0" timeout="60" \<br>
> op monitor interval="20s" timeout="60"<br>> primitive p_sysadmin_notify ocf:heartbeat:MailTo \<br><span>> params email="<a href="mailto:alert@example.com" title="[GMCP] Compose a new mail to alert@example.com" rel="noreferrer" target="_blank">alert@example.com</a>" \</span><br>
> params subject="Pacemaker Change" \<br>> op start interval="0" timeout="10" \<br>> op stop interval="0" timeout="10" \<br>> op monitor interval="10" timeout="10"<br>
> primitive p_vm_myvm ocf:heartbeat:VirtualDomain \<br>> params config="/mnt/storage/vmstore/config/myvm.xml" \<br>> meta allow-migrate="false" target-role="Started" is-managed="true"<br>
> \<br>> op start interval="0" timeout="180" \<br>> op stop interval="0" timeout="180" \<br>> op monitor interval="10" timeout="30"<br>
> primitive stonithquorumnode stonith:external/webpowerswitch \<br>> params wps_ipaddr="192.168.3.100" wps_port="x" wps_username="xxx"<br>> wps_password="xxx" hostname_to_stonith="quorumnode"<br>
> primitive stonithvmhost1 stonith:external/webpowerswitch \<br>> params wps_ipaddr="192.168.3.100" wps_port="x" wps_username="xxx"<br>> wps_password="xxx" hostname_to_stonith="vmhost1"<br>
> primitive stonithvmhost2 stonith:external/webpowerswitch \<br>> params wps_ipaddr="192.168.3.100" wps_port="x" wps_username="xxx"<br>> wps_password="xxx" hostname_to_stonith="vmhost2"<br>
> group g_vm p_fs_vmstore p_vm_myvm<br>> ms ms_drbd_mount2 p_drbd_mount2 \<br>> meta master-max="1" master-node-max="1" clone-max="2"<br>> clone-node-max="1" notify="true"<br>
> ms ms_drbd_mount1 p_drbd_mount1 \<br>> meta master-max="1" master-node-max="1" clone-max="2"<br>> clone-node-max="1" notify="true"<br>> ms ms_drbd_vmstore p_drbd_vmstore \<br>
> meta master-max="1" master-node-max="1" clone-max="2"<br>> clone-node-max="1" notify="true"<br>> clone cl_ping p_ping \<br>> meta interleave="true"<br>
> clone cl_sysadmin_notify p_sysadmin_notify<br>> location loc_run_on_most_connected g_vm \<br>> rule $id="loc_run_on_most_connected-rule" p_ping: defined p_ping<br>> location loc_st_nodescan stonithquorumnode -inf: vmhost1<br>
> location loc_st_vmhost1 stonithvmhost1 -inf: vmhost1<br>> location loc_st_vmhost2 stonithvmhost2 -inf: vmhost2<br>> colocation c_drbd_libvirt_vm inf: g_vm ms_drbd_vmstore:Master<br>> ms_drbd_tools:Master ms_drbd_crm:Master<br>
> order o_drbd-fs-vm inf: ms_drbd_vmstore:promote ms_drbd_tools:promote<br>> ms_drbd_crm:promote g_vm:start<br>> property $id="cib-bootstrap-options" \<br>> dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \<br>
> cluster-infrastructure="Heartbeat" \<br>> stonith-enabled="true" \<br>> no-quorum-policy="freeze" \<br>> last-lrm-refresh="1337746179"<br>
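>
> To sanity-check a configuration like the above, something along these
> lines should work (a sketch; crm_verify checks the live CIB, and ptest
> dumps the policy engine's placement scores):
>
> ==========================================================
> # validate the live cluster configuration, verbosely
> crm_verify -L -V
> # show allocation scores for the live cluster
> ptest -sL
> ==========================================================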
>
> This has been working well; however, last week Pacemaker all of a sudden
> stopped the p_vm_myvm resource and then started it up again. I have
> attached the relevant section of /var/log/daemon.log - I am unable to
> determine what caused Pacemaker to restart this resource. Based on the
> log, could you tell me what event triggered this?
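>
> One way to trace this is to look for the policy engine's "LogActions"
> lines around that time and replay the transition they reference (a
> sketch; the pe-input number below is a placeholder, and I'm assuming
> the default /var/lib/pengine directory):
>
> ==========================================================
> # what did the policy engine decide to do with the VM?
> grep -E "pengine.*LogActions.*p_vm_myvm" /var/log/daemon.log
> # replay the transition referenced near the matching log lines
> crm_simulate -S -x /var/lib/pengine/pe-input-123.bz2
> ==========================================================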
>
> Thanks,
>
> Andrew

--
this is my life and I live it as long as God wills
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org