Hi,<br><br><div class="gmail_quote">On Tue, Feb 1, 2011 at 6:55 PM, paul harford <span dir="ltr"><<a href="mailto:harfordmeister@gmail.com">harfordmeister@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div>Hi Again :-)</div>
<div> </div>
<div>I think my main problem is my location configuration: when I bring down eth0 on node1 and look at crm_mon -f, the count on node2 never increases.</div>
<div> </div>
<div>Could anyone help me out with the pingd / location constraints required for a group of resources to fail over from node1 to node2 if node1 can no longer ping the default gateway?</div></blockquote><div><br>Don't use pingd, use ocf:pacemaker:ping.<br>
Here's a working config:<br>primitive ping_the_gw ocf:pacemaker:ping \<br> params host_list="1.2.3.4" multiplier="100" name="ping_the_gw" \<br> op monitor interval="5s" timeout="60s" \<br>
op start interval="0s" timeout="60s" \<br> op stop interval="0s"<br>clone ping_the_gw_clone ping_the_gw \<br> meta globally-unique="false"<br>location nok_ping_the_gw grouped_resources \<br>
rule $id="nok_ping_the_gw-rule" -inf: not_defined ping_the_gw or ping_the_gw lte 0<br>group grouped_resources virtual_ip fs_mysql httpd mysqld<br><br>The "grouped_resources" group will not be allowed to run on a node if the ping_the_gw attribute is not defined on that node or if that node cannot ping the gateway.<br>
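<br>A minimal sketch of how the attribute value is derived, assuming a second ping target (5.6.7.8 is a made-up address for illustration, not taken from your setup): the agent sets the attribute to multiplier times the number of hosts in host_list that answer.<br>primitive ping_the_gw ocf:pacemaker:ping \<br> params host_list="1.2.3.4 5.6.7.8" multiplier="100" name="ping_the_gw" \<br> op monitor interval="5s" timeout="60s"<br>With two hosts the attribute is 200, 100 or 0, so an "lte 0" rule only bans a node when neither host answers. To see the value each node currently holds, one option is to query the status section of the CIB:<br>cibadmin -Q -o status | grep ping_the_gw<br>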
<br>In your config you should change<br>location web_location crhweb \<br> rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0<br>to<br>location web_location crhweb \<br> rule $id="web_location-rule" -inf: not_defined MYPING or MYPING lte 0<br>
and<br>primitive MYPING ocf:pacemaker:pingd \<br> params host_list="10.100.0.254" multiplier="1000" \<br> op monitor interval="15s" timeout="20s" \<br> op start interval="0" timeout="90s" \<br>
op stop interval="0" timeout="100s"<br>to<br>primitive MYPING ocf:pacemaker:ping \<br> params host_list="10.100.0.254" multiplier="1000" name="MYPING" \<br> op monitor interval="15s" timeout="20s" \<br>
op start interval="0" timeout="90s" \<br> op stop interval="0" timeout="100s"<br>The name parameter sets the node attribute that the location rule tests, so it has to match the name used in the rule.<br><br>Regards,<br>Dan<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div> </div>
<div>Thanks</div>
<div>again</div><div><div></div><div class="h5">
<div> </div>
<div><br><br> </div>
<div class="gmail_quote">On 1 February 2011 13:08, paul harford <span dir="ltr"><<a href="mailto:harfordmeister@gmail.com" target="_blank">harfordmeister@gmail.com</a>></span> wrote:<br>
<blockquote style="border-left: 1px solid rgb(204, 204, 204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;" class="gmail_quote">Hi Nikita<br>Sorry, I forgot to mention that I have 2 Ethernet interfaces: eth1 is for the heartbeat and eth0 is for the public IP. The virtual IP for Apache is 10.100.1.100.<br>
<br>Thanks <br><font color="#888888">Paul</font>
<div>
<div></div>
<div><br><br>
<div class="gmail_quote">On 1 February 2011 12:04, Nikita Michalko <span dir="ltr"><<a href="mailto:michalko.system@a-i-p.com" target="_blank">michalko.system@a-i-p.com</a>></span> wrote:<br>
<blockquote style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;" class="gmail_quote">Hi Paul!<br><br>Can you show me your ha.cf?<br>
How many network interfaces do you use for this cluster?<br>
If only one, it is the typical split-brain situation after a cable pull!<br><br>Nikita<br><br><br>On Tuesday, 1 February 2011 at 12:05, paul harford wrote:<br>
<div>
<div></div>
<div>> Hi Nikita<br>> I reverted to an earlier snapshot and started again. I now have pingd running,<br>> but when I bring down eth0 the resource does not fail over.<br>><br>> I can see in the ha-log that the ping detects the network is gone, but it<br>
> does not move the resource. Can anyone see the error in my config?<br>><br>><br>> node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" node1 \<br>> attributes standby="off"<br>> node $id="59440607-2a5c-450e-84fa-94bf69742671" node2 \<br>
> attributes standby="off"<br>> primitive MYPING ocf:pacemaker:pingd \<br>> params host_list="10.100.0.254" multiplier="1000" \<br>> op monitor interval="15s" timeout="20s" \<br>
> op start interval="0" timeout="90s" \<br>> op stop interval="0" timeout="100s"<br>> primitive crhweb ocf:heartbeat:apache \<br>> params configfile="/etc/httpd/conf/httpd.conf" \<br>
> op monitor interval="60s" \<br>> meta target-role="Started"<br>> primitive failoverip ocf:heartbeat:IPaddr \<br>> params ip="10.100.1.100" cidr_netmask="255.255.0.0" \<br>
> op monitor interval="30s"<br>> clone MYPINGCLONE MYPING \<br>> meta globally-unique="false"<br>> location web_location crhweb \<br>> rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0<br>
> colocation crhweb-with-failoverip inf: crhweb failoverip<br>> order crhweb-after-failoverip inf: MYPINGCLONE failoverip crhweb<br>> property $id="cib-bootstrap-options" \<br>> dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \<br>
> cluster-infrastructure="Heartbeat" \<br>> stonith-enabled="false" \<br>> no-quorum-policy="ignore"<br>> rsc_defaults $id="rsc-options" \<br>> resource-stickiness="100"<br>
><br>><br>> HA_LOG<br>><br>> Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: glib: Error sending packet:<br>> Network is unreachable<br>> Jan 28 11:17:42 node1 heartbeat: [2872]: info: glib: euid=0 egid=0<br>
> Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: write_child: write failure<br>> on ping 10.100.0.254.: Network is unreachable<br>> Jan 28 11:17:43 node1 pingd: [6004]: WARN: ping_write: Wrote -1 of 39<br>> chars: Network is unreachable (101<br>
><br>> On 1 February 2011 09:35, paul harford <<a href="mailto:harfordmeister@gmail.com" target="_blank">harfordmeister@gmail.com</a>> wrote:<br>> > Hi Nikita<br>> > Many thanks for your assistance. I applied the changes you pointed out, but<br>
> > now my 2 nodes just keep rebooting. Did I enter something incorrectly in<br>> > the pingd directive?<br>> ><br>> > Paul<br>> ><br>> ><br>> > I can see these errors in the messages log, and my configuration is below.<br>
> ><br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: clone_print: Clone<br>> > Set: connected<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: short_print:<br>> > Stopped: [ pingd:0 pingd:1 ]<br>
> > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: rsc_merge_weights:<br>> > failoverip: Rolling back scores from crhweb<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: native_color: Resource<br>
> > crhweb cannot run anywhere<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp: Start<br>> > recurring monitor (10s) for pingd:0 on crhnode2<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation<br>
> > pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use<br>> > the same (name, interval) combination more than once per resource<br>
> > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation<br>> > pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use<br>
> > the same (name, interval) combination more than once per resource<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp: Start<br>> > recurring monitor (10s) for pingd:1 on crhnode1<br>
> > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation<br>> > pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use<br>
> > the same (name, interval) combination more than once per resource<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation<br>> > pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s<br>
> > Feb 1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use<br>> > the same (name, interval) combination more than once per resource<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Leave<br>
> > resource failoverip (Started crhnode1)<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Stop<br>> > resource crhweb (crhnode1)<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start<br>
> > pingd:0 (crhnode2)<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start<br>> > pingd:1 (crhnode1)<br>> > Feb 1 09:01:06 crhnode2 crmd: [3742]: info: do_state_transition: State<br>
> > transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS<br>> > cause=C_IPC_MESSAGE origin=handle_response ]<br>> > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message:<br>
> > Transition 59: PEngine Input stored in: /var/lib/pengine/pe-input-82.bz2<br>> > Feb 1 09:01:06 crhnode2 crmd: [3742]: info: unpack_graph: Unpacked<br>> > transition 59: 14 actions in 14 synapses<br>
> > Feb 1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message:<br>> > Configuration ERRORs found during PE processing. Please run "crm_verify<br>> > -L" to identify issues.<br>> ><br>
> ><br>> ><br>> > here is my current configuration<br>> ><br>> ><br>> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \<br>> > attributes standby="off"<br>
> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \<br>> > attributes standby="off"<br>> > primitive crhweb ocf:heartbeat:apache \<br>> ><br>> > params configfile="/etc/httpd/conf/httpd.conf" \<br>
> > op monitor interval="60s" \<br>> > meta target-role="Started"<br>> > primitive failoverip ocf:heartbeat:IPaddr \<br>> > params ip="10.100.1.100" cidr_netmask="255.255.0.0" \<br>
> > op monitor interval="30s" \<br>> > meta target-role="Started"<br>> > primitive pingd ocf:pacemaker:pingd \<br>> > params dampen="5s" host_list="10.100.0.254" multiplier="1000"<br>
> > name="pingval" \<br>> > operations $id="pingd-operations" \<br>> > op monitor interval="10s" timeout="20s" \<br>> > op monitor interval="90s" timeout="25s" start \<br>
> > op monitor interval="100s" timeout="25s" stop<br>> > clone connected pingd \<br>> ><br>> > meta globally-unique="false" target-role="started"<br>
> > location cli-prefer-crhweb crhweb \<br>> ><br>> > rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1<br>> > location crhweb_on_connected_node crhweb \<br>> > rule $id="crhweb_on_connected_node-rule" -inf: not_defined<br>
> > pingval or pingval lte 0<br>> ><br>> > location prefer-crhnode1 crhweb 50: crhnode1<br>> > colocation crhweb-with-failoverip inf: crhweb failoverip<br>> > order crhweb-after-failoverip inf: pingd failoverip crhweb<br>
> ><br>> > property $id="cib-bootstrap-options" \<br>> > dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \<br>> > cluster-infrastructure="Heartbeat" \<br>
> > stonith-enabled="false" \<br>> > no-quorum-policy="ignore"<br>> ><br>> > On 1 February 2011 07:21, Nikita Michalko<br><<a href="mailto:michalko.system@a-i-p.com" target="_blank">michalko.system@a-i-p.com</a>>wrote:<br>
> >> Hi Paul,<br>> >><br>> >> see below!<br>> >><br>> >> On Monday, 31 January 2011 at 19:55, paul harford wrote:<br>> >> > Hi guys,<br>> >> > I'm having some issues with a ping directive. My current config is<br>
> >> > below, and basically I want the web resource to fail over to the second<br>> >> > node if the ping can no longer contact the default gateway.<br>
> >> ><br>> >> > so here goes<br>> >> ><br>> >> > crm configure primitive ping ocf:pacemaker:ping params dampen=5s<br>> >> > host_list=(default GateWay) multplier=1000 name=pingval operations<br>
> >> > $id=ping-operations op moinitor interval=10s timeout=15s<br>> >><br>> >> - this is surely wrong: "moinitor" ?<br>> >> - no such primitive (ping) below ...<br>> >><br>
> >> HTH<br>> >><br>> >> Nikita Michalko<br>> >><br>> >> > and<br>> >> ><br>> >> > crm configure clone connected ping meta globally-unique=false<br>
> >> > target-role=started<br>
> >> ><br>> >> > and<br>> >> ><br>> >> > location web_on_connected_node cweb rule<br>> >> > $id=web_on_connected_node-rule -inf: not_defined pingval or pingval<br>
> >> > lte 0<br>> >> ><br>> >> ><br>> >> > Does anyone see any issues with the above configuration? I want<br>> >> > to check first, as the last time I tried it, it wouldn't work and my<br>
> >> > resources would not fail over or start.<br>> >> ><br>> >> ><br>> >> ><br>> >> ><br>> >> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \<br>
> >> > attributes standby="off"<br>> >> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \<br>> >> > attributes standby="off"<br>
> >> > primitive cweb ocf:heartbeat:apache \<br>> >> > params configfile="/etc/httpd/conf/httpd.conf" \<br>> >> > op monitor interval="60s" \<br>> >> > meta target-role="Started"<br>
> >> > primitive failoverip ocf:heartbeat:IPaddr \<br>> >> > params ip="10.100.1.100" cidr_netmask="255.255.0.0" \<br>> >> > op monitor interval="30s" \<br>
> >> > meta target-role="Started"<br>> >> > location cli-prefer-cweb cweb \<br>> >> > rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1<br>> >> > location prefer-crhnode1 crhweb 50: crhnode1<br>
> >> > colocation cweb-with-failoverip inf: cweb failoverip<br>> >> > order crhweb-after-failoverip inf: failoverip cweb<br>> >> > property $id="cib-bootstrap-options" \<br>> >> > dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \<br>
> >> > cluster-infrastructure="Heartbeat" \<br>> >> > stonith-enabled="false" \<br>> >> > no-quorum-policy="ignore"<br>> >> > rsc_defaults $id="rsc-options" \<br>
> >> > resource-stickiness="100"<br>> >><br>> >> _______________________________________________<br>> >> Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org" target="_blank">Pacemaker@oss.clusterlabs.org</a><br>
> >> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>> >><br>> >> Project Home: <a href="http://www.clusterlabs.org/" target="_blank">http://www.clusterlabs.org</a><br>
> >> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>> >> Bugs:<br>> >> <a href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker" target="_blank">http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</a><br>
<br><br></div></div></blockquote></div><br>
</div></div></blockquote></div><br>
</div></div><br>
<br></blockquote></div><br><br clear="all"><br>-- <br>Dan Frincu<div>CCNA, RHCE</div><br>