Hi Nikita,<br>Thanks for all your help, and I apologize for the simple mistakes; this is my first Pacemaker cluster. I do appreciate all your assistance. Currently pingd starts but does not fail over the resources. My <a href="http://ha.cf">ha.cf</a>, crm_mon output and crm configure show output are below.<br>

<br>Here is my <a href="http://ha.cf">ha.cf</a> <br>autojoin none<br>debug 1<br>debugfile /var/log/ha-debug<br>logfile /var/log/ha-log<br>logfacility local0<br>#use_logd on<br>mcast eth1 239.0.0.1 694 1 0<br>bcast eth1<br>

warntime 5<br>deadtime 20<br>initdead 60<br>keepalive 2<br>node crhnode1<br>node crhnode2<br>#deadping 15<br>#ping 10.100.0.254<br>crm yes<br><br>Current crm_mon<br>============<br>Last updated: Fri Jan 28 14:10:22 2011<br>

Stack: Heartbeat<br>Current DC: crhnode2 (59440607-2a5c-450e-84fa-94bf69742671) - partition with quorum<br>Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3<br>2 Nodes configured, unknown expected votes<br>2 Resources configured.<br>

============<br><br>Online: [ crhnode1 crhnode2 ]<br><br> Clone Set: MYPINGCLONE<br>     Started: [ crhnode1 crhnode2 ]<br> Resource Group: WEBRES<br>     failoverip (ocf::heartbeat:IPaddr):        Started crhnode2<br>     crhweb     (ocf::heartbeat:apache):        Started crhnode2<br>
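<br>As a reading aid for the web_location rule in the configuration that follows, here is a minimal Python sketch of how such a rule is expected to score a node. It assumes the commonly documented behaviour that pingd publishes a node attribute (named pingd by default) equal to the number of reachable hosts in host_list times multiplier. The function names are hypothetical and this is an illustration, not Pacemaker code.<br>

```python
import math

NEG_INFINITY = -math.inf  # stands in for Pacemaker's -INFINITY score


def pingd_attribute(reachable_hosts: int, multiplier: int) -> int:
    """Assumed pingd behaviour: reachable hosts times multiplier."""
    return reachable_hosts * multiplier


def web_location_score(node_attrs: dict, attr_name: str = "pingd") -> float:
    """Mimic the rule: -inf: not_defined pingd or pingd lte 0."""
    value = node_attrs.get(attr_name)
    if value is None or value <= 0:
        return NEG_INFINITY  # rule matches: resource may not run on this node
    return 0  # rule does not match: it contributes no score


if __name__ == "__main__":
    # Hypothetical situation: crhnode1 lost the gateway, crhnode2 still sees it.
    nodes = {
        "crhnode1": {"pingd": pingd_attribute(0, 100)},
        "crhnode2": {"pingd": pingd_attribute(1, 100)},
    }
    for name, attrs in nodes.items():
        print(name, web_location_score(attrs))
```

In this model a node whose attribute was never set is treated exactly like a node with no connectivity, so the rule only helps if the attribute is actually being updated on each node.<br>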

<br>crm configure show<br>node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \<br>        attributes standby="off"<br>node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \<br>        attributes standby="off"<br>

primitive MYPING ocf:pacemaker:pingd \<br>        params host_list="10.100.0.254" multiplier="100" \<br>        op monitor interval="15s" timeout="20s" \<br>        op start interval="5" timeout="90s" \<br>

        op stop interval="0" timeout="100s"<br>primitive crhweb ocf:heartbeat:apache \<br>        params configfile="/etc/httpd/conf/httpd.conf" \<br>        op monitor interval="30s" \<br>

        op start interval="0" timeout="40s" \<br>        op stop interval="0" timeout="60s"<br>primitive failoverip ocf:heartbeat:IPaddr \<br>        params ip="10.100.1.100" cidr_netmask="255.255.0.0" \<br>

        op monitor interval="30s"<br>group WEBRES failoverip crhweb \<br>        meta target-role="Started"<br>clone MYPINGCLONE MYPING \<br>        meta globally-unique="false" target-role="Started"<br>

location web_location WEBRES \<br>        rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0<br>order crhweb-after-failoverip inf: MYPINGCLONE WEBRES<br>property $id="cib-bootstrap-options" \<br>

        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \<br>        cluster-infrastructure="Heartbeat" \<br>        stonith-enabled="false" \<br>        no-quorum-policy="ignore"<br>

rsc_defaults $id="rsc-options" \<br>        resource-stickiness="100"<br><br><br><br><br><div class="gmail_quote">On 1 February 2011 12:04, Nikita Michalko <span dir="ltr"><<a href="mailto:michalko.system@a-i-p.com">michalko.system@a-i-p.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi Paul!<br>

<br>

Can you show me your <a href="http://ha.cf" target="_blank">ha.cf</a>?<br>

How many network interfaces do you use for this cluster?<br>

If only one, pulling the cable produces the typical split-brain situation!<br>

<br>

Nikita<br>

<br>

<br>

On Tuesday, 1 February 2011 12:05, paul harford wrote:<br>

<div><div></div><div class="h5">> Hi Nikita,<br>

> I reverted to an earlier snapshot and started again. I now have pingd running,<br>

> but when I remove eth0 the resource does not fail over.<br>

><br>

> I can see in the ha-log that the ping detects the network is gone, but it<br>

> does not move the resource. Can anyone see the error in my config?<br>

><br>

><br>

> node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" node1 \<br>

>         attributes standby="off"<br>

> node $id="59440607-2a5c-450e-84fa-94bf69742671" node2 \<br>

>         attributes standby="off"<br>

> primitive MYPING ocf:pacemaker:pingd \<br>

>         params host_list="10.100.0.254" multiplier="1000" \<br>

>         op monitor interval="15s" timeout="20s" \<br>

>         op start interval="0" timeout="90s" \<br>

>         op stop interval="0" timeout="100s"<br>

> primitive crhweb ocf:heartbeat:apache \<br>

>         params configfile="/etc/httpd/conf/httpd.conf" \<br>

>         op monitor interval="60s" \<br>

>         meta target-role="Started"<br>

> primitive failoverip ocf:heartbeat:IPaddr \<br>

>         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \<br>

>         op monitor interval="30s"<br>

> clone MYPINGCLONE MYPING \<br>

>         meta globally-unique="false"<br>

> location web_location crhweb \<br>

>         rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0<br>

> colocation crhweb-with-failoverip inf: crhweb failoverip<br>

> order crhweb-after-failoverip inf: MYPINGCLONE failoverip crhweb<br>

> property $id="cib-bootstrap-options" \<br>

>         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \<br>

>         cluster-infrastructure="Heartbeat" \<br>

>         stonith-enabled="false" \<br>

>         no-quorum-policy="ignore"<br>

> rsc_defaults $id="rsc-options" \<br>

>         resource-stickiness="100"<br>

><br>

><br>

> HA_LOG<br>

><br>

> Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: glib: Error sending packet:<br>

> Network is unreachable<br>

> Jan 28 11:17:42 node1 heartbeat: [2872]: info: glib: euid=0 egid=0<br>

> Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: write_child: write failure<br>

> on ping 10.100.0.254.: Network is unreachable<br>

> Jan 28 11:17:43 node1 pingd: [6004]: WARN: ping_write: Wrote -1 of 39<br>

> chars: Network is unreachable (101<br>

><br>

> On 1 February 2011 09:35, paul harford <<a href="mailto:harfordmeister@gmail.com">harfordmeister@gmail.com</a>> wrote:<br>

> > Hi Nikita,<br>

> > Many thanks for your assistance. I made the changes you pointed out, but<br>

> > now my 2 nodes just keep rebooting. Did I enter something incorrectly in<br>

> > the pingd directive?<br>

> ><br>

> > Paul<br>

> ><br>

> ><br>

> > I can see these errors in the messages log, and my configuration is below.<br>

> ><br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: clone_print:  Clone<br>

> > Set: connected<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: short_print:<br>

> > Stopped: [ pingd:0 pingd:1 ]<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: rsc_merge_weights:<br>

> > failoverip: Rolling back scores from crhweb<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: native_color: Resource<br>

> > crhweb cannot run anywhere<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp:  Start<br>

> > recurring monitor (10s) for pingd:0 on crhnode2<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation<br>

> > pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use<br>

> > the same (name, interval) combination more than once per resource<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation<br>

> > pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use<br>

> > the same (name, interval) combination more than once per resource<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp:  Start<br>

> > recurring monitor (10s) for pingd:1 on crhnode1<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation<br>

> > pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use<br>

> > the same (name, interval) combination more than once per resource<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation<br>

> > pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use<br>

> > the same (name, interval) combination more than once per resource<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Leave<br>

> > resource failoverip (Started crhnode1)<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Stop<br>

> > resource crhweb      (crhnode1)<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start<br>

> > pingd:0     (crhnode2)<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start<br>

> > pingd:1     (crhnode1)<br>

> > Feb  1 09:01:06 crhnode2 crmd: [3742]: info: do_state_transition: State<br>

> > transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS<br>

> > cause=C_IPC_MESSAGE origin=handle_response ]<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message:<br>

> > Transition 59: PEngine Input stored in: /var/lib/pengine/pe-input-82.bz2<br>

> > Feb  1 09:01:06 crhnode2 crmd: [3742]: info: unpack_graph: Unpacked<br>

> > transition 59: 14 actions in 14 synapses<br>

> > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message:<br>

> > Configuration ERRORs found during PE processing.  Please run "crm_verify<br>

> > -L" to identify issues.<br>

> ><br>

> ><br>

> ><br>

> > here is my current configuration<br>

> ><br>

> ><br>

> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \<br>

> >         attributes standby="off"<br>

> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \<br>

> >         attributes standby="off"<br>

> > primitive crhweb ocf:heartbeat:apache \<br>

> ><br>

> >         params configfile="/etc/httpd/conf/httpd.conf" \<br>

> >         op monitor interval="60s" \<br>

> >         meta target-role="Started"<br>

> > primitive failoverip ocf:heartbeat:IPaddr \<br>

> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \<br>

> >         op monitor interval="30s" \<br>

> >         meta target-role="Started"<br>

> > primitive pingd ocf:pacemaker:pingd \<br>

> >         params dampen="5s" host_list="10.100.0.254" multiplier="1000"<br>

> > name="pingval" \<br>

> >         operations $id="pingd-operations" \<br>

> >         op monitor interval="10s" timeout="20s" \<br>

> >         op monitor interval="90s" timeout="25s" start \<br>

> >         op monitor interval="100s" timeout="25s" stop<br>

> > clone connected pingd \<br>

> ><br>

> >         meta globally-unique="false" target-role="started"<br>

> > location cli-prefer-crhweb crhweb \<br>

> ><br>

> >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1<br>

> > location crhweb_on_connected_node crhweb \<br>

> >         rule $id="crhweb_on_connected_node-rule" -inf: not_defined<br>

> > pingval or pingval lte 0<br>

> ><br>

> > location prefer-crhnode1 crhweb 50: crhnode1<br>

> > colocation crhweb-with-failoverip inf: crhweb failoverip<br>

> > order crhweb-after-failoverip inf: pingd failoverip crhweb<br>

> ><br>

> > property $id="cib-bootstrap-options" \<br>

> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \<br>

> >         cluster-infrastructure="Heartbeat" \<br>

> >         stonith-enabled="false" \<br>

> >         no-quorum-policy="ignore"<br>

> ><br>

> > On 1 February 2011 07:21, Nikita Michalko <<a href="mailto:michalko.system@a-i-p.com">michalko.system@a-i-p.com</a>> wrote:<br>

> >> Hi Paul,<br>

> >><br>

> >> see below!<br>

> >><br>

> >> On Monday, 31 January 2011 19:55, paul harford wrote:<br>

> >> > Hi guys,<br>

> >> > I'm having some issues with a ping directive. My current config is<br>

> >> > below; basically I want the web resource to fail over to the second<br>

> >> > node if the ping can no longer contact the default gateway.<br>

> >> ><br>

> >> > so here goes<br>

> >> ><br>

> >> > crm configure primitive ping ocf:pacemaker:ping params dampen=5s<br>

> >> > host_list=(default GateWay) multplier=1000 name=pingval operations<br>

> >> > $id=ping-operations op moinitor interval=10s timeout=15s<br>

> >><br>

> >>  - this is surely wrong: "moinitor" ?<br>

> >>  - no such primitive (ping) below ...<br>

> >><br>

> >> HTH<br>

> >><br>

> >> Nikita Michalko<br>

> >><br>

> >> > and<br>

> >> ><br>

> >> > crm configure clone connected ping meta globally-unique=false<br>

> >> > target-role=started<br>

> >> ><br>

> >> > and<br>

> >> ><br>

> >> > location web_on_connected_node cweb rule<br>

> >> > $id=web_on_connected_node-rule -inf: not_defined pingval or pingval<br>

> >> > lte 0<br>

> >> ><br>

> >> ><br>

> >> > Does anyone see any issues with the above configuration? I want<br>

> >> > to check first, as the last time I tried it would not work and my<br>

> >> > resources would not fail over or start.<br>

> >> ><br>

> >> ><br>

> >> ><br>

> >> ><br>

> >> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \<br>

> >> >         attributes standby="off"<br>

> >> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \<br>

> >> >         attributes standby="off"<br>

> >> > primitive cweb ocf:heartbeat:apache \<br>

> >> >         params configfile="/etc/httpd/conf/httpd.conf" \<br>

> >> >         op monitor interval="60s" \<br>

> >> >         meta target-role="Started"<br>

> >> > primitive failoverip ocf:heartbeat:IPaddr \<br>

> >> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \<br>

> >> >         op monitor interval="30s" \<br>

> >> >         meta target-role="Started"<br>

> >> > location cli-prefer-cweb cweb \<br>

> >> >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1<br>

> >> > location prefer-crhnode1 crhweb 50: crhnode1<br>

> >> > colocation cweb-with-failoverip inf: cweb failoverip<br>

> >> > order crhweb-after-failoverip inf: failoverip cweb<br>

> >> > property $id="cib-bootstrap-options" \<br>

> >> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \<br>

> >> >         cluster-infrastructure="Heartbeat" \<br>

> >> >         stonith-enabled="false" \<br>

> >> >         no-quorum-policy="ignore"<br>

> >> > rsc_defaults $id="rsc-options" \<br>

> >> >         resource-stickiness="100"<br>

> >><br>

> >> _______________________________________________<br>

> >> Pacemaker mailing list: <a href="mailto:Pacemaker@oss.clusterlabs.org">Pacemaker@oss.clusterlabs.org</a><br>

> >> <a href="http://oss.clusterlabs.org/mailman/listinfo/pacemaker" target="_blank">http://oss.clusterlabs.org/mailman/listinfo/pacemaker</a><br>

> >><br>

> >> Project Home: <a href="http://www.clusterlabs.org" target="_blank">http://www.clusterlabs.org</a><br>

> >> Getting started: <a href="http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf" target="_blank">http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf</a><br>

> >> Bugs:<br>

> >> <a href="http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker" target="_blank">http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker</a><br>

<br>

<br>


</div></div></blockquote></div><br>