[Pacemaker] ping directive configuration

Tue Feb 1 08:06:52 EST 2011

Hi Nikita,
Thanks for all your help, and I apologize for the simple mistakes; this is my
first Pacemaker cluster. I do appreciate all your assistance. Currently
pingd starts but does not fail over the resources. The ha.cf, crm_mon and
crm configure show output are below.

Here is my ha.cf
autojoin none
debug 1
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
#use_logd on
mcast eth1 239.0.0.1 694 1 0
bcast eth1
warntime 5
deadtime 20
initdead 60
keepalive 2
node crhnode1
node crhnode2
#deadping 15
#ping 10.100.0.254
crm yes

Current crm_mon
============
Last updated: Fri Jan 28 14:10:22 2011
Stack: Heartbeat
Current DC: crhnode2 (59440607-2a5c-450e-84fa-94bf69742671) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ crhnode1 crhnode2 ]

 Clone Set: MYPINGCLONE
     Started: [ crhnode1 crhnode2 ]
 Resource Group: WEBRES
     failoverip (ocf::heartbeat:IPaddr):        Started crhnode2
     crhweb     (ocf::heartbeat:apache):        Started crhnode2

crm configure show
node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
        attributes standby="off"
node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
        attributes standby="off"
primitive MYPING ocf:pacemaker:pingd \
        params host_list="10.100.0.254" multiplier="100" \
        op monitor interval="15s" timeout="20s" \
        op start interval="5" timeout="90s" \
        op stop interval="0" timeout="100s"
primitive crhweb ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="30s" \
        op start interval="0" timeout="40s" \
        op stop interval="0" timeout="60s"
primitive failoverip ocf:heartbeat:IPaddr \
        params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
        op monitor interval="30s"
group WEBRES failoverip crhweb \
        meta target-role="Started"
clone MYPINGCLONE MYPING \
        meta globally-unique="false" target-role="Started"
location web_location WEBRES \
        rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
order crhweb-after-failoverip inf: MYPINGCLONE WEBRES
property $id="cib-bootstrap-options" \
        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
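
For comparison, a minimal sketch of the connectivity part of the configuration above. It assumes the pingd agent's default attribute name (pingd), which is what the location rule tests. It also sets the start operation's interval to 0, since start and stop are not recurring operations. With a single host in host_list and multiplier="100", the attribute should be 100 while the gateway answers and 0 when it does not, so the "pingd lte 0" branch of the rule is what bans WEBRES from a disconnected node. Whether the interval change alone resolves the failover problem is not confirmed here.

```
# Sketch only: same names and values as the configuration above, except
# that the start operation uses interval="0" (assumption: the non-zero
# interval was unintended).
primitive MYPING ocf:pacemaker:pingd \
        params host_list="10.100.0.254" multiplier="100" \
        op monitor interval="15s" timeout="20s" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="100s"
clone MYPINGCLONE MYPING \
        meta globally-unique="false"
# The rule tests the attribute named "pingd", the agent's default name
# when no name parameter is given.
location web_location WEBRES \
        rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
```

The pengine log quoted further down in this thread suggests running "crm_verify -L" to identify configuration errors; the same check can be run against the live configuration after any change like this one.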

On 1 February 2011 12:04, Nikita Michalko <michalko.system at a-i-p.com> wrote:

> Hi Paul!
>
> Can you show me your ha.cf?
> How many network  interfaces do you use for this cluster?
> If only one, it is the typical split-brain situation after cable pull down!
>
> Nikita
>
>
> On Tuesday, 1 February 2011 12:05, paul harford wrote:
> > Hi Nikita
> > I reverted to an earlier snapshot and started again. I now have pingd
> > running, but when I remove eth0 the resource does not fail over.
> >
> > I can see in the ha-log that the ping detects the network is gone, but it
> > does not move the resource. Can anyone see the error in my config?
> >
> >
> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" node1 \
> >         attributes standby="off"
> > node $id="59440607-2a5c-450e-84fa-94bf69742671" node2 \
> >         attributes standby="off"
> > primitive MYPING ocf:pacemaker:pingd \
> >         params host_list="10.100.0.254" multiplier="1000" \
> >         op monitor interval="15s" timeout="20s" \
> >         op start interval="0" timeout="90s" \
> >         op stop interval="0" timeout="100s"
> > primitive crhweb ocf:heartbeat:apache \
> >         params configfile="/etc/httpd/conf/httpd.conf" \
> >         op monitor interval="60s" \
> >         meta target-role="Started"
> > primitive failoverip ocf:heartbeat:IPaddr \
> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
> >         op monitor interval="30s"
> > clone MYPINGCLONE MYPING \
> >         meta globally-unique="false"
> > location web_location crhweb \
> >         rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
> > colocation crhweb-with-failoverip inf: crhweb failoverip
> > order crhweb-after-failoverip inf: MYPINGCLONE failoverip crhweb
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
> >         cluster-infrastructure="Heartbeat" \
> >         stonith-enabled="false" \
> >         no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> >         resource-stickiness="100"
> >
> >
> > HA_LOG
> >
> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: glib: Error sending packet: Network is unreachable
> > Jan 28 11:17:42 node1 heartbeat: [2872]: info: glib: euid=0 egid=0
> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: write_child: write failure on ping 10.100.0.254.: Network is unreachable
> > Jan 28 11:17:43 node1 pingd: [6004]: WARN: ping_write: Wrote -1 of 39 chars: Network is unreachable (101
> >
> > On 1 February 2011 09:35, paul harford <harfordmeister at gmail.com> wrote:
> > > Hi Nikita
> > > Many thanks for your assistance. I made the changes you pointed out, but
> > > now my 2 nodes just keep rebooting. Did I enter something incorrectly in
> > > the pingd directive?
> > >
> > > Paul
> > >
> > >
> > > I can see these errors in the messages log, and my configuration is below.
> > >
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: clone_print:  Clone Set: connected
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: short_print: Stopped: [ pingd:0 pingd:1 ]
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: rsc_merge_weights: failoverip: Rolling back scores from crhweb
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: native_color: Resource crhweb cannot run anywhere
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp:  Start recurring monitor (10s) for pingd:0 on crhnode2
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp:  Start recurring monitor (10s) for pingd:1 on crhnode1
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Leave resource failoverip (Started crhnode1)
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Stop resource crhweb      (crhnode1)
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:0     (crhnode2)
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:1     (crhnode1)
> > > Feb  1 09:01:06 crhnode2 crmd: [3742]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Transition 59: PEngine Input stored in: /var/lib/pengine/pe-input-82.bz2
> > > Feb  1 09:01:06 crhnode2 crmd: [3742]: info: unpack_graph: Unpacked transition 59: 14 actions in 14 synapses
> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
> > >
> > >
> > >
> > > Here is my current configuration:
> > >
> > >
> > > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
> > >         attributes standby="off"
> > > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
> > >         attributes standby="off"
> > > primitive crhweb ocf:heartbeat:apache \
> > >         params configfile="/etc/httpd/conf/httpd.conf" \
> > >         op monitor interval="60s" \
> > >         meta target-role="Started"
> > > primitive failoverip ocf:heartbeat:IPaddr \
> > >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
> > >         op monitor interval="30s" \
> > >         meta target-role="Started"
> > > primitive pingd ocf:pacemaker:pingd \
> > >         params dampen="5s" host_list="10.100.0.254" multiplier="1000" name="pingval" \
> > >         operations $id="pingd-operations" \
> > >         op monitor interval="10s" timeout="20s" \
> > >         op monitor interval="90s" timeout="25s" start \
> > >         op monitor interval="100s" timeout="25s" stop
> > > clone connected pingd \
> > >         meta globally-unique="false" target-role="started"
> > > location cli-prefer-crhweb crhweb \
> > >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
> > > location crhweb_on_connected_node crhweb \
> > >         rule $id="crhweb_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0
> > >
> > > location prefer-crhnode1 crhweb 50: crhnode1
> > > colocation crhweb-with-failoverip inf: crhweb failoverip
> > > order crhweb-after-failoverip inf: pingd failoverip crhweb
> > >
> > > property $id="cib-bootstrap-options" \
> > >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
> > >         cluster-infrastructure="Heartbeat" \
> > >         stonith-enabled="false" \
> > >         no-quorum-policy="ignore"
> > >
> > > On 1 February 2011 07:21, Nikita Michalko <michalko.system at a-i-p.com> wrote:
> > >> Hi Paul,
> > >>
> > >> see below!
> > >>
> > >> On Monday, 31 January 2011 19:55, paul harford wrote:
> > >> > Hi guys,
> > >> > I'm having some issues with a ping directive. My current config is
> > >> > below; basically I want the web resource to fail over to the second
> > >> > node if the ping can no longer contact the default gateway.
> > >> >
> > >> > So here goes:
> > >> >
> > >> > crm configure primitive ping ocf:pacemaker:ping params dampen=5s
> > >> > host_list=(default GateWay) multplier=1000 name=pingval operations
> > >> > $id=ping-operations op moinitor interval=10s timeout=15s
> > >>
> > >>  - this is surely wrong: "moinitor" ?
> > >>  - no such primitive (ping) below ...
> > >>
> > >> HTH
> > >>
> > >> Nikita Michalko
> > >>
> > >> > and
> > >> >
> > >> > crm configure clone connected ping meta globally-unique=false
> > >> > target-role=started
> > >> >
> > >> > and
> > >> >
> > >> > location web_on_connected_node cweb rule
> > >> > $id=web_on_connected_node-rule -inf: not_defined pingval or pingval
> > >> > lte 0
> > >> >
> > >> >
> > >> > Does anyone see any issues with the above configuration? I want
> > >> > to check first, as the last time I tried it wouldn't work and my
> > >> > resources would not fail over or start.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
> > >> >         attributes standby="off"
> > >> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
> > >> >         attributes standby="off"
> > >> > primitive cweb ocf:heartbeat:apache \
> > >> >         params configfile="/etc/httpd/conf/httpd.conf" \
> > >> >         op monitor interval="60s" \
> > >> >         meta target-role="Started"
> > >> > primitive failoverip ocf:heartbeat:IPaddr \
> > >> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
> > >> >         op monitor interval="30s" \
> > >> >         meta target-role="Started"
> > >> > location cli-prefer-cweb cweb \
> > >> >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
> > >> > location prefer-crhnode1 crhweb 50: crhnode1
> > >> > colocation cweb-with-failoverip inf: cweb failoverip
> > >> > order crhweb-after-failoverip inf: failoverip cweb
> > >> > property $id="cib-bootstrap-options" \
> > >> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
> > >> >         cluster-infrastructure="Heartbeat" \
> > >> >         stonith-enabled="false" \
> > >> >         no-quorum-policy="ignore"
> > >> > rsc_defaults $id="rsc-options" \
> > >> >         resource-stickiness="100"
> > >>
> > >> _______________________________________________
> > >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >>
> > >> Project Home: http://www.clusterlabs.org
> > >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker