[Pacemaker] ping directive configuration

Wed Feb 9 04:18:46 EST 2011

Hi,

On Tue, Feb 1, 2011 at 6:55 PM, paul harford <harfordmeister at gmail.com> wrote:

> Hi Again :-)
>
> I think my main problem is my location configuration. When I bring down eth0
> on node1 and look at crm_m -f, the count on node2 never increases.
>
> Could anyone help me out with the pingd / location constraints required for
> a group of resources to fail over from node1 to node2 if node1 can no
> longer ping the default gateway?
>

Don't use pingd, use ocf:pacemaker:ping.
Here's a working config:
primitive ping_the_gw ocf:pacemaker:ping \
    params host_list="1.2.3.4" multiplier="100" name="ping_the_gw" \
    op monitor interval="5s" timeout="60s" \
    op start interval="0s" timeout="60s" \
    op stop interval="0s"
clone ping_the_gw_clone ping_the_gw \
    meta globally-unique="false"
location nok_ping_the_gw grouped_resources \
    rule $id="nok_ping_the_gw-rule" -inf: not_defined ping_the_gw or ping_the_gw lte 0
group grouped_resources virtual_ip fs_mysql httpd mysqld

The "grouped_resources" group will not be allowed to run on a node if the
ping_the_gw attribute (set by the ping resource through its name parameter)
is not defined on that node, or if its value is 0 because that node cannot
ping the gateway.
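
To make the rule's logic concrete, here is a minimal sketch in Python. It is only a model, not Pacemaker code: it assumes the ping resource publishes (number of reachable hosts x multiplier) as the node attribute, and the function names are made up for illustration.

```python
from typing import Optional

MINUS_INFINITY = float("-inf")


def ping_attribute(reachable_hosts: int, multiplier: int) -> int:
    """Value the ping resource is assumed to publish for a node."""
    return reachable_hosts * multiplier


def location_score(attribute: Optional[int]) -> float:
    """Model of: rule -inf: not_defined ping_the_gw or ping_the_gw lte 0."""
    if attribute is None or attribute <= 0:
        return MINUS_INFINITY  # rule matches: the group may not run here
    return 0.0  # rule does not match: no score change


if __name__ == "__main__":
    cases = [
        ("gateway reachable", ping_attribute(1, 100)),
        ("gateway unreachable", ping_attribute(0, 100)),
        ("ping clone not running", None),  # attribute not defined
    ]
    for label, attribute in cases:
        print(label, attribute, location_score(attribute))
```

With one host in host_list and multiplier="100", a node that loses the gateway drops to 0 and is excluded, so the group has to move to a node where the attribute is still positive.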

In your config you should change
location web_location crhweb \
         rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
to
location web_location crhweb \
         rule $id="web_location-rule" -inf: not_defined MYPING or MYPING lte 0
and
primitive MYPING ocf:pacemaker:pingd \
         params host_list="10.100.0.254" multiplier="1000" \
         op monitor interval="15s" timeout="20s" \
         op start interval="0" timeout="90s" \
         op stop interval="0" timeout="100s"
to
primitive MYPING ocf:pacemaker:ping \
         params host_list="10.100.0.254" multiplier="1000" name="MYPING" \
         op monitor interval="15s" timeout="20s" \
         op start interval="0" timeout="90s" \
         op stop interval="0" timeout="100s"
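
Whichever attribute name the rule tests, it has to be the one the ping resource publishes. A rough sketch of a check for that, in Python: the helper names are hypothetical, the parsing is deliberately naive, and it assumes the agent falls back to the attribute name "pingd" when no name parameter is given.

```python
import re

DEFAULT_ATTRIBUTE = "pingd"  # assumption: name used when name= is omitted


def ping_attribute_names(config: str) -> set:
    """Attribute names published by ocf:pacemaker:ping / pingd primitives."""
    joined = config.replace("\\\n", " ")  # undo crm shell line continuations
    names = set()
    for line in joined.splitlines():
        if re.match(r"\s*primitive\s+\S+\s+ocf:pacemaker:pingd?\b", line):
            match = re.search(r'\bname="?([\w-]+)"?', line)
            names.add(match.group(1) if match else DEFAULT_ATTRIBUTE)
    return names


def rule_attribute_names(config: str) -> set:
    """Attribute names tested with not_defined in location rules."""
    joined = config.replace("\\\n", " ")
    return set(re.findall(r"not_defined\s+([\w-]+)", joined))


def unmatched_rule_attributes(config: str) -> set:
    """Rule attributes that no ping primitive in the config publishes."""
    return rule_attribute_names(config) - ping_attribute_names(config)


if __name__ == "__main__":
    example = '''primitive MYPING ocf:pacemaker:ping \\
         params host_list="10.100.0.254" multiplier="1000" name="MYPING" \\
         op monitor interval="15s" timeout="20s"
location web_location crhweb \\
         rule $id="web_location-rule" -inf: not_defined MYPING or MYPING lte 0
'''
    print(sorted(unmatched_rule_attributes(example)))
```

An empty result means every attribute used in a rule is published by some ping primitive; anything listed would never be defined, so the -inf rule would match on every node.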

Regards,
Dan

>
> Thanks again
>
> On 1 February 2011 13:08, paul harford <harfordmeister at gmail.com> wrote:
>
>> Hi Nikita
>> Sorry, I forgot: I have 2 ethernet interfaces. eth1 is for the heartbeat and
>> eth0 is for the public IP, and the virtual IP for apache is 10.100.1.100.
>>
>> Thanks
>> Paul
>>
>>
>> On 1 February 2011 12:04, Nikita Michalko <michalko.system at a-i-p.com> wrote:
>>
>>> Hi Paul!
>>>
>>> Can you show me your ha.cf?
>>> How many network interfaces do you use for this cluster?
>>> If only one, this is the typical split-brain situation after a cable pull!
>>>
>>> Nikita
>>>
>>>
>>> On Tuesday, 1 February 2011 12:05, paul harford wrote:
>>> > Hi Nikita
>>> > I reverted to an earlier snapshot and started again. I now have pingd running,
>>> > but when I remove eth0 the resource does not fail over.
>>> >
>>> > I can see in the ha-log that the ping detects the network is gone, but it
>>> > does not move the resource. Can anyone see the error in my config?
>>> >
>>> >
>>> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" node1 \
>>> >         attributes standby="off"
>>> > node $id="59440607-2a5c-450e-84fa-94bf69742671" node2 \
>>> >         attributes standby="off"
>>> > primitive MYPING ocf:pacemaker:pingd \
>>> >         params host_list="10.100.0.254" multiplier="1000" \
>>> >         op monitor interval="15s" timeout="20s" \
>>> >         op start interval="0" timeout="90s" \
>>> >         op stop interval="0" timeout="100s"
>>> > primitive crhweb ocf:heartbeat:apache \
>>> >         params configfile="/etc/httpd/conf/httpd.conf" \
>>> >         op monitor interval="60s" \
>>> >         meta target-role="Started"
>>> > primitive failoverip ocf:heartbeat:IPaddr \
>>> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>>> >         op monitor interval="30s"
>>> > clone MYPINGCLONE MYPING \
>>> >         meta globally-unique="false"
>>> > location web_location crhweb \
>>> >         rule $id="web_location-rule" -inf: not_defined pingd or pingd lte 0
>>> > colocation crhweb-with-failoverip inf: crhweb failoverip
>>> > order crhweb-after-failoverip inf: MYPINGCLONE failoverip crhweb
>>> > property $id="cib-bootstrap-options" \
>>> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>>> >         cluster-infrastructure="Heartbeat" \
>>> >         stonith-enabled="false" \
>>> >         no-quorum-policy="ignore"
>>> > rsc_defaults $id="rsc-options" \
>>> >         resource-stickiness="100"
>>> >
>>> >
>>> > HA_LOG
>>> >
>>> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: glib: Error sending packet: Network is unreachable
>>> > Jan 28 11:17:42 node1 heartbeat: [2872]: info: glib: euid=0 egid=0
>>> > Jan 28 11:17:42 node1 heartbeat: [2872]: ERROR: write_child: write failure on ping 10.100.0.254.: Network is unreachable
>>> > Jan 28 11:17:43 node1 pingd: [6004]: WARN: ping_write: Wrote -1 of 39 chars: Network is unreachable (101
>>> >
>>> > On 1 February 2011 09:35, paul harford <harfordmeister at gmail.com> wrote:
>>> > > Hi Nikita
>>> > > Many thanks for your assistance. I made the changes you pointed out, but
>>> > > now my 2 nodes just keep rebooting. Did I enter something incorrectly in
>>> > > the pingd directive?
>>> > >
>>> > > Paul
>>> > >
>>> > >
>>> > > I can see these errors in the messages log, and my configuration is below.
>>> > >
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: clone_print:  Clone Set: connected
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: short_print: Stopped: [ pingd:0 pingd:1 ]
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: rsc_merge_weights: failoverip: Rolling back scores from crhweb
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: native_color: Resource crhweb cannot run anywhere
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp:  Start recurring monitor (10s) for pingd:0 on crhnode2
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: RecurringOp:  Start recurring monitor (10s) for pingd:1 on crhnode1
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Operation pingd-monitor-5s-0 is a duplicate of pingd-monitor-5s
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: ERROR: is_op_dup: Do not use the same (name, interval) combination more than once per resource
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Leave resource failoverip (Started crhnode1)
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Stop resource crhweb      (crhnode1)
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:0     (crhnode2)
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: notice: LogActions: Start pingd:1     (crhnode1)
>>> > > Feb  1 09:01:06 crhnode2 crmd: [3742]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Transition 59: PEngine Input stored in: /var/lib/pengine/pe-input-82.bz2
>>> > > Feb  1 09:01:06 crhnode2 crmd: [3742]: info: unpack_graph: Unpacked transition 59: 14 actions in 14 synapses
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
>>> > >
>>> > >
>>> > >
>>> > > here is my current configuration
>>> > >
>>> > >
>>> > > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>>> > >         attributes standby="off"
>>> > > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>>> > >         attributes standby="off"
>>> > > primitive crhweb ocf:heartbeat:apache \
>>> > >         params configfile="/etc/httpd/conf/httpd.conf" \
>>> > >         op monitor interval="60s" \
>>> > >         meta target-role="Started"
>>> > > primitive failoverip ocf:heartbeat:IPaddr \
>>> > >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>>> > >         op monitor interval="30s" \
>>> > >         meta target-role="Started"
>>> > > primitive pingd ocf:pacemaker:pingd \
>>> > >         params dampen="5s" host_list="10.100.0.254" multiplier="1000" name="pingval" \
>>> > >         operations $id="pingd-operations" \
>>> > >         op monitor interval="10s" timeout="20s" \
>>> > >         op monitor interval="90s" timeout="25s" start \
>>> > >         op monitor interval="100s" timeout="25s" stop
>>> > > clone connected pingd \
>>> > >         meta globally-unique="false" target-role="started"
>>> > > location cli-prefer-crhweb crhweb \
>>> > >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>>> > > location crhweb_on_connected_node crhweb \
>>> > >         rule $id="crhweb_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0
>>> > > location prefer-crhnode1 crhweb 50: crhnode1
>>> > > colocation crhweb-with-failoverip inf: crhweb failoverip
>>> > > order crhweb-after-failoverip inf: pingd failoverip crhweb
>>> > >
>>> > > property $id="cib-bootstrap-options" \
>>> > >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>>> > >         cluster-infrastructure="Heartbeat" \
>>> > >         stonith-enabled="false" \
>>> > >         no-quorum-policy="ignore"
>>> > >
>>> > > On 1 February 2011 07:21, Nikita Michalko <michalko.system at a-i-p.com> wrote:
>>> > >> Hi Paul,
>>> > >>
>>> > >> see below!
>>> > >>
>>> > >> On Monday, 31 January 2011 19:55, paul harford wrote:
>>> > >> > Hi guys
>>> > >> > I'm having some issues with a ping directive. My current config is
>>> > >> > below, and basically I want the web resource to fail over to the second
>>> > >> > node if the ping can no longer contact the default gateway.
>>> > >> >
>>> > >> > so here goes
>>> > >> >
>>> > >> > crm configure primitive ping ocf:pacemaker:ping params dampen=5s
>>> > >> > host_list=(default GateWay) multplier=1000 name=pingval operations
>>> > >> > $id=ping-operations op moinitor interval=10s timeout=15s
>>> > >>
>>> > >>  - this is surely wrong: "moinitor" ?
>>> > >>  - no such primitive (ping) below ...
>>> > >>
>>> > >> HTH
>>> > >>
>>> > >> Nikita Michalko
>>> > >>
>>> > >> > and
>>> > >> >
>>> > >> > crm configure clone connected ping meta globally-unique=false
>>> > >> > target-role=started
>>> > >> >
>>> > >> > and
>>> > >> >
>>> > >> > location web_on_connected_node cweb rule
>>> > >> > $id=web_on_connected_node-rule -inf: not_defined pingval or pingval lte 0
>>> > >> >
>>> > >> >
>>> > >> > Does anyone see any issues with the above configuration? I want
>>> > >> > to check first, as the last time I tried it, it wouldn't work and my
>>> > >> > resources would not fail over or start.
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>>> > >> >         attributes standby="off"
>>> > >> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>>> > >> >         attributes standby="off"
>>> > >> > primitive cweb ocf:heartbeat:apache \
>>> > >> >         params configfile="/etc/httpd/conf/httpd.conf" \
>>> > >> >         op monitor interval="60s" \
>>> > >> >         meta target-role="Started"
>>> > >> > primitive failoverip ocf:heartbeat:IPaddr \
>>> > >> >         params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>>> > >> >         op monitor interval="30s" \
>>> > >> >         meta target-role="Started"
>>> > >> > location cli-prefer-cweb cweb \
>>> > >> >         rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>>> > >> > location prefer-crhnode1 crhweb 50: crhnode1
>>> > >> > colocation cweb-with-failoverip inf: cweb failoverip
>>> > >> > order crhweb-after-failoverip inf: failoverip cweb
>>> > >> > property $id="cib-bootstrap-options" \
>>> > >> >         dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>>> > >> >         cluster-infrastructure="Heartbeat" \
>>> > >> >         stonith-enabled="false" \
>>> > >> >         no-quorum-policy="ignore"
>>> > >> > rsc_defaults $id="rsc-options" \
>>> > >> >         resource-stickiness="100"
>>> > >>
>>> > >> _______________________________________________
>>> > >> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
>>> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> > >>
>>> > >> Project Home: http://www.clusterlabs.org
>>> > >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> > >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>>
>

-- 
Dan Frincu
CCNA, RHCE