[Pacemaker] how to test network access and fail over accordingly?

Craig Hurley lists at thehurley.com
Thu Oct 7 15:42:21 EDT 2010


Yesterday, the last few emails between Vadym and me were inadvertently
not posted to this list.  Here are those posts for anyone having
similar issues.

Regards,
Craig.

On 7 October 2010 15:20, Vadym Chepkov <vchepkov at gmail.com> wrote:
> No, the default is 0 - stickiness is not taken into consideration at all.
> The resource stays in place because the allocation on the other host has the same score.
> You can see all computed scores using ptest -sL
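>
> For example, something along these lines will show the computed scores for the group and, if you ever want the resources to actively prefer staying put, set an explicit cluster-wide stickiness (the value 100 is purely illustrative):
>
> ptest -sL | grep g_cluster_services
> crm configure rsc_defaults resource-stickiness="100"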
>
> By the way, you don't need to specify $id= - it's redundant.
>
> Vadym
>
> On Oct 6, 2010, at 9:59 PM, Craig Hurley wrote:
>
>> Thanks again, and I see what you mean; I unplugged eth0 from both nodes
>> and g_cluster_services went down on both of them.  I took your advice
>> on board and read this section:
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#id771622
>>
>> ... and I've configured the location rule so that g_cluster_services
>> runs on the node with the most connections:
>>
>> primitive p_ping ocf:pacemaker:ping \
>>        params name="p_ping" host_list="172.20.0.254 172.20.50.1
>> 172.20.50.2" multiplier="1000" \
>>        op monitor interval="20s"
>> clone c_ping p_ping \
>>        meta globally-unique="false"
>> location loc_ping g_cluster_services \
>>        rule $id="loc_ping-rule" p_ping: defined p_ping
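>>
>> (The "p_ping:" prefix in that rule uses the value of the p_ping attribute itself as the score, so whichever node can reach more of the listed hosts -- at 1000 points each -- gets the higher score and wins the group.)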
>>
>> Now if I unplug eth0 from both nodes, g_cluster_services remains up on
>> one of the nodes, which suits my requirements :)
>>
>> One last item: in my config I have not specified a resource
>> stickiness, and the master role and g_cluster_services move around as
>> expected when a node fails.  When a failed node comes back online,
>> the master role and g_cluster_services stay where they are (until the
>> next forced fail-over) -- which is the behaviour I require.  Is there
>> a default stickiness that causes this "correct" behaviour?
>>
>> Regards,
>> Craig.
>>
>>
>> On 7 October 2010 11:54, Vadym Chepkov <vchepkov at gmail.com> wrote:
>>> A monitor operation is essential for the ping RA; without it, the agent won't work either.
>>>
>>> As for the multiplier - it's all about the balance between the score and resource stickiness.
>>> With a multiplier of 200 and resource stickiness set to 500, for example,
>>> when both hosts can ping the same 2 ping nodes the resources stay where they are; but if one host can ping 3 ping nodes and the other only 2,
>>> the resources will relocate to the better-connected host.
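>>>
>>> (Roughly speaking, with a rule that uses the ping attribute as a score, each node's score for the resource works out to "reachable ping targets x multiplier", plus resource-stickiness on the node currently running it; the resource only relocates once the other node's total comes out higher.)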
>>>
>>> In the simple example I gave you, if that address is the IP of a router shared by both nodes and it goes down, the resource will not fail over - it will simply stop.  If that is not what you want, you would probably ping not just the router but both nodes' IPs as well, and fail over only when a node can ping nothing but itself:
>>>
>>> location rg0-connected rg0 \
>>> rule -inf: not_defined pingd or pingd lte 200
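>>>
>>> As a sketch (the three addresses below are only placeholders for the router and the two node IPs), the matching ping primitive could look like:
>>>
>>> primitive ping ocf:pacemaker:ping \
>>>   params name="pingd" host_list="10.10.10.250 10.10.10.1 10.10.10.2" multiplier="200" \
>>>   op monitor interval="10"
>>>
>>> A node that can only reach its own address then scores 200, which the "lte 200" rule turns into -INFINITY and forces a failover; a node that can still reach its peer or the router scores 400 or more and keeps the resources.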
>>>
>>> Vadym
>>>
>>> On Oct 6, 2010, at 5:56 PM, Craig Hurley wrote:
>>>
>>>> Thanks Vadym, this worked.  It seems the missing name field was
>>>> causing the problem.
>>>>
>>>> On a related note, why do you have a multiplier of 200?
>>>>
>>>> According to http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html,
>>>> the multiplier field is "The number by which to multiply the number of
>>>> connected ping nodes by. Useful when there are multiple ping nodes
>>>> configured."
>>>>
>>>> I don't understand why one would want to multiply the number of
>>>> connected nodes when there are multiple ping nodes :/
>>>>
>>>> Regards,
>>>> Craig.
>>>>
>>>> On 7 October 2010 09:37, Vadym Chepkov <vchepkov at gmail.com> wrote:
>>>>> This is my config that works fine
>>>>>
>>>>> primitive ping ocf:pacemaker:ping \
>>>>>  params name="pingd" host_list="10.10.10.250" multiplier="200" timeout="5" \
>>>>>  op monitor interval="10"
>>>>>
>>>>> clone connected ping \
>>>>>        meta globally-unique="false"
>>>>>
>>>>> location rg0-connected rg0 \
>>>>>  rule -inf: not_defined pingd or pingd lte 0
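>>>>>
>>>>> To check that the attribute is actually being populated, crm_mon -A (if your crm_mon supports showing node attributes) should list a "pingd" value of 200 on each node that can reach 10.10.10.250, and 0 on one that cannot.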
>>>>>
>>>>>
>>>>> On Oct 6, 2010, at 4:21 PM, Craig Hurley wrote:
>>>>>
>>>>>> I tried using ping instead of pingd and I added "number" to the
>>>>>> evaluation, but I get the same results :/
>>>>>>
>>>>>> primitive p_ping ocf:pacemaker:ping params host_list=172.20.0.254
>>>>>> clone c_ping p_ping meta globally-unique=false
>>>>>> location loc_ping g_cluster_services rule -inf: not_defined p_ping or
>>>>>> p_ping number:lte 0
>>>>>>
>>>>>> Regards,
>>>>>> Craig.
>>>>>>
>>>>>>
>>>>>> On 6 October 2010 20:43, Jayakrishnan <jayakrishnanlll at gmail.com> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I think this change:
>>>>>>> location loc_pingd g_cluster_services rule -inf: not_defined pingd or pingd
>>>>>>> number:lte 0
>>>>>>>
>>>>>>> should work
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>>
>>>>>>> Jayakrishnan. L
>>>>>>>
>>>>>>> Visit:
>>>>>>> www.foralllinux.blogspot.com
>>>>>>> www.jayakrishnan.bravehost.com
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 6, 2010 at 11:56 AM, Claus Denk <denk at us.es> wrote:
>>>>>>>>
>>>>>>>> I am having a similar problem, so let's wait for the experts.  But in the
>>>>>>>> meantime, try changing
>>>>>>>>
>>>>>>>>
>>>>>>>> location loc_pingd g_cluster_services rule -inf: not_defined p_pingd
>>>>>>>> or p_pingd lte 0
>>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>>> location loc_pingd g_cluster_services rule -inf: not_defined pingd
>>>>>>>> or pingd number:lte 0
>>>>>>>>
>>>>>>>> and see what happens.  As far as I have read, it is also recommended
>>>>>>>> to use the "ping" resource agent instead of "pingd"...
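>>>>>>>>
>>>>>>>> (For what it's worth, ocf:pacemaker:ping simply wraps the system ping binary, whereas ocf:pacemaker:pingd relies on the separate pingd daemon; later Pacemaker versions deprecate the pingd agent in favour of ping.)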
>>>>>>>>
>>>>>>>> kind regards, Claus
>>>>>>>>
>>>>>>>> On 10/06/2010 05:45 AM, Craig Hurley wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I have a 2-node cluster running DRBD, Heartbeat and Pacemaker in
>>>>>>>>> active/passive mode.  On both nodes, eth0 is connected to the main
>>>>>>>>> network and eth1 is used to connect the nodes directly to each other.
>>>>>>>>> The nodes share a virtual IP address on eth0.  Pacemaker is also
>>>>>>>>> controlling a custom service with an LSB-compliant script in
>>>>>>>>> /etc/init.d/.  All of this is working fine and I'm happy with it.
>>>>>>>>>
>>>>>>>>> I'd like to configure the nodes so that they fail over if eth0 goes
>>>>>>>>> down (or if they cannot access a particular gateway), so I tried
>>>>>>>>> adding the following (as per
>>>>>>>>> http://www.clusterlabs.org/wiki/Example_configurations#Set_up_pingd)
>>>>>>>>>
>>>>>>>>> primitive p_pingd ocf:pacemaker:pingd params host_list=172.20.0.254 op
>>>>>>>>> monitor interval=15s timeout=5s
>>>>>>>>> clone c_pingd p_pingd meta globally-unique=false
>>>>>>>>> location loc_pingd g_cluster_services rule -inf: not_defined p_pingd
>>>>>>>>> or p_pingd lte 0
>>>>>>>>>
>>>>>>>>> ... but when I add that, all resources are stopped and they don't
>>>>>>>>> come back up on either node.  Am I making a basic mistake, or do you
>>>>>>>>> need more info from me?
>>>>>>>>>
>>>>>>>>> All help is appreciated,
>>>>>>>>> Craig.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> pacemaker
>>>>>>>>> Version: 1.0.8+hg15494-2ubuntu2
>>>>>>>>>
>>>>>>>>> heartbeat
>>>>>>>>> Version: 1:3.0.3-1ubuntu1
>>>>>>>>>
>>>>>>>>> drbd8-utils
>>>>>>>>> Version: 2:8.3.7-1ubuntu2.1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rp at rpalpha:~$ sudo crm configure show
>>>>>>>>> node $id="32482293-7b0f-466e-b405-c64bcfa2747d" rpalpha
>>>>>>>>> node $id="3f2aac12-05aa-4ac7-b91f-c47fa28efb44" rpbravo
>>>>>>>>> primitive p_drbd_data ocf:linbit:drbd \
>>>>>>>>>         params drbd_resource="data" \
>>>>>>>>>         op monitor interval="30s"
>>>>>>>>> primitive p_fs_data ocf:heartbeat:Filesystem \
>>>>>>>>>         params device="/dev/drbd/by-res/data" directory="/mnt/data" fstype="ext4"
>>>>>>>>> primitive p_ip ocf:heartbeat:IPaddr2 \
>>>>>>>>>         params ip="172.20.50.3" cidr_netmask="255.255.0.0" nic="eth0" \
>>>>>>>>>         op monitor interval="30s"
>>>>>>>>> primitive p_rp lsb:rp \
>>>>>>>>>         op monitor interval="30s" \
>>>>>>>>>         meta target-role="Started"
>>>>>>>>> group g_cluster_services p_ip p_fs_data p_rp
>>>>>>>>> ms ms_drbd p_drbd_data \
>>>>>>>>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>>>>>>>> location loc_preferred_master g_cluster_services inf: rpalpha
>>>>>>>>> colocation colo_mnt_on_master inf: g_cluster_services ms_drbd:Master
>>>>>>>>> order ord_mount_after_drbd inf: ms_drbd:promote g_cluster_services:start
>>>>>>>>> property $id="cib-bootstrap-options" \
>>>>>>>>>         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>>>>>>>>>         cluster-infrastructure="Heartbeat" \
>>>>>>>>>         no-quorum-policy="ignore" \
>>>>>>>>>         stonith-enabled="false" \
>>>>>>>>>         expected-quorum-votes="2" \
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rp at rpalpha:~$ sudo cat /etc/ha.d/ha.cf
>>>>>>>>> node rpalpha
>>>>>>>>> node rpbravo
>>>>>>>>>
>>>>>>>>> keepalive 2
>>>>>>>>> warntime 5
>>>>>>>>> deadtime 15
>>>>>>>>> initdead 60
>>>>>>>>>
>>>>>>>>> mcast eth0 239.0.0.43 694 1 0
>>>>>>>>> bcast eth1
>>>>>>>>>
>>>>>>>>> use_logd yes
>>>>>>>>> autojoin none
>>>>>>>>> crm respawn
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rp at rpalpha:~$ sudo cat /etc/drbd.conf
>>>>>>>>> global {
>>>>>>>>>         usage-count no;
>>>>>>>>> }
>>>>>>>>> common {
>>>>>>>>>         protocol C;
>>>>>>>>>
>>>>>>>>>         handlers {}
>>>>>>>>>
>>>>>>>>>         startup {}
>>>>>>>>>
>>>>>>>>>         disk {}
>>>>>>>>>
>>>>>>>>>         net {
>>>>>>>>>                 cram-hmac-alg sha1;
>>>>>>>>>                 shared-secret "foobar";
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         syncer {
>>>>>>>>>                 verify-alg sha1;
>>>>>>>>>                 rate 100M;
>>>>>>>>>         }
>>>>>>>>> }
>>>>>>>>> resource data {
>>>>>>>>>         device /dev/drbd0;
>>>>>>>>>         meta-disk internal;
>>>>>>>>>         on rpalpha {
>>>>>>>>>                 disk /dev/mapper/rpalpha-data;
>>>>>>>>>                 address 192.168.1.1:7789;
>>>>>>>>>         }
>>>>>>>>>         on rpbravo {
>>>>>>>>>                 disk /dev/mapper/rpbravo-data;
>>>>>>>>>                 address 192.168.1.2:7789;
>>>>>>>>>         }
>>>>>>>>> }
>>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
>
>



