[Pacemaker] resource priority

Wed Jan 29 11:52:24 EST 2014

On 2013.11.09. at 1:00, Dennis Jacobfeuerborn wrote:
> On 09.11.2013 00:15, Dennis Jacobfeuerborn wrote:
>> Hi,
>> I'm finally moving forward with creating a redundant gateway system for
>> a network, but I'm running into trouble. This is the configuration
>> I'm using:
>>
>> node gw01 \
>>      attributes standby="off"
>> node gw02 \
>>      attributes standby="off"
>> primitive p_ip_gw_ext ocf:heartbeat:IPaddr2 \
>>      params ip="192.168.100.132" cidr_netmask="29" nic="eth0" \
>>      op monitor interval="10s"
>> primitive p_ip_gw_int ocf:heartbeat:IPaddr2 \
>>      params ip="192.168.214.4" cidr_netmask="24" nic="eth1" \
>>      op monitor interval="10s"
>> primitive p_route_ext ocf:heartbeat:Route \
>>      params destination="default" device="eth0" gateway="192.168.100.129" \
>>      op monitor interval="10" timeout="20" depth="0"
>> primitive p_route_int ocf:heartbeat:Route \
>>      params destination="default" device="eth1" gateway="192.168.214.1" \
>>      op monitor interval="10" timeout="20" depth="0"
>> group g_gateway p_ip_gw_ext p_ip_gw_int
>> colocation c_route_ext -inf: p_route_ext p_ip_gw_ext
>> colocation c_routes -inf: p_route_ext p_route_int
>> property $id="cib-bootstrap-options" \
>>      dc-version="1.1.10-1.el6_4.4-368c726" \
>>      cluster-infrastructure="cman" \
>>      stonith-enabled="false" \
>>      no-quorum-policy="ignore" \
>>      last-lrm-refresh="1383949342"
>>
>> The setup is fairly simple. One IP on the public interface, one IP on
>> the private interface. On the active system the default route is
>> configured through the public interface, on the secondary system the
>> default route is configured through the private interface.
>>
>> The problem is that when I put the active node on standby the
>> p_route_int resource stays on the secondary system and the p_route_ext
>> resource gets stopped.
>>
>> My interpretation is that, since there is no explicit priority defined
>> for either route and only one route can be placed while only one node
>> is online, Pacemaker arbitrarily decides to keep p_route_int online and
>> take p_route_ext offline.
>>
>> What I really want to express is that p_route_ext should always be
>> placed first, and p_route_int should only be placed if possible (e.g.
>> during a forced migration); if it cannot be placed (e.g. because a node
>> is down), it should be taken offline instead.
>> Any ideas on how to accomplish this?
>
> After sending the mail it occurred to me that "place one route not on
> the same node as the other" is the wrong way to look at this. Instead I
> did this:
>
> colocation c_route_ext inf: p_route_ext p_ip_gw_ext
> colocation c_route_int -inf: p_route_int p_ip_gw_ext
>
> i.e. place the ext route with the ext ip and the int route on a node 
> where the ext ip is *not* running. That way the routes no longer 
> depend on each other and the priority of the routes no longer matters.
>
> Sorry for the noise.
>
> Regards,
>   Dennis
>
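A minimal, untested sketch of an alternative, assuming the crm shell convention that in a two-resource colocation `colocation <id> <score>: <rsc> <with-rsc>` the cluster places <with-rsc> first and then places <rsc> relative to it. Under that convention, the original `colocation c_routes -inf: p_route_ext p_route_int` made p_route_int the resource placed first, which would account for p_route_int staying up while p_route_ext was stopped. Swapping the operands, together with the positive c_route_ext constraint from the fix above, should give p_route_ext precedence:

```
# Sketch only, not tested on this cluster.
# p_route_ext follows the external IP (constraint taken from the fix above).
colocation c_route_ext inf: p_route_ext p_ip_gw_ext
# Operands swapped relative to the original c_routes: p_route_ext is now
# the with-rsc and is placed first; p_route_int must avoid that node and
# stays stopped when no other node is available.
colocation c_routes -inf: p_route_int p_route_ext
```

Compared with the fix above, this keeps the internal route tied to the external route rather than to the external IP; if the convention holds, both forms express the same preference.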
Hi Dennis & list,

I am having a very similar problem with my cluster, but I am not sure
your solution really works, so let me tell you what I just
experienced:

I have two resource groups. One is responsible for the VPN, the
external IP, and the internal interface IP used to route traffic from
other machines through the VPN tunnel; the other is just a single
routing resource that routes the other node's traffic through the VPN
tunnel.
When everything is fine, node1 has the VPN tunnel up and node2 runs the
routing resource, so node2 sends all its traffic through the VPN on
node1. This is achieved with:

colocation routes -inf: vpn_resource_group route_resource_group

If both nodes are up and running, everything is fine, but today I
stopped node2 (which was running route_resource_group) and, to my
surprise (well, for the first 30 seconds, before I realized what had
probably happened), the VPN service group stopped too. My theory is
that Pacemaker saw that route_resource_group wasn't running anywhere,
so it tried to start it on node1; the colocation rule forbade that, so
route_resource_group went into a 'failed' state, and since it is
colocated with vpn_resource_group, vpn_resource_group failed too,
bringing down the whole cluster and all my services.

So I am wondering how to tell Pacemaker that if there is no way to
run resource2 (route_resource_group), it should leave it stopped but
keep resource1 (vpn_resource_group) running.
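Under the same assumed crm shell convention (the second resource in a two-resource colocation is placed first), `colocation routes -inf: vpn_resource_group route_resource_group` makes route_resource_group the resource placed first. With only node1 left, route_resource_group could take node1, and vpn_resource_group would then have nowhere to run, which would also match the symptom described above. A minimal, untested sketch of the reversed constraint:

```
# Sketch only: vpn_resource_group is now the with-rsc and is placed first;
# route_resource_group must avoid the node running the VPN group and is
# left stopped when no other node is available.
colocation routes -inf: route_resource_group vpn_resource_group
```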

Thanks a lot!
Domonkos