[Pacemaker] resource priority

"Tomcsányi, Domonkos" tomcsanyid at modit.hu
Wed Jan 29 13:02:19 EST 2014


On 2014.01.29. 17:52, "Tomcsányi, Domonkos" wrote:
> On 2013.11.09. 1:00, Dennis Jacobfeuerborn wrote:
>> On 09.11.2013 00:15, Dennis Jacobfeuerborn wrote:
>>> Hi,
>>> I'm finally moving forward with creating a redundant gateway system for
>>> a network but I'm running into trouble. This is the configuration that
>>> I'm using:
>>>
>>> node gw01 \
>>>      attributes standby="off"
>>> node gw02 \
>>>      attributes standby="off"
>>> primitive p_ip_gw_ext ocf:heartbeat:IPaddr2 \
>>>      params ip="192.168.100.132" cidr_netmask="29" nic="eth0" \
>>>      op monitor interval="10s"
>>> primitive p_ip_gw_int ocf:heartbeat:IPaddr2 \
>>>      params ip="192.168.214.4" cidr_netmask="24" nic="eth1" \
>>>      op monitor interval="10s"
>>> primitive p_route_ext ocf:heartbeat:Route \
>>>      params destination="default" device="eth0" gateway="192.168.100.129" \
>>>      op monitor interval="10" timeout="20" depth="0"
>>> primitive p_route_int ocf:heartbeat:Route \
>>>      params destination="default" device="eth1" gateway="192.168.214.1" \
>>>      op monitor interval="10" timeout="20" depth="0"
>>> group g_gateway p_ip_gw_ext p_ip_gw_int
>>> colocation c_route_ext -inf: p_route_ext p_ip_gw_ext
>>> colocation c_routes -inf: p_route_ext p_route_int
>>> property $id="cib-bootstrap-options" \
>>>      dc-version="1.1.10-1.el6_4.4-368c726" \
>>>      cluster-infrastructure="cman" \
>>>      stonith-enabled="false" \
>>>      no-quorum-policy="ignore" \
>>>      last-lrm-refresh="1383949342"
>>>
>>> The setup is fairly simple. One IP on the public interface, one IP on
>>> the private interface. On the active system the default route is
>>> configured through the public interface, on the secondary system the
>>> default route is configured through the private interface.
>>>
>>> The problem is that when I put the active node on standby the
>>> p_route_int resource stays on the secondary system and the p_route_ext
>>> resource gets stopped.
>>>
>>> My interpretation is that since there is no explicit priority defined
>>> for either route, and with only one node online only one route can be
>>> placed, Pacemaker arbitrarily decides to keep p_route_int online and
>>> take p_route_ext offline.
>>>
>>> What I really want to express is that p_route_ext should always be
>>> placed first, and that p_route_int should be placed only if possible
>>> (e.g. after a forced migration) and otherwise be taken offline (e.g.
>>> when a node is down).
>>> Any ideas on how to accomplish this?
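>>>
>>> (One way to make such a priority explicit might be Pacemaker's
>>> per-resource "priority" meta-attribute: when not all resources can be
>>> active, lower-priority resources are stopped in favor of
>>> higher-priority ones. An untested sketch:
>>>
>>> primitive p_route_ext ocf:heartbeat:Route \
>>>      params destination="default" device="eth0" gateway="192.168.100.129" \
>>>      op monitor interval="10" timeout="20" depth="0" \
>>>      meta priority="10"
>>>
>>> With p_route_int left at the default priority of 0, whenever only one
>>> of the two routes can be placed it should be p_route_ext that is kept
>>> running.)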
>>
>> After sending the mail it occurred to me that "place one route not on 
>> the same node as the other" is a wrong way to look at this. Instead I 
>> did this:
>>
>> colocation c_route_ext inf: p_route_ext p_ip_gw_ext
>> colocation c_route_int -inf: p_route_int p_ip_gw_ext
>>
>> i.e. place the ext route with the ext ip and the int route on a node 
>> where the ext ip is *not* running. That way the routes no longer 
>> depend on each other and the priority of the routes no longer matters.
>>
>> Sorry for the noise.
>>
>> Regards,
>>   Dennis
>>
> Hi Dennis & list,
>
> I am having a very similar problem with my cluster, but I am not sure 
> your solution is really the right one, so let me tell you what I just 
> experienced:
>
> I have two resource groups: one is responsible for a VPN tunnel, an 
> external IP and an internal interface IP, and routes traffic from 
> other machines through the VPN tunnel; the other is just a single 
> routing resource which routes the other node's traffic through that 
> tunnel.
> If everything is fine then node1 has the VPN tunnel up and node2 has 
> the routing resource, meaning node2 sends all its traffic through the 
> VPN on node1. This is achieved by saying:
>
> colocation routes -inf: vpn_resource_group route_resource_group
>
> If both nodes are up and running then everything is fine, but today I 
> stopped node2 (which was running the route_resource_group) and, to my 
> surprise (well, for the first 30 seconds, before I realized what had 
> probably happened), the vpn_resource_group stopped too. My theory of 
> what happened: Pacemaker saw that the route_resource_group wasn't 
> running anywhere, so it tried to start it on node1, but the colocation 
> rule forbade that, so the route_resource_group went into a 'failed' 
> state; and since it is in a colocation constraint with the 
> vpn_resource_group, the vpn_resource_group failed too, bringing the 
> whole cluster and all my services down.
>
> So I am wondering how one could tell Pacemaker that if there is no way 
> to run resource2 (route_resource_group) then it should be stopped, but 
> resource1 (vpn_resource_group) should keep running.
>
> Thank you a lot!
> Domonkos
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
And, as happened with Dennis, just after I sent this mail I found the 
solution: the order of the resource groups in the colocation constraint 
was wrong. In "colocation routes -inf: A B" it is B that is placed 
first, with A placed relative to it, so with my original order the 
route_resource_group effectively had top priority; after I shut down 
node2, node1 started the route_resource_group instead of keeping the 
vpn_resource_group running.
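
For the record, the corrected constraint is just the same line with the 
two groups swapped:

colocation routes -inf: route_resource_group vpn_resource_group

This way the vpn_resource_group is placed first, and when only one node 
is available it is the route_resource_group that gets stopped instead.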

I am sorry that I bothered you all.

Domonkos



