[ClusterLabs] Two-Node Failover IP-Address and Gateway

Andrei Borzenkov arvidjaar at gmail.com
Tue Jan 23 01:31:46 EST 2018


On Mon, Jan 22, 2018 at 10:09 PM, brainheadz <brainheadz at gmail.com> wrote:
> Hello Andrei,
>
> yes, this fixes the issue. But is there a way to automate this process
> without manual intervention?
>

Normally, adding and removing this constraint is a manual process by
design. Do you mean this constraint appears again without you being
aware of it?
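
For reference, a minimal sketch of how such a cli-* constraint typically
appears and is removed with crmsh (the exact original command is a guess;
resource and node names are taken from your configuration below):

  # an administrative move adds a cli-prefer-* location constraint to the CIB
  crm resource move default_gw fw-managed-01

  # the constraint stays in the configuration until it is explicitly cleared
  crm resource clear default_gw_clone

  # check that no cli-* location constraints remain
  crm configure show | grep cli-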

> Node1 fails.
>
> Node2 takes over the vip_bad and ipsrcaddr.
>
> Node1 is back online.
>
> vip_bad and ipsrcaddr are moved back to Node1.
>
> Node2 sets the correct default_gw and its own source address again
> (configured via vip_bad_2 and vip_bad_2_location).
> ^- this happens if I execute the cleanup manually
>
> # crm resource cleanup default_gw_clone
> Cleaning up default_gw:0 on fw-managed-01, removing fail-count-default_gw
> Cleaning up default_gw:0 on fw-managed-02, removing fail-count-default_gw
> Waiting for 2 replies from the CRMd.. OK
>
> # crm status
> Last updated: Mon Jan 22 19:43:22 2018          Last change: Mon Jan 22
> 19:43:17 2018 by hacluster via crmd on fw-managed-01
> Stack: corosync
> Current DC: fw-managed-01 (version 1.1.14-70404b0) - partition with quorum
> 2 nodes and 6 resources configured
>
> Online: [ fw-managed-01 fw-managed-02 ]
>
> Full list of resources:
>
>  vip_managed    (ocf::heartbeat:IPaddr2):       Started fw-managed-01
>  vip_bad        (ocf::heartbeat:IPaddr2):       Started fw-managed-01
>  Clone Set: default_gw_clone [default_gw]
>      Started: [ fw-managed-01 fw-managed-02 ]
>  src_address    (ocf::heartbeat:IPsrcaddr):     Started fw-managed-01
>  vip_bad_2      (ocf::heartbeat:IPaddr2):       Started fw-managed-02
>
> Failed Actions:
> * src_address_monitor_0 on fw-managed-02 'unknown error' (1): call=18,
> status=complete, exitreason='[/usr/lib/heartbeat/findif -C] failed',
>     last-rc-change='Fri Jan 19 17:10:43 2018', queued=0ms, exec=75ms
>
> root at fw-managed-02:~# ip r
> default via 100.200.123.161 dev bad
> 100.200.123.160/29 dev bad  proto kernel  scope link  src 100.200.123.165
> 172.18.0.0/16 dev tun0  proto kernel  scope link  src 172.18.0.1
> 172.30.40.0/24 dev managed  proto kernel  scope link  src 172.30.40.252
> root at fw-managed-02:~# ping 8.8.8.8
> PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
> 64 bytes from 8.8.8.8: icmp_seq=1 ttl=60 time=3.57 ms
> ^C
>
> On Mon, Jan 22, 2018 at 7:29 PM, Andrei Borzenkov <arvidjaar at gmail.com>
> wrote:
>>
>> On 22.01.2018 20:54, brainheadz wrote:
>> > Hello,
>> >
>> > I've got 2 public IPs and 2 hosts.
>> >
>> > Each IP is assigned to one host. The interfaces are not configured by
>> > the system; I am using Pacemaker to do this.
>> >
>> > fw-managed-01: 100.200.123.166/29
>> > fw-managed-02: 100.200.123.165/29
>> >
>> > gateway: 100.200.123.161
>> >
>> > I am trying to set up some form of active/passive cluster. fw-managed-01
>> > is the active node. If it fails, fw-managed-02 has to take over the VIP
>> > and change its IPsrcaddr. This works so far. But if fw-managed-01 comes
>> > back online, the default gateway isn't set again on the node fw-managed-02.
>> >
>> > I'm quite new to this topic. The cluster would work that way, but the
>> > passive node can never reach the internet because of the missing default
>> > gateway.
>> >
>> > Can anyone explain what I am missing or doing wrong here?
>> >
>> > Output
>> >
>> > # crm configure show
>> > node 1: fw-managed-01
>> > node 2: fw-managed-02
>> > primitive default_gw Route \
>> >         op monitor interval=10s \
>> >         params destination=default device=bad gateway=100.200.123.161
>> > primitive src_address IPsrcaddr \
>> >         op monitor interval=10s \
>> >         params ipaddress=100.200.123.166
>> > primitive vip_bad IPaddr2 \
>> >         op monitor interval=10s \
>> >         params nic=bad ip=100.200.123.166 cidr_netmask=29
>> > primitive vip_bad_2 IPaddr2 \
>> >         op monitor interval=10s \
>> >         params nic=bad ip=100.200.123.165 cidr_netmask=29
>> > primitive vip_managed IPaddr2 \
>> >         op monitor interval=10s \
>> >         params ip=172.30.40.254 cidr_netmask=24
>> > clone default_gw_clone default_gw \
>> >         meta clone-max=2 target-role=Started
>> > location cli-prefer-default_gw default_gw_clone role=Started inf:
>> > fw-managed-01
>>
>> As far as I can tell, this restricts the clone to one node only. Since it
>> starts with cli-, it was created by something like "crm resource move" or
>> similar. Try
>>
>> crm resource clear default_gw_clone
>>
>> > location src_address_location src_address inf: fw-managed-01
>> > location vip_bad_2_location vip_bad_2 inf: fw-managed-02
>> > location vip_bad_location vip_bad inf: fw-managed-01
>> > order vip_before_default_gw inf: vip_bad:start src_address:start
>> > symmetrical=true
>> > location vip_managed_location vip_managed inf: fw-managed-01
>> > property cib-bootstrap-options: \
>> >         have-watchdog=false \
>> >         dc-version=1.1.14-70404b0 \
>> >         cluster-infrastructure=corosync \
>> >         cluster-name=debian \
>> >         stonith-enabled=false \
>> >         no-quorum-policy=ignore \
>> >         last-lrm-refresh=1516362207 \
>> >         start-failure-is-fatal=false
>> >
>> > # crm status
>> > Last updated: Mon Jan 22 18:47:12 2018          Last change: Fri Jan 19
>> > 17:04:12 2018 by root via cibadmin on fw-managed-01
>> > Stack: corosync
>> > Current DC: fw-managed-01 (version 1.1.14-70404b0) - partition with
>> > quorum
>> > 2 nodes and 6 resources configured
>> >
>> > Online: [ fw-managed-01 fw-managed-02 ]
>> >
>> > Full list of resources:
>> >
>> >  vip_managed    (ocf::heartbeat:IPaddr2):       Started fw-managed-01
>> >  vip_bad        (ocf::heartbeat:IPaddr2):       Started fw-managed-01
>> >  Clone Set: default_gw_clone [default_gw]
>> >      default_gw (ocf::heartbeat:Route): FAILED fw-managed-02 (unmanaged)
>> >      Started: [ fw-managed-01 ]
>> >  src_address    (ocf::heartbeat:IPsrcaddr):     Started fw-managed-01
>> >  vip_bad_2      (ocf::heartbeat:IPaddr2):       Started fw-managed-02
>> >
>> > Failed Actions:
>> > * default_gw_stop_0 on fw-managed-02 'not installed' (5): call=26,
>> > status=complete, exitreason='Gateway address 100.200.123.161 is
>> > unreachable.',
>> >     last-rc-change='Fri Jan 19 17:10:43 2018', queued=0ms, exec=31ms
>> > * src_address_monitor_0 on fw-managed-02 'unknown error' (1): call=18,
>> > status=complete, exitreason='[/usr/lib/heartbeat/findif -C] failed',
>> >     last-rc-change='Fri Jan 19 17:10:43 2018', queued=0ms, exec=75ms
>> >
>> >
>> > best regards,
>> > Tobias
>> >