[ClusterLabs] Users Digest, Vol 44, Issue 11
Ken Gaillot
kgaillot at redhat.com
Thu Jul 2 10:50:40 EDT 2020
LOL, somehow I clicked on an ancient message in my list folder ... well
the advice stands if anyone has a similar issue ;)
I plead a migraine, they make me miss little details like dates ...
On Thu, 2020-07-02 at 09:45 -0500, Ken Gaillot wrote:
> On Thu, 2018-09-06 at 00:59 +0000, Jeffrey Westgate wrote:
> > Greetings from a confused user;
> >
> > We are running pacemaker as part of a load-balanced cluster of two
> > members, both VMware VMs, with both acting as stepping-stones to
> > our DNS recursive resolvers (RR). Simple use - the /etc/resolv.conf
> > on the *NIX boxes points at both IPs, and the cluster forwards to
> > one of multiple RRs for DNS resolution.
>
> I'm not sure about your specific issue, but generally it's a bad idea
> to round-robin DNS servers due to TTL/caching issues. The client
> should know it's contacting the same server at the same IP each time,
> to have a correct idea of how long entries can be cached.
>
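You can watch this caching behavior directly. The resolver address below is a documentation placeholder, not one from this thread; the point is that the TTL field in the answer counts down on repeated queries to the same resolver, but jumps back up if round-robin silently hands you a different one:

```shell
# 203.0.113.10 is a placeholder resolver IP; substitute your own.
# The fourth field of each answer line is the TTL remaining in that
# resolver's cache. Query the SAME resolver twice a few seconds apart
# and the TTL decreases; a different resolver's cache shows its own
# (likely higher) TTL, so the client's freshness estimate is wrong.
dig +noall +answer example.com @203.0.113.10
sleep 5
dig +noall +answer example.com @203.0.113.10
```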
> My personal preferred HA approach for DNS is:
>
> * Put the DNS servers in containers or VMs that are the pacemaker
> resources, each bound to a specific floating IP (even better, make
> the container a bundle, or the VM a guest node, so the DNS server
> can run as a resource inside it for monitoring/restarting purposes)
>
> * List the floating IPs as multiple DNS servers on the client side
> (whether statically in resolv.conf or via DHCP). This covers
> resolvers; you could do the same for authoritative servers by
> listing them as multiple NS records for the domains.
>
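A minimal sketch of the VM/guest-node variant of this layout using pcs. All names and addresses here are hypothetical (192.0.2.x are documentation addresses), and the exact resource options depend on your agents and environment; this is an illustration of the shape, not a drop-in config:

```shell
# 1. The resolver VM itself, managed as a cluster resource
#    (assumes a libvirt domain defined in dns1.xml):
pcs resource create dns1-vm ocf:heartbeat:VirtualDomain \
    config=/etc/libvirt/qemu/dns1.xml op monitor interval=30s

# 2. Make the VM a guest node so the DNS daemon can itself be
#    a monitored resource running inside it:
pcs cluster node add-guest dns1 dns1-vm

# 3. The floating IP clients will use, kept with the VM:
pcs resource create dns1-ip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.53 cidr_netmask=24 op monitor interval=10s
pcs constraint colocation add dns1-ip with dns1-vm INFINITY
```

On the client side, each floating IP then appears as its own `nameserver` line in /etc/resolv.conf, so the stock resolver fallback logic handles failover rather than round-robin.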
> > Today, for an as-yet undetermined reason, one of the two members
> > started failing to connect to the RRs. Intermittently. And quite
> > annoyingly, as this has affected data center operations. No matter
> > what we've tried, one member fails intermittently, the other is
> > fine.
> > And we've tried -
> > - reboot of the affected member - it came back up clean and fine,
> > but the issue remained.
> > - fail the cluster, moving both IPs to the second member server;
> > failover was successful, problem remained.
> > -- this moved the entire cluster to a different VM on a different
> > VMware host server, so different NIC, etc...
> > - failed the cluster back to the original server; both IPs appeared
> > on the 'suspect' VM, and the problem remained
> > - restore the cluster; both IPs are on the proper VMs, but the one
> > still fails intermittently while the second just chugs along.
>
> Sounds networking related ... could something else on the network be
> claiming that IP? Or something wrong with the switch?
>
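One quick way to check the duplicate-IP theory from another host on the same layer-2 segment (interface and address below are placeholders for the floating IP in question):

```shell
# Duplicate Address Detection mode: any reply means some machine on
# the segment is answering ARP for this IP. Run it while the cluster
# holds the address and see whether more than one MAC responds.
arping -D -c 3 -I eth0 192.0.2.53

# Then compare the MAC your host has cached for that IP against the
# cluster node's real interface MAC:
ip neigh show 192.0.2.53
```

If the cached MAC flaps between two values, something else is claiming the address, which would explain intermittent failures that follow the IP rather than the VM.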
> > Any ideas what could be causing this? Is this something that could
> > be caused by the cluster config? Anybody ever seen anything
> > similar?
> >
> > Our current unsustainable workaround is to remove the IP for the
> > affected member from the *NIX resolver.conf file.
> >
> > I appreciate any reasonable suggestions. (I am not the creator of
> > the cluster, just the guy trying to figure it out. Unfortunately the
> > creator and my mentor is dearly departed and, in times like this,
> > sorely missed.)
>
> My condolences ...
>
> > Any replies will be read and responded to early tomorrow
> > AM. thanks
> > for understanding.
> > --
> > Jeff Westgate
--
Ken Gaillot <kgaillot at redhat.com>