[ClusterLabs] Antw: Re: Users Digest, Vol 44, Issue 11

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Thu Sep 6 02:14:08 EDT 2018

>>> Jeffrey Westgate <Jeffrey.Westgate at arkansas.gov> wrote on 06.09.2018
in message
<A36B14FA9AA67F4E836C0EE59DEA89C401B10C90EF at CM-SAS-MBX-07.sas.arkgov.net>:
> Greetings from a confused user;
> We are running pacemaker as part of a load‑balanced cluster of two members,
> both VMware VMs, with both acting as stepping‑stones to our DNS recursive 
> resolvers (RR).  Simple use ‑ the /etc/resolv.conf on the *NIX boxes points
> at both IPs, and the cluster forwards to one of multiple RRs for DNS 
> resolution.
> Today, for an as‑yet undetermined reason, one of the two members started 
> failing to connect to the RRs. Intermittently. And quite annoyingly, as this
> has affected data center operations.  No matter what we've tried, one member
> fails intermittently, the other is fine.  
> And we've tried ‑ 
>  ‑ reboot of the affected member ‑ it came back up clean and fine, but the 
> issue remained.
>  ‑ fail the cluster, moving both IPs to the second member server; failover 
> was successful, problem remained.
>   ‑‑ this moved the entire cluster to a different VM on a different VMWare 
> host server, so different NIC, etc...
>  ‑ failed the cluster back to the original server; both IPs appeared on the 
> 'suspect' VM, and the problem remained.
>  ‑ restore the cluster; both IPs are on the proper VMs, but the one still 
> fails intermittently while the second just chugs along.
> Any ideas what could be causing this?  Is this something that could be 
> caused by the cluster config?  Anybody ever seen anything similar?

I have two suggestions:
1) Inspect your configuration management to see whether recent changes (host
or network) may have caused this. If so, roll back to the previous state
(or roll further forward to fix it).
2) Set up some monitoring (or inspect the results of any monitoring you
already have) to find out details. If you have nothing else, syslog may give
valuable hints.
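
If no monitoring exists yet, something as small as the following sketch could
serve as a starting point for suggestion 2. It sends a plain DNS query over
UDP to each cluster IP and reports whether a matching reply arrived and how
long it took, so intermittent failures on one member show up with timestamps
that can be compared against syslog. Everything specific in it is an
assumption for illustration: the function names are made up, the addresses
192.0.2.10 and 192.0.2.11 are documentation placeholders rather than your
real IPs, and example.com is only a convenient name to look up.

```python
import socket
import struct
import time


def build_query(name, query_id):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server, name="example.com", timeout=2.0, query_id=0x1234):
    """Send one query to `server` on UDP port 53; return (ok, seconds)."""
    packet = build_query(name, query_id)
    start = time.monotonic()
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(512)
        # Count the reply only if it echoes our query ID.
        ok = len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == query_id
    except OSError:  # timeouts and "port unreachable" both land here
        ok = False
    return ok, time.monotonic() - start


def format_result(server, ok, elapsed):
    """Render one probe result as a single log-friendly line."""
    status = "ok" if ok else "FAIL"
    return f"{server} {status} {elapsed * 1000:.0f}ms"


def monitor(servers, rounds, interval=10.0):
    """Probe every server `rounds` times, printing one line per probe."""
    for _ in range(rounds):
        for server in servers:
            ok, elapsed = probe(server)
            stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
            print(stamp, format_result(server, ok, elapsed), flush=True)
        time.sleep(interval)
```

Calling monitor(["192.0.2.10", "192.0.2.11"], rounds=360) with the real
cluster IPs substituted would probe both members every ten seconds for about
an hour. Running it from more than one client may help tell a problem on the
member itself from a problem on the path to it.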


> Our current unsustainable workaround is to remove the IP for the affected 
> member from the *NIX resolv.conf file.
> I appreciate any reasonable suggestions.  (I am not the creator of the 
> cluster, just the guy trying to figure it out. Unfortunately the creator, my
> mentor, is dearly departed and, in times like this, sorely missed.)
> Any replies will be read and responded to early tomorrow AM.  Thanks for 
> understanding.
> ‑‑
> Jeff Westgate
> _______________________________________________
> Users mailing list: Users at clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org 
