[ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)
Jason Gauthier
jagauthier at gmail.com
Fri Jun 29 16:37:46 EDT 2018
On Thu, Jun 21, 2018 at 10:47 AM Jason Gauthier <jagauthier at gmail.com> wrote:
>
> On Thu, Jun 21, 2018 at 9:49 AM Jan Pokorný <jpokorny at redhat.com> wrote:
> >
> > On 21/06/18 07:05 -0400, Jason Gauthier wrote:
> > > On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield <ccaulfie at redhat.com> wrote:
> > >> On 19/06/18 18:47, Jason Gauthier wrote:
> > >>> Attached!
> > >>
> > >> That's very odd. I can see communication with the server and corosync in
> > >> there (so it's doing something) but no logging at all. When I start
> > >> qdevice on my systems it logs loads of messages even if it doesn't
> > >> manage to contact the server. Do you have any logging entries in
> > >> corosync.conf that might be stopping it?
> > >
> > > I haven't checked the corosync logs for any entries before, but I just
> > > did. There isn't anything logged.
> >
> > What about syslog entries (may boil down to /var/log/messages,
> > journald log, or whatever sink is configured)?
>
> I took a look, since both you and Chrissie mentioned that.
>
> There aren't any new entries added to any of the /var/log files.
>
> # corosync-qdevice -f -d
> # date
> Thu Jun 21 10:36:06 EDT 2018
>
> # ls -lt|head
> total 152072
> -rw-r----- 1 root adm 68018 Jun 21 10:34 auth.log
> -rw-rw-r-- 1 root utmp 18704352 Jun 21 10:34 lastlog
> -rw-rw-r-- 1 root utmp 107136 Jun 21 10:34 wtmp
> -rw-r----- 1 root adm 248444 Jun 21 10:34 daemon.log
> -rw-r----- 1 root adm 160899 Jun 21 10:34 syslog
> -rw-r----- 1 root adm 1119856 Jun 21 09:46 kern.log
>
> I did look through daemon, messages, and syslog just to be sure.
>
> > >> Where did the binary come from? Did you build it yourself or is it from
> > >> a package? I wonder if it's got corrupted or is a bad version. Possibly
> > >> linked against a 'dodgy' libqb - there have been some things going on
> > >> there that could cause logging to go missing in some circumstances.
> > >>
> > >> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
> > >> here anyway!
> > >
> > > Hmm. Interesting. I installed the Debian package. When it didn't
> > > work, I grabbed the source from github. They both act the same way,
> > > but if there is an underlying library issue then that will continue to
> > > be a problem.
> > >
> > > It doesn't say much:
> > > /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1
> >
> > You are likely using libqb v1.0.1.
>
> Correct. I didn't even think to look at the output of dpkg -l for the
> package version.
> Debian 9 also packages binutils-2.28
>
> > Ability to figure out the proper package version is one of the most
> > basic skills to provide useful diagnostics about the issues with
> > distro-provided packages.
> >
> > With Debian, the proper incantation seems to be
> >
> > dpkg -s libqb-dev | grep -i version
> >
> > or
> >
> > apt list libqb-dev
> >
> > (or substitute libqb0 for libqb-dev).
> >
> > As Chrissie mentioned, there is some fishiness possible if you happen
> > to use ld linker from binutils 2.29+ for the building with this old
> > libqb in the mix, so if the issues persist and logging seems to be
> > missing, try recompiling with the downgraded binutils package below
> > said breakage point.
>
> Since the system already has a lower-numbered binutils (2.28), I wonder
> if I should attempt to build a newer version of the libqb library.
>
> As Chrissie mentioned, I will open a bug with Debian in the interim.
> But I don't believe I will see a resolution to that any time soon. :)

I was finally able to look at this problem again, and found that qnetd
is logging some messages, but I don't know what to do with them.

Jun 29 16:34:35 debug New client connected
Jun 29 16:34:35 debug cluster name = zeta
Jun 29 16:34:35 debug tls started = 1
Jun 29 16:34:35 debug tls peer certificate verified = 1
Jun 29 16:34:35 debug node_id = 1084772368
Jun 29 16:34:35 debug pointer = 0x563afd609d70
Jun 29 16:34:35 debug addr_str = ::ffff:192.168.80.16:38010
Jun 29 16:34:35 debug ring id = (40a85010.89ec)
Jun 29 16:34:35 debug cluster dump:
Jun 29 16:34:35 debug client = ::ffff:192.168.80.16:38010, node_id = 1084772368
Jun 29 16:34:35 debug Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent initial node list.
Jun 29 16:34:35 debug msg seq num 4
Jun 29 16:34:35 debug node list:
Jun 29 16:34:35 error ffsplit: Received empty config node list for client ::ffff:192.168.80.16:38010
Jun 29 16:34:35 error Algorithm returned error code. Sending error reply.
Jun 29 16:34:35 debug Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent membership node list.
Jun 29 16:34:35 debug msg seq num 5
Jun 29 16:34:35 debug ring id = (40a85010.89ec)
Jun 29 16:34:35 debug node list:
Jun 29 16:34:35 debug node_id = 1084772368, data_center_id = 0, node_state = not set
Jun 29 16:34:35 debug node_id = 1084772369, data_center_id = 0, node_state = not set
Jun 29 16:34:35 debug Algorithm result vote is Ask later
Jun 29 16:34:35 debug Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent quorum node list.
Jun 29 16:34:35 debug msg seq num 6
Jun 29 16:34:35 debug quorate = 1
Jun 29 16:34:35 debug node list:
Jun 29 16:34:35 debug node_id = 1084772368, data_center_id = 0, node_state = member
Jun 29 16:34:35 debug node_id = 1084772369, data_center_id = 0, node_state = member
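
A side note on the node IDs in this log: 1084772368 is 0x40a85010 in hex, the same value as the first half of the ring id, and it equals 192.168.80.16 read as a 32-bit integer with the top bit cleared. That suggests the IDs were derived from the IPv4 addresses rather than set explicitly. Below is a minimal Python sketch of that arithmetic. It assumes corosync generates the ID from the ring 0 IPv4 address and that clear_node_high_bit is enabled in the totem section; neither is confirmed by the log itself, and nodeid_from_ipv4 is a hypothetical helper name.

```python
import ipaddress


def nodeid_from_ipv4(addr: str, clear_high_bit: bool = True) -> int:
    """Return an IPv4 address as a 32-bit integer, optionally clearing the top bit.

    Hypothetical helper: it mirrors the arithmetic that appears to explain
    the node IDs in the log, not corosync's actual source code.
    """
    value = int(ipaddress.IPv4Address(addr))
    if clear_high_bit:
        # Assumption: clear_node_high_bit is set, so bit 31 is masked off.
        value &= 0x7FFFFFFF
    return value


if __name__ == "__main__":
    for addr in ("192.168.80.16", "192.168.80.17"):
        nodeid = nodeid_from_ipv4(addr)
        print(f"{addr} -> {nodeid} (0x{nodeid:08x})")
```

With the two addresses from this thread, the helper reproduces the two node_id values that qnetd reports in the membership and quorum node lists.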

It looks like the "config node list" is empty, but the membership and
quorum node lists are not. I'm not sure where qnetd gets the config node
list from. As an experiment, I added

nodelist {
    node {
        alpha: 192.168.80.16
    }
    node {
        beta: 192.168.80.17
    }
}
}

to corosync.conf, and restarted both nodes. But that didn't help.
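
For comparison, corosync.conf(5) describes node entries using ring0_addr and nodeid keys rather than free-form names such as alpha or beta. Here is a sketch of the same two nodes in that form, assuming the two addresses are the ring 0 addresses and using nodeid values 1 and 2 purely as placeholders (explicit IDs would replace the 1084772368-style IDs seen in the log):

```
nodelist {
    node {
        ring0_addr: 192.168.80.16
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.80.17
        nodeid: 2
    }
}
```

Whether this clears the ffsplit "empty config node list" error is untested here; it only shows the documented shape of the section.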