[ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

Jason Gauthier jagauthier at gmail.com
Fri Jun 29 20:37:46 UTC 2018


On Thu, Jun 21, 2018 at 10:47 AM Jason Gauthier <jagauthier at gmail.com> wrote:
>
> On Thu, Jun 21, 2018 at 9:49 AM Jan Pokorný <jpokorny at redhat.com> wrote:
> >
> > On 21/06/18 07:05 -0400, Jason Gauthier wrote:
> > > On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield <ccaulfie at redhat.com> wrote:
> > >> On 19/06/18 18:47, Jason Gauthier wrote:
> > >>> Attached!
> > >>
> > >> That's very odd. I can see communication with the server and corosync in
> > >> there (so it's doing something) but no logging at all. When I start
> > >> qdevice on my systems it logs loads of messages even if it doesn't
> > >> manage to contact the server. Do you have any logging entries in
> > >> corosync.conf that might be stopping it?
> > >
> > > I haven't checked the corosync logs for any entries before, but I just
> > > did.  There isn't anything logged.
> >
> > What about syslog entries (may boil down to /var/log/messages,
> > journald log, or whatever sink is configured)?
>
> I took a look, since both you and Chrissie mentioned that.
>
> There aren't any new entries added to any of the /var/log files.
>
> # corosync-qdevice -f -d
> # date
> Thu Jun 21 10:36:06 EDT 2018
>
> # ls -lt|head
> total 152072
> -rw-r----- 1 root        adm          68018 Jun 21 10:34 auth.log
> -rw-rw-r-- 1 root        utmp      18704352 Jun 21 10:34 lastlog
> -rw-rw-r-- 1 root        utmp        107136 Jun 21 10:34 wtmp
> -rw-r----- 1 root        adm         248444 Jun 21 10:34 daemon.log
> -rw-r----- 1 root        adm         160899 Jun 21 10:34 syslog
> -rw-r----- 1 root        adm        1119856 Jun 21 09:46 kern.log
>
> I did look through daemon, messages, and syslog just to be sure.
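The ls -lt check above can be made a bit more direct. Create a marker file just before starting qdevice, then ask find which files under /var/log were modified after it. This is a minimal sketch that assumes bash and GNU find. logs_newer_than is a hypothetical helper, and it only covers file-based logs, not the journald journal.

```shell
#!/usr/bin/env bash
# logs_newer_than MARKER [DIR]: list regular files under DIR (default
# /var/log) whose modification time is newer than MARKER's.
# Hypothetical helper for illustration; not part of corosync.
logs_newer_than() {
    local marker=$1 dir=${2:-/var/log}
    # -newer compares each file's mtime with the marker file's mtime
    find "$dir" -type f -newer "$marker" -print 2>/dev/null | sort
}

# Usage (as root, on the qdevice node):
#   marker=$(mktemp)
#   corosync-qdevice -f -d        # returns once qdevice exits (or Ctrl-C it)
#   logs_newer_than "$marker"     # no output: nothing was written under /var/log
```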
>
> > >> Where did the binary come from? Did you build it yourself or is it from
> > >> a package? I wonder if it's got corrupted or is a bad version. Possibly
> > >> linked against a 'dodgy' libqb - there have been some things going on
> > >> there that could cause logging to go missing in some circumstances.
> > >>
> > >> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
> > >> here anyway!
> > >
> > > Hmm. Interesting. I installed the Debian package. When it didn't
> > > work, I grabbed the source from GitHub. They both act the same way,
> > > but if there is an underlying library issue then that will continue to
> > > be a problem.
> > >
> > > It doesn't say much:
> > > /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1
> >
> > You are likely using libqb v1.0.1.
>
> Correct. I didn't even think to look at the output of dpkg -l for the
> package version.
> Debian 9 also packages binutils-2.28
>
> > Ability to figure out the proper package version is one of the most
> > basic skills to provide useful diagnostics about the issues with
> > distro-provided packages.
> >
> > With Debian, the proper incantation seems to be
> >
> >   dpkg -s libqb-dev | grep -i version
> >
> > or
> >
> >   apt list libqb-dev
> >
> > (or substitute libqb0 for libqb-dev).
> >
> > As Chrissie mentioned, there is some fishiness possible if you happen
> > to use the ld linker from binutils 2.29+ for building with this old
> > libqb in the mix, so if the issues persist and logging seems to be
> > missing, try recompiling with a binutils package downgraded below
> > said breakage point.
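To check which side of the 2.29 breakage point a build host is on, the version reported by ld can be compared against 2.29. This is a minimal sketch. It assumes bash, GNU sort (for -V version ordering), and that the first line of ld --version ends with the version number, as Debian's "GNU ld (GNU Binutils for Debian) 2.28" does. version_ge and ld_version are hypothetical helper names.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if version A >= version B, using GNU sort -V
# (the smaller of the two sorts first; if that is B, then A >= B).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# ld_version: print the last field of the first line of `ld --version`
# (assumed to be the version number, as on Debian).
ld_version() {
    ld --version 2>/dev/null | head -n1 | awk '{print $NF}'
}

# Usage on the build host:
#   v=$(ld_version)
#   if version_ge "$v" 2.29; then
#       echo "ld $v is 2.29 or newer: builds against the old libqb may lose logging"
#   else
#       echo "ld $v predates 2.29"
#   fi
```

sort -V orders the components numerically, so 2.29.1 and 2.30 both count as 2.29 or newer, while 2.28 does not.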
>
> Since the system already has a lower-numbered binutils (2.28), I wonder
> if I should attempt to build a newer version of the libqb library.
>
> As Chrissie mentioned, I will open a bug with Debian in the interim.
> But I don't believe I will see a resolution to that any time soon. :)

I was finally able to look at this problem again, and found that qnetd
is logging some messages, but I don't know what to do with them.

Jun 29 16:34:35 debug   New client connected
Jun 29 16:34:35 debug     cluster name = zeta
Jun 29 16:34:35 debug     tls started = 1
Jun 29 16:34:35 debug     tls peer certificate verified = 1
Jun 29 16:34:35 debug     node_id = 1084772368
Jun 29 16:34:35 debug     pointer = 0x563afd609d70
Jun 29 16:34:35 debug     addr_str = ::ffff:192.168.80.16:38010
Jun 29 16:34:35 debug     ring id = (40a85010.89ec)
Jun 29 16:34:35 debug     cluster dump:
Jun 29 16:34:35 debug       client = ::ffff:192.168.80.16:38010, node_id = 1084772368
Jun 29 16:34:35 debug   Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent initial node list.
Jun 29 16:34:35 debug     msg seq num 4
Jun 29 16:34:35 debug     node list:
Jun 29 16:34:35 error   ffsplit: Received empty config node list for client ::ffff:192.168.80.16:38010
Jun 29 16:34:35 error   Algorithm returned error code. Sending error reply.
Jun 29 16:34:35 debug   Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent membership node list.
Jun 29 16:34:35 debug     msg seq num 5
Jun 29 16:34:35 debug     ring id = (40a85010.89ec)
Jun 29 16:34:35 debug     node list:
Jun 29 16:34:35 debug       node_id = 1084772368, data_center_id = 0, node_state = not set
Jun 29 16:34:35 debug       node_id = 1084772369, data_center_id = 0, node_state = not set
Jun 29 16:34:35 debug   Algorithm result vote is Ask later
Jun 29 16:34:35 debug   Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent quorum node list.
Jun 29 16:34:35 debug     msg seq num 6
Jun 29 16:34:35 debug     quorate = 1
Jun 29 16:34:35 debug     node list:
Jun 29 16:34:35 debug       node_id = 1084772368, data_center_id = 0, node_state = member
Jun 29 16:34:35 debug       node_id = 1084772369, data_center_id = 0, node_state = member
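The node_id values in this log appear to be generated from the nodes' IPv4 addresses rather than configured explicitly. 1084772368 is 0x40A85010, which is 192.168.80.16 (0xC0A85010) with its top bit cleared. 1084772369 corresponds to 192.168.80.17 the same way, and the ring id (40a85010.89ec) starts with the same hex value. Below is a small sketch to decode such ids. It assumes bash's 64-bit arithmetic, and nodeid_to_ipv4 is a hypothetical helper. If the ids decode to the ring addresses like this, it suggests that corosync is not picking up explicit nodeid entries from a nodelist.

```shell
#!/usr/bin/env bash
# nodeid_to_ipv4 ID: print a 32-bit corosync node id as a dotted quad.
# Hypothetical helper; only meaningful for ids auto-generated from IPv4.
nodeid_to_ipv4() {
    local id=$1
    printf '%d.%d.%d.%d\n' \
        $(( (id >> 24) & 255 )) $(( (id >> 16) & 255 )) \
        $(( (id >> 8) & 255 ))  $(( id & 255 ))
}

# Auto-generated ids may have the high bit cleared (compare corosync's
# clear_node_high_bit totem option), so set it again before decoding.
for id in 1084772368 1084772369; do
    echo "$id -> $(nodeid_to_ipv4 $(( id | 0x80000000 )))"
done
```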

It looks like the "config node list" is empty, but the other lists are
not. I'm not sure where it gets that node list from. For fun, I
added
nodelist {
    node {
       alpha: 192.168.80.16
     }
    node {
       beta: 192.168.80.17
    }
  }
}
to corosync.conf and restarted both nodes, but that didn't help.
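You can see what qdevice is likely to receive as its config node list by dumping the nodelist keys that corosync actually loaded with corosync-cmapctl nodelist. The argument is a key prefix, and each output line has the form key (type) = value.

corosync.conf(5) documents node entries as node blocks containing keys such as ring0_addr and nodeid, for example ring0_addr: 192.168.80.16 and nodeid: 1. A key named after the host, like alpha:, is not one of them. As pasted, the snippet above also has one more closing brace than opening ones.

The sketch below assumes that output format. check_nodelist is a hypothetical helper that reads the cmapctl output on stdin and reports, for each node index, whether ring0_addr and nodeid are present.

```shell
#!/usr/bin/env bash
# check_nodelist: read `corosync-cmapctl nodelist` output on stdin and report,
# per node index, whether ring0_addr and nodeid keys are present.
# Hypothetical helper; key names follow the corosync.conf(5) nodelist syntax.
check_nodelist() {
    awk '
        # input lines look like: nodelist.node.0.ring0_addr (str) = 192.168.80.16
        $1 ~ /^nodelist\.node\.[0-9]+\./ {
            split($1, k, ".")      # k[3] = node index, k[4] = key name
            seen[k[3]] = 1
            if (k[4] == "ring0_addr") addr[k[3]] = $NF
            if (k[4] == "nodeid") id[k[3]] = $NF
        }
        END {
            n = 0
            for (i in seen) {
                n++
                a = (i in addr) ? addr[i] : "MISSING"
                d = (i in id) ? id[i] : "MISSING"
                printf "node %s: ring0_addr=%s nodeid=%s\n", i, a, d
            }
            if (n == 0) print "no nodelist.node entries found"
        }' | sort    # awk array order is unspecified, so sort the report
}

# Usage on a cluster node:
#   corosync-cmapctl nodelist | check_nodelist
```

A node reported with ring0_addr=MISSING would be consistent with the "Received empty config node list" error. This is an inference, not something the log states directly.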

