[ClusterLabs] corosync-qdevice doesn't daemonize (or stay running)

Jan Friesse jfriesse at redhat.com
Fri Jul 13 09:55:57 EDT 2018


Jason,

> On Thu, Jun 21, 2018 at 10:47 AM Jason Gauthier <jagauthier at gmail.com> wrote:
>>
>> On Thu, Jun 21, 2018 at 9:49 AM Jan Pokorný <jpokorny at redhat.com> wrote:
>>>
>>> On 21/06/18 07:05 -0400, Jason Gauthier wrote:
>>>> On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield <ccaulfie at redhat.com> wrote:
>>>>> On 19/06/18 18:47, Jason Gauthier wrote:
>>>>>> Attached!
>>>>>
>>>>> That's very odd. I can see communication with the server and corosync in
>>>>> there (so it's doing something) but no logging at all. When I start
>>>>> qdevice on my systems it logs loads of messages even if it doesn't
>>>>> manage to contact the server. Do you have any logging entries in
>>>>> corosync.conf that might be stopping it?
>>>>
>>>> I haven't checked the corosync logs for any entries before, but I just
>>>> did.  There isn't anything logged.
>>>
>>> What about syslog entries (may boil down to /var/log/messages,
>>> journald log, or whatever sink is configured)?
>>
>> I took a look, since both you and Chrissie mentioned that.
>>
>> There aren't any new entries added to any of the /var/log files.
>>
>> # corosync-qdevice -f -d
>> # date
>> Thu Jun 21 10:36:06 EDT 2018
>>
>> # ls -lt|head
>> total 152072
>> -rw-r----- 1 root        adm          68018 Jun 21 10:34 auth.log
>> -rw-rw-r-- 1 root        utmp      18704352 Jun 21 10:34 lastlog
>> -rw-rw-r-- 1 root        utmp        107136 Jun 21 10:34 wtmp
>> -rw-r----- 1 root        adm         248444 Jun 21 10:34 daemon.log
>> -rw-r----- 1 root        adm         160899 Jun 21 10:34 syslog
>> -rw-r----- 1 root        adm        1119856 Jun 21 09:46 kern.log
>>
>> I did look through daemon, messages, and syslog just to be sure.
>>
>>>>> Where did the binary come from? Did you build it yourself or is it from
>>>>> a package? I wonder if it's got corrupted or is a bad version. Possibly
>>>>> linked against a 'dodgy' libqb - there have been some things going on
>>>>> there that could cause logging to go missing in some circumstances.
>>>>>
>>>>> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
>>>>> here anyway!

corosync-qdevice uses the same config as corosync (corosync.conf), so to get
messages on stderr, please set

logging.to_stderr: on
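
In corosync.conf block syntax, that key sits inside the logging section. A
minimal sketch, assuming the rest of the file is left as it is and any
existing logging section is extended rather than duplicated:

```
logging {
        # send log messages to stderr, as suggested above
        to_stderr: on
}
```

This assumes corosync has re-read its configuration before corosync-qdevice
is started again in the foreground (corosync-qdevice -f -d).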


>>>>
>>>> Hmm. Interesting.  I installed the debian package.  When it didn't
>>>> work, I grabbed the source from github.  They both act the same way,
>>>> but if there is an underlying library issue then that will continue to
>>>> be a problem.
>>>>
>>>> It doesn't say much:
>>>> /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1
>>>
>>> You are likely using libqb v1.0.1.
>>
>> Correct. I didn't even think to look at the output of dpkg -l for the
>> package version.
>> Debian 9 also packages binutils-2.28
>>
>>> Ability to figure out the proper package version is one of the most
>>> basic skills to provide useful diagnostics about the issues with
>>> distro-provided packages.
>>>
>>> With Debian, the proper incantation seems to be
>>>
>>>    dpkg -s libqb-dev | grep -i version
>>>
>>> or
>>>
>>>    apt list libqb-dev
>>>
>>> (or substitute libqb0 for libqb-dev).
>>>
>>> As Chrissie mentioned, there is some fishiness possible if you happen
>>> to use ld linker from binutils 2.29+ for the building with this old
>>> libqb in the mix, so if the issues persist and logging seems to be
>>> missing, try recompiling with the downgraded binutils package below
>>> said breakage point.
>>
>> Since the system already has a lower numbered binutils (2.28) I wonder
>> if I should attempt to build a newer version of the libqb library.
>>
>> As Chrissie mentioned, I will open a bug with Debian in the interim.
>> But I don't believe I will see resolution to that any time soon. :)
> 
> I was finally able to look at this problem again, and found that qnetd
> is giving me some messaging, but I don't know what to do with it.
> 
> Jun 29 16:34:35 debug   New client connected
> Jun 29 16:34:35 debug     cluster name = zeta
> Jun 29 16:34:35 debug     tls started = 1
> Jun 29 16:34:35 debug     tls peer certificate verified = 1
> Jun 29 16:34:35 debug     node_id = 1084772368
> Jun 29 16:34:35 debug     pointer = 0x563afd609d70
> Jun 29 16:34:35 debug     addr_str = ::ffff:192.168.80.16:38010
> Jun 29 16:34:35 debug     ring id = (40a85010.89ec)
> Jun 29 16:34:35 debug     cluster dump:
> Jun 29 16:34:35 debug       client = ::ffff:192.168.80.16:38010,
> node_id = 1084772368
> Jun 29 16:34:35 debug   Client ::ffff:192.168.80.16:38010 (cluster
> zeta, node_id 1084772368) sent initial node list.
> Jun 29 16:34:35 debug     msg seq num 4
> Jun 29 16:34:35 debug     node list:
> Jun 29 16:34:35 error   ffsplit: Received empty config node list for
> client ::ffff:192.168.80.16:38010

Yes, this is interesting. Could you please share your config?

> Jun 29 16:34:35 error   Algorithm returned error code. Sending error reply.
> Jun 29 16:34:35 debug   Client ::ffff:192.168.80.16:38010 (cluster
> zeta, node_id 1084772368) sent membership node list.
> Jun 29 16:34:35 debug     msg seq num 5
> Jun 29 16:34:35 debug     ring id = (40a85010.89ec)
> Jun 29 16:34:35 debug     node list:
> Jun 29 16:34:35 debug       node_id = 1084772368, data_center_id = 0,
> node_state = not set
> Jun 29 16:34:35 debug       node_id = 1084772369, data_center_id = 0,
> node_state = not set
> Jun 29 16:34:35 debug   Algorithm result vote is Ask later
> Jun 29 16:34:35 debug   Client ::ffff:192.168.80.16:38010 (cluster
> zeta, node_id 1084772368) sent quorum node list.
> Jun 29 16:34:35 debug     msg seq num 6
> Jun 29 16:34:35 debug     quorate = 1
> Jun 29 16:34:35 debug     node list:
> Jun 29 16:34:35 debug       node_id = 1084772368, data_center_id = 0,
> node_state = member
> Jun 29 16:34:35 debug       node_id = 1084772369, data_center_id = 0,
> node_state = member
> 
> It looks like "config node list" is empty, but the other lists are
> not.  I'm not sure where it's getting that node list from.  For fun, I
> added
> nodelist {
>      node {
>         alpha: 192.168.80.16
>       }
>      node {
>         beta: 192.168.80.17
>      }
>    }
> }

This is not what a nodelist looks like. It should look like:
nodelist {
         node {
                 ring0_addr: 192.168.80.16
                 nodeid: 1
         }
         node {
                 ring0_addr: 192.168.80.17
                 nodeid: 2
         }
}

But it's really weird that corosync-qdevice started without a proper
nodelist (it shouldn't).
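
For a quick sanity check of the nodelist shape before restarting anything,
something like the following could be used. It is only a sketch:
check_nodelist is a hypothetical helper, not part of corosync, and it
assumes one "node {" per line with ring0_addr and nodeid on their own lines.
Treating a missing nodeid as an error simply mirrors the example above;
whether nodeid is strictly required may depend on the setup.

```shell
#!/bin/sh
# check_nodelist FILE
# Hypothetical helper (not shipped with corosync): reports node entries in a
# corosync.conf-style file that lack ring0_addr or nodeid.
# Returns 0 if every node entry has both keys, 1 otherwise.
check_nodelist() {
    awk '
        # "node {" opens a node entry; "nodelist {" does not match here
        /^[ \t]*node[ \t]*[{]/ { in_node = 1; n++; addr = 0; id = 0; next }
        in_node && /^[ \t]*ring0_addr:/ { addr = 1 }
        in_node && /^[ \t]*nodeid:/     { id = 1 }
        # closing brace ends the current node entry
        in_node && /[}]/ {
            if (!addr) { printf "node %d: missing ring0_addr\n", n; bad = 1 }
            if (!id)   { printf "node %d: missing nodeid\n", n; bad = 1 }
            in_node = 0
        }
        END {
            if (n == 0) { print "no node entries found"; bad = 1 }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

It would be called as check_nodelist /etc/corosync/corosync.conf (path
assumed; adjust to wherever the config lives). It prints one line per
problem and returns non-zero if anything looks off.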

Honza

> to corosync.conf, and restarted both nodes. But that didn't help.