[ClusterLabs] error: The cib process (17858) exited: Key has expired (127)
Ken Gaillot
kgaillot at redhat.com
Fri Mar 24 12:15:40 EDT 2017
On 03/24/2017 11:06 AM, Rens Houben wrote:
> I activated debug=cib, and retried.
>
> New log file up at
> http://proteus.systemec.nl/~shadur/pacemaker/pacemaker_2.log.txt ;
> unfortunately, while that *is* more information I'm not seeing anything
> that looks like it could be the cause, although it shouldn't be reading
> any config files yet because there shouldn't be any *to* read...
If there's no config file, pacemaker will create an empty one and use
that, so it still goes through the mechanics of validating it and
writing it out.
Debug doesn't give us much -- just one additional message before it dies:
Mar 24 16:59:27 [20266] castor cib: debug: activateCibXml: Triggering CIB write for start op
You might want to look at the system log around that time to see if
something else is going wrong. If you have SELinux enabled, check the
audit log for denials.
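
For the SELinux check, something along these lines could pull the relevant denials out of the audit log. This is only a rough sketch: the helper name denials_for and the sample records are made up for illustration, and it assumes the usual auditd record layout (avc: denied ... comm="...") and the usual /var/log/audit/audit.log location.

```python
import re

# Matches an AVC denial record and captures the command name it was logged for.
DENIAL = re.compile(r'avc:\s+denied\b.*\bcomm="(?P<comm>[^"]+)"')


def denials_for(lines, comm="cib"):
    """Return the AVC denial records in `lines` that were logged for `comm`."""
    hits = []
    for line in lines:
        match = DENIAL.search(line)
        if match and match.group("comm") == comm:
            hits.append(line.rstrip("\n"))
    return hits


# Invented sample records, for illustration only (not taken from any real log).
sample = [
    'type=AVC msg=audit(0.000:1): avc:  denied  { write } for  pid=1 comm="cib" name="cib.xml"\n',
    'type=AVC msg=audit(0.000:2): avc:  denied  { read } for  pid=2 comm="sshd" name="example"\n',
    'type=SYSCALL msg=audit(0.000:3): arch=c000003e syscall=2 success=no\n',
]

found = denials_for(sample)
for record in found:
    print(record)
```

On a live system you would pass it the open audit log (usually readable by root only) instead of the sample list, or simply grep that file for "denied".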
> As to the misleading error message, it gets weirder: I grabbed a copy of
> the source code via apt-get source, and the phrase 'key has expired'
> does not occur anywhere in any file according to find ./ -type f -exec
> grep -il 'key has expired' {} \; so I have absolutely NO idea where it's
> coming from...
Right, it's not part of pacemaker, it's just the standard system error
message for errno 127. But the exit status isn't an errno, so that's not
the right interpretation. I can't find any code path in the cib that
would return 127, so I don't know what the right interpretation would be.
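
To illustrate the mix-up, here is a small sketch (not pacemaker code) in which a stand-in child exits with status 127 and the parent then reports it two ways. describe_exit is a hypothetical helper; the text returned by os.strerror() depends on the platform, and on Linux errno 127 is EKEYEXPIRED, which is where the "Key has expired" wording comes from.

```python
import os
import subprocess
import sys


def describe_exit(returncode):
    """Describe a child's return code without treating it as an errno."""
    if returncode < 0:
        # subprocess reports death by signal N as a return code of -N.
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


# Stand-in for the cib: a child process that simply exits with status 127.
child = subprocess.run([sys.executable, "-c", "raise SystemExit(127)"])

# Misleading: looks the exit status up in the errno message table.
misleading = os.strerror(child.returncode)

# More accurate: report the exit status as what it is.
accurate = describe_exit(child.returncode)

print("as errno text: ", misleading)
print("as exit status:", accurate)
```

The first line of output is whatever the C library says for errno 127, which has nothing to do with why the child exited; the second only states what is actually known.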
>
> --
> Rens Houben
> Systemec Internet Services
>
> SYSTEMEC BV
>
> Marinus Dammeweg 25, 5928 PW Venlo
> Postbus 3290, 5902 RG Venlo
> Industrial estate number: 6817
> The Netherlands
>
> T: 077-3967572 (Support)
> Chamber of Commerce (K.V.K.) number: 12027782 (Venlo)
>
> Systemec Datacenter Venlo & Nettetal <https://www.systemec.nl>
>
> Systemec Helpdesk <https://support.systemec.nl>
>
> Newsletter sign-up <https://www.systemec.nl/nl/nieuwsbrief>
>
> Follow us on: Systemec Twitter <https://twitter.com/systemec> Systemec
> Facebook <https://www.facebook.com/systemecbv> Systemec Linkedin
> <http://www.linkedin.com/company/systemec-b.v.> Systemec Youtube
> <http://www.youtube.com/user/systemec1>
>
>
> ________________________________________
> From: Ken Gaillot <kgaillot at redhat.com>
> Sent: Friday, 24 March 2017 16:49
> To: users at clusterlabs.org
> Subject: Re: [ClusterLabs] error: The cib process (17858) exited: Key
> has expired (127)
>
> On 03/24/2017 08:06 AM, Rens Houben wrote:
>> I recently upgraded a two-node cluster (named 'castor' and 'pollux'
>> because I should not be allowed to think up computer names before I've
>> had my morning caffeine) from Debian wheezy to Jessie after the
>> backports for corosync and pacemaker finally made it in. However, one of
>> the two servers failed to start correctly for no really obvious reason.
>>
>> Given that it had been years since I last set them up, and I had
>> forgotten pretty much everything about it in the interim, I decided to
>> purge corosync and pacemaker on both systems and run with clean
>> installs instead.
>>
>> This worked on pollux, but not on castor. Even after going back,
>> re-purging, removing everything legacy in /var/lib/heartbeat and
>> emptying both directories, castor still refuses to bring up pacemaker.
>>
>>
>> I put the full log of a start attempt up at
>> http://proteus.systemec.nl/~shadur/pacemaker/pacemaker.log.txt, but
>> this is the excerpt that I /think/ is causing the failure:
>>
>> Mar 24 13:59:05 [25495] castor pacemakerd: error: pcmk_child_exit: The cib process (25502) exited: Key has expired (127)
>> Mar 24 13:59:05 [25495] castor pacemakerd: notice: pcmk_process_exit: Respawning failed child process: cib
>>
>> I don't see any entries from cib in the log that suggest anything's
>> going wrong, though, and I'm running out of ideas on where to look next.
>
> The "Key has expired" message is misleading. (Pacemaker really needs an
> overhaul of the exit codes it can return, so these messages can be
> reliable, but there are always more important things to take care of ...)
>
> Pacemaker is getting 127 as the exit status of cib, and interpreting
> that as a standard system error number, but it probably isn't one. I
> don't actually see any way that the cib can return 127, so I'm not sure
> what that might indicate.
>
> In any case, the cib is mysteriously dying whenever it tries to start,
> apparently without logging why or dumping core. (Do you have cores
> disabled at the OS level?)
>
>> Does anyone have any suggestions as to how to coax more information out
>> of the processes and into the log files so I'll have a clue to work with?
>
> Try it again with PCMK_debug=cib in /etc/default/pacemaker. That should
> give more log messages.
>
>>
>> Regards,
>>
>> --
>> Rens Houben
>> Systemec Internet Services