[ClusterLabs] ask for help for a pacemaker problem
Ken Gaillot
kgaillot at redhat.com
Thu Jul 26 12:37:55 EDT 2018
On Wed, 2018-07-25 at 23:43 +0800, 李培 wrote:
> Dear all
>
> I have a problem when I use pacemaker.
>
> The corosync.log on two nodes grows to 1 GB in about one hour.
>
> the corosync.log only has one kind of message in one node named paas-
> controller-22-0-2-12 as below:
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-12 cib: error:
> cib_process_shutdown_req:
> Shutdown ACK from 22.0.2.11 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-12 cib: error:
> cib_process_shutdown_req:
> Shutdown ACK from 22.0.2.11 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-12 cib: error:
> cib_process_shutdown_req:
> Shutdown ACK from 22.0.2.11 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-12 cib: error:
> cib_process_shutdown_req:
> Shutdown ACK from 22.0.2.11 - not shutting
>
>
> the corosync.log only has one kind of message in another node named
> paas-controller-22-0-2-11 as below:
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-11 cib: error:
> cib_process_shutdown_req:
> Shutdown ACK from 22.0.2.12 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-11 cib: error:
> cib_process_shutdown_req:
> Shutdown ACK from 22.0.2.12 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-11 cib: error:
> cib_process_shutdown_req:
> Shutdown ACK from 22.0.2.12 - not shutting down
> Jul 23 14:00:06 [15036] paas-controller-22-0-2-11 cib: error:
> cib_process_shutdown_req:
> Shutdown ACK from 22.0.2.12 - not shutting
>
> It seems that the two nodes do not respond to each other's shutdown
> requests, so the message keeps being sent out.
>
> Have any of you ever encountered this issue?
>
> How did it happen? How can it be solved?
>
> I am looking forward to hearing from you.
>
> Thanks in advance.
>
> Sincerely yours
This is interesting. At least one of the nodes should have an info-
level log message like "Shutdown REQ from ..." before these messages
start.
For this to happen, one of the nodes has to receive a shutdown request
from the other and acknowledge it with a reply, while the node that
sent the request somehow doesn't know it sent one, and so logs this
message.
The funny (?) part is that the node receiving the acknowledgement will
reply to it, and the other node will then (wrongly) treat that reply as
an acknowledgement of one of its own shutdown requests, which it doesn't
have, so it logs this message too and replies back. Infinite loop :-/
I've opened a bug for the loop:
https://bugs.clusterlabs.org/show_bug.cgi?id=5361
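
To make the loop concrete, here is a toy model of the exchange described
above. It is only a sketch, not Pacemaker's actual code: the Node class,
handle_ack() and simulate() are hypothetical names, and the only behaviour
modelled is "a node that gets an ACK it is not waiting for logs the error
and answers anyway".

```python
from collections import deque


class Node:
    """Toy model of one cib peer (hypothetical, not Pacemaker code)."""

    def __init__(self, name):
        self.name = name
        # True only if this node really has a shutdown request outstanding.
        self.shutdown_requested = False
        self.log = []

    def handle_ack(self, sender):
        """Process an ACK; return the peer to reply to, or None to stop."""
        if self.shutdown_requested:
            # Normal case (assumed): the ACK matches our own request.
            return None
        # Behaviour described in this thread: log the error and reply,
        # which the peer then sees as yet another unexpected ACK.
        self.log.append(f"Shutdown ACK from {sender} - not shutting down")
        return sender


def simulate(max_messages=10):
    """Deliver at most max_messages messages, starting from one stray ACK."""
    nodes = {name: Node(name) for name in ("22.0.2.11", "22.0.2.12")}
    # 22.0.2.12 acknowledges a request that 22.0.2.11 does not know it sent.
    queue = deque([("22.0.2.12", "22.0.2.11")])  # (sender, receiver)
    delivered = 0
    while queue and delivered < max_messages:
        sender, receiver = queue.popleft()
        delivered += 1
        reply_to = nodes[receiver].handle_ack(sender)
        if reply_to is not None:
            queue.append((receiver, reply_to))
    still_pending = bool(queue)
    return nodes, delivered, still_pending
```

In this model the exchange never drains on its own: however large
max_messages is, a reply is still queued when the cap is hit, and both
nodes accumulate the same error line, which matches the symptom of both
logs filling with a single message.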
However, an unanswered question is how the loop got started. One of the
nodes thought it received a shutdown request, but the other node didn't
think it sent one. That is a mystery here. If you can find the
"Shutdown REQ" message, the logs from both nodes around that time might
shed some light.
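
Since the logs are around 1 GB each, a small script may be easier than
paging through them by hand. The sketch below streams through the lines
and returns the first one containing "Shutdown REQ" together with a few
lines on either side. The function name, the context size and the default
pattern are my own choices for illustration; adjust the pattern if the
message is worded differently in your version.

```python
import re
from collections import deque
from itertools import islice


def find_shutdown_req(lines, context=5, pattern=r"Shutdown REQ"):
    """Return (line_number, before, match, after) for the first match.

    `lines` is any iterable of strings, e.g. an open log file, so the
    whole log is never held in memory. Returns None if nothing matches.
    """
    regex = re.compile(pattern)
    before = deque(maxlen=context)  # rolling window of preceding lines
    numbered = iter(enumerate(lines, start=1))
    for number, line in numbered:
        line = line.rstrip("\n")
        if regex.search(line):
            # Pull up to `context` following lines from the same iterator.
            after = [nxt.rstrip("\n") for _, nxt in islice(numbered, context)]
            return number, list(before), line, after
        before.append(line)
    return None
```

It can be run on each node with something like
find_shutdown_req(open(path, errors="replace")), where path is wherever
corosync.log lives on your system. Comparing the timestamps of the
matches on the two nodes should show which side believed it had received
a request.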
--
Ken Gaillot <kgaillot at redhat.com>