[ClusterLabs] [Problem] The crmd fails to connect with pengine.
jpokorny at redhat.com
Wed Jan 2 14:43:15 UTC 2019
On 28/12/18 05:51 +0900, renayama19661014 at ybb.ne.jp wrote:
> This problem occurred with our users.
> The following problem occurred in a two-node cluster that does not have STONITH configured.
> The problem seems to have occurred through the following steps.
> Step 1) Configure the cluster with 2 nodes. The DC node is the second node.
> Step 2) Several resources are running on the first node.
> Step 3) The nodes are stopped at almost the same time, the 2nd node first and then the 1st node.
Do I understand the above correctly: the cluster is scheduled for
shutdown (fully independently node by node, or through a single trigger
from a high-level management tool?) and the shutdown proceeds serially,
stopping the 2nd node (the original DC) first?
> Step 4) After the second node stops, the first node tries to
> calculate the state transition for the resource stop.
> However, crmd fails to connect with pengine and does not calculate state transitions.
> Dec 27 08:36:00 rh74-01 crmd: warning: Setup of client connection failed, not adding channel to mainloop
Sadly, it looks like the details of why this happened would only have
been retained if debug/trace verbosity of the log messages had been
enabled, which likely wasn't the case.
Anyway, perhaps providing a wider context of the log messages
from this first node might shed some light on this.
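In case it helps with collecting that wider context, here is a minimal
sketch (not a Pacemaker tool) that pulls every line logged within a time
window around the first occurrence of a given message. It assumes
syslog-style timestamps like the one in the line quoted above; the names
parse_stamp and context, the window sizes, and the default year are all
hypothetical choices made for illustration.

```python
import re
from datetime import datetime, timedelta

# Leading syslog-style timestamp, e.g. "Dec 27 08:36:00" (no year in the log).
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def parse_stamp(line, year=2018):
    """Return the line's timestamp as a datetime, or None if it has none.

    The year is an assumption, since syslog timestamps do not carry one.
    """
    m = STAMP.match(line)
    if not m:
        return None
    stamp = " ".join(m.group(1).split())  # collapse padding such as "Dec  7"
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def context(lines, needle, before=120, after=30, year=2018):
    """Return lines logged from `before` seconds ahead of the first line
    containing `needle` up to `after` seconds past it."""
    hit = None
    for line in lines:
        if needle in line:
            hit = parse_stamp(line, year)
            if hit is not None:
                break
    if hit is None:
        return []
    lo = hit - timedelta(seconds=before)
    hi = hit + timedelta(seconds=after)
    selected = []
    for line in lines:
        t = parse_stamp(line, year)
        if t is not None and lo <= t <= hi:
            selected.append(line)
    return selected
```

Feeding it the lines of the first node's log together with the string
"Setup of client connection failed" would return the messages surrounding
the failure, which is the part worth posting here.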
> As a result, Pacemaker will stop without stopping the resource.
This might have serious consequences in some scenarios, unless perhaps
some watchdog-based solution (SBD?) was used as the fencing mechanism
of choice, since the watchdog would not get defused when the resource
wasn't stopped, I think...
> The problem seems to have occurred in the following environment.
> - libqb 1.0
> - corosync 2.4.1
> - Pacemaker 1.1.15
> I tried to reproduce this problem, but so far I have not been able to.
> Do you know the cause of this problem?
No idea at this point.