[ClusterLabs] Pacemaker occasionally takes minutes to respond

Tue May 9 16:34:26 EDT 2017

Actually, I found some more details:

There are two resources, A and B.

Resource B depends on resource A: when the resource agent (RA) monitors B, the monitor fails if A is not running properly.

If I stop resource A, the next monitor operation of "B" will fail. Interestingly, this check happens immediately after A is stopped.

B is configured to restart if its monitor fails, and its start timeout is rather long (180 seconds). So Pacemaker tries to restart B and waits.

If I then try to start A, nothing happens until B's start operation fails, which typically takes several minutes.
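To make the timeline concrete, here is a small Python toy model of what I observe. It is only a sketch: `Action`, `when_request_runs` and `simulate` are hypothetical names, not Pacemaker code or APIs, and the "blocking" rule (a new request is not acted on until every in-flight action has finished) is my assumption about what would explain the delay, not confirmed Pacemaker internals.

```python
from dataclasses import dataclass

# Hypothetical toy model of the observed behavior; NOT Pacemaker code.
START_TIMEOUT_B = 180  # seconds: B's configured start timeout


@dataclass
class Action:
    name: str
    started_at: int  # seconds since A was stopped
    duration: int    # how long the action stays in flight

    @property
    def finished_at(self) -> int:
        return self.started_at + self.duration


def when_request_runs(in_flight: list[Action], request_time: int) -> int:
    """Time at which a new request (e.g. 'start A') is acted on,
    assuming it must wait for every in-flight action to finish."""
    busy_until = max((a.finished_at for a in in_flight), default=request_time)
    return max(request_time, busy_until)


def simulate(blocking: bool, request_time: int = 10) -> int:
    # t=0: A is stopped and B's next monitor fails immediately.
    # t=1: recovery starts B; the start hangs because A is down,
    #      so it only ends when the 180 s start timeout expires.
    start_b = Action("start B", started_at=1, duration=START_TIMEOUT_B)
    if blocking:
        return when_request_runs([start_b], request_time)
    # What I would expect instead: 'start A' runs right away.
    return request_time


if __name__ == "__main__":
    print("blocking model, start A runs at t =", simulate(blocking=True))
    print("non-blocking model, start A runs at t =", simulate(blocking=False))
```

Under the blocking assumption, a "start A" request issued at t=10 is only acted on at t=181, when B's start times out, which matches the multi-minute delay I see.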

Is this the expected behavior?
It appears that Pacemaker is blocked while resource B is starting, so I cannot actually start its dependency.
Shouldn't it be possible to start one resource while another resource is still starting?

Thanks,
Attila

From: Attila Megyeri [mailto:amegyeri at minerva-soft.com]
Sent: Tuesday, May 9, 2017 9:53 PM
To: users at clusterlabs.org; kgaillot at redhat.com
Subject: [ClusterLabs] Pacemaker occasionally takes minutes to respond

Hi Ken, all,

We ran into an issue very similar to the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1430112 ("[Intel 7.4 Bug] Pacemaker occasionally takes minutes to respond").

However, in our case we are not using fencing/stonith at all.

Often, when I want to start, stop or clean up a resource, it takes tens of seconds (or even minutes) until the command is executed. The logs show nothing during that period, and the redundant rings show no faults.

Could this be the same issue?

Any hints on how to troubleshoot this?
We are running Pacemaker 1.1.10 and Corosync 2.3.3.

Cheers,
Attila
