[ClusterLabs] pending actions
Ken Gaillot
kgaillot at redhat.com
Fri Mar 24 18:33:21 EDT 2017
On 03/07/2017 04:13 PM, Jehan-Guillaume de Rorthais wrote:
> Hi,
>
> Occasionally, I find my cluster with one pending action not being executed for
> some minutes (I guess until the "PEngine Recheck Timer" elapses).
>
> Running "crm_simulate -SL" shows the pending actions.
>
> I'm still confused about how it can happen, why it happens, and how to avoid
> it.
It's most likely a bug in the crmd, which schedules PE runs.
> Earlier today, I started my test cluster with 3 nodes and a master/slave
> resource[1], all with positive master score (1001, 1000 and 990), and the
> cluster kept the promote action as a pending action for 15 minutes.
>
> You will find in attachment the first 3 pengine inputs executed after the
> cluster startup.
>
> What are the consequences if I set cluster-recheck-interval to 30s, for instance?
The cluster would consume more CPU and I/O continually recalculating the
cluster state.
It would be nice to have some guidelines for cluster-recheck-interval
based on real-world usage, but for now it's just a matter of gut
feeling. The cluster automatically recalculates when something
"interesting" happens -- a node comes or goes, a monitor fails, a node
attribute changes, etc. The cluster-recheck-interval is (1) a failsafe
for buggy situations like this, and (2) the granularity limit for many
time-based checks such as rules, which are only re-evaluated when the
cluster recalculates. I would personally use at least 5 minutes, though
less is probably reasonable, especially with simple configurations (few
nodes, resources, and constraints).
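To put rough numbers on that trade-off, here is a back-of-the-envelope
sketch. It is not Pacemaker code; the function names are made up for
illustration. It assumes the recheck timer fires at exactly the
configured interval, and it ignores the event-driven recalculations
mentioned above. The 15-minute row matches the delay you observed.

```python
# Rough arithmetic only: how often the recheck timer would fire, and how
# long a stuck action could wait, for a few candidate
# cluster-recheck-interval values. Function names are hypothetical.

def timer_rechecks_per_day(interval_s: int) -> int:
    """Timer-driven recalculations in 24 hours (event-driven runs excluded)."""
    return 86400 // interval_s


def worst_case_wait_s(interval_s: int) -> int:
    """Upper bound on how long a pending action waits for the next timer run."""
    return interval_s


for label, seconds in [("30s", 30), ("5min", 300), ("15min", 900)]:
    print(f"{label:>5}: {timer_rechecks_per_day(seconds):4d} rechecks/day, "
          f"up to {worst_case_wait_s(seconds)} s before a stuck action is retried")
```

How expensive each recalculation is depends on the size of the
configuration, so these counts only show how the load scales, not what
it costs in absolute terms.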
> Thanks in advance for your insights :)
>
> Regards,
>
> [1] here is the setup:
> http://dalibo.github.io/PAF/Quick_Start-CentOS-7.html#cluster-resource-creation-and-management
Feel free to open a bug report and include some logs around the time of
the incident (most importantly from the DC).