[ClusterLabs] the kernel information when pacemaker restarts the PAF resource because of monitor timeout

Wed May 30 06:51:21 UTC 2018

Hi,

This is related to something I have been pushing for.

An option to skip X missed monitor attempts before acting is planned 
but, as far as I know, not implemented yet.
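
To illustrate the idea of skipping X missed monitor attempts, here is a rough Python sketch. It is not Pacemaker or PAF code. The class name MonitorTolerance and the parameter max_skips are made up for this example; it only shows the counting logic such an option might use.

```python
class MonitorTolerance:
    """Hypothetical helper, not part of Pacemaker or PAF.

    Treats the resource as healthy until more than `max_skips`
    consecutive monitor attempts have been missed.
    """

    def __init__(self, max_skips):
        self.max_skips = max_skips  # X in "skip X missed attempts"
        self.missed = 0             # consecutive misses seen so far

    def record(self, ok):
        """Record one monitor result; return True while no action is needed."""
        if ok:
            # A successful monitor resets the counter.
            self.missed = 0
            return True
        self.missed += 1
        # Only report a failure once the tolerance is exceeded.
        return self.missed <= self.max_skips


# Example: tolerate two consecutive misses; the third one counts as a failure.
tolerance = MonitorTolerance(max_skips=2)
results = [tolerance.record(ok) for ok in [True, False, False, False, True]]
```

With max_skips=2, a short burst of slow pg_isready responses would be ignored, and only a third consecutive miss would be reported as a failure.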

Regards,

Klecho

On 30/05/18 06:08, 范国腾 wrote:
> Hi,
>
> The cluster uses PAF to manage the PostgreSQL database and GFS2 to manage the shared storage. The configuration is attached.
>
> During performance testing, CPU usage is very high. We set the op monitor timeout to 100 seconds. PAF calls pg_isready to monitor the database. As the call load rises, the pg_isready response time increases. When there is no response within 100 seconds, Pacemaker restarts the PAF resource. Many kernel log messages then appear, and the PAF resource fails to start.
>
> So my questions are:
> 1. When the monitor operation times out, many kernel messages are printed in /var/log/messages. Could you please check whether this log shows anything wrong with the cluster? It seems that a shared disk storage error prevents the database from starting.
> 2. When the cluster runs in production, we cannot avoid the call load becoming high for some periods, so the monitor will time out and the PAF resource will be restarted. Is there any way to avoid restarting the resource when the system is busy?
>
> Thanks
> Steven
>
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Klecho
