[ClusterLabs] the kernel information when pacemaker restarts the PAF resource because of monitor timeout

Tue May 29 23:08:54 EDT 2018

Hi,

The cluster uses PAF to manage the PostgreSQL database and GFS2 to manage the shared storage. The configuration is attached.

While we run the performance test, CPU usage is very high. We set the op monitor timeout to 100 seconds. PAF calls pg_isready to monitor the database. As the call load gets higher, the pg_isready response time increases. When there is no response within 100 seconds, Pacemaker restarts the PAF resource. Many kernel messages are then logged, and the PAF resource fails to start.
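To show what we mean by the response time, here is a minimal Python sketch that times a single pg_isready call against the same 100 second limit, outside of the cluster. It is only an illustration and not how PAF itself runs the check: the function name timed_probe is hypothetical, and the host localhost and port 5432 are assumptions that need to match the real instance.

```python
import subprocess
import sys
import time


def timed_probe(cmd, timeout_s):
    """Run cmd once; return (exit code, or None on timeout, elapsed seconds)."""
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        code = result.returncode
    except subprocess.TimeoutExpired:
        # The child is killed when the timeout expires, similar in spirit
        # to a monitor operation exceeding its configured timeout.
        code = None
    return code, time.monotonic() - start


if __name__ == "__main__":
    # Assumed connection parameters; adjust to the real instance.
    probe = ["pg_isready", "-h", "localhost", "-p", "5432"]
    code, elapsed = timed_probe(probe, timeout_s=100)
    if code is None:
        print(f"no answer after {elapsed:.1f}s (would count as a monitor timeout)")
    else:
        # pg_isready exit codes: 0 accepting, 1 rejecting, 2 no response, 3 no attempt
        print(f"pg_isready exit code {code} after {elapsed:.1f}s")
    sys.exit(0 if code == 0 else 1)
```

Running this in a loop during the performance test would show how close the response time gets to the 100 second limit before the restart happens.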

So my questions are:
1. When the monitor operation times out, many kernel messages are printed in /var/log/messages. Could you please help check whether this log shows anything wrong with the cluster? It looks as though a shared disk storage error prevents the database from starting.
2. When the cluster runs in production, we cannot avoid the call load becoming high for some periods, so the monitor will time out and the PAF resource will be restarted. Is there any way to avoid restarting the resource when the system is busy?

Thanks
Steven

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config
Type: application/octet-stream
Size: 4669 bytes
Desc: config
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180530/902c5c8b/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages
Type: application/octet-stream
Size: 184187 bytes
Desc: messages
URL: <https://lists.clusterlabs.org/pipermail/users/attachments/20180530/902c5c8b/attachment-0003.obj>