[ClusterLabs] Regression in Filesystem RA

Mon Oct 16 14:09:21 EDT 2017

Hi,

On Thu, Oct 12, 2017 at 03:30:30PM +0900, Christian Balzer wrote:
> 
> Hello,
> 
> 2nd post in 10 years, let's see if this one gets an answer, unlike the
> first one...
> 
> One of the main use cases for pacemaker here is DRBD-replicated
> active/active mailbox servers (dovecot/exim) on Debian machines.
> We've been doing this for a long time, as evidenced by the oldest pair
> still running Wheezy with heartbeat and pacemaker 1.1.7.
> 
> The majority of cluster pairs is on Jessie with corosync and backported
> pacemaker 1.1.16.
> 
> Yesterday we had a hiccup, resulting in half the machines losing
> their upstream router for 50 seconds, which in turn caused the pingd RA to
> trigger a fail-over of the DRBD RA and associated resource group
> (filesystem/IP) to the other node.
> 
> The old cluster performed flawlessly; the newer clusters all wound up with
> the DRBD and FS resources being BLOCKED, as the processes holding the
> filesystem open didn't get killed fast enough.
> 
> Comparing the 2 RAs (no versioning T_T) reveals a large change in the
> "signal_processes" routine.
> 
> So with the old Filesystem RA using fuser we get something like this and
> thousands of processes killed per second:
> ---
> Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: (res_Filesystem_mb07:stop:stdout)   3478  3593  3597  3618  3654  3705  3708  3716  3736  3781  3792  3804  3963  3964  3972  3974  3978  3980  3981  3982  3985  3987  3991  3996  4002  4008  4013  4030
> Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: (res_Filesystem_mb07:stop:stderr) cmccmccmccmcmcmcmcmccmccmcmcmcmcmcmcmcmcmcmcmcmccmcm
> Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: (res_Filesystem_mb07:stop:stdout)   4032  4058  4086  4107  4199  4230  4320  4336  4362  4420  4429  4432  4435  4450  4468  4470  4471  4498  4510  4519  4584  4592  4604  4607  4632  4638  4640  4649  4676  4722  4765
> ---
> 
> Whereas the new RA (newer isn't better), which goes around killing
> processes individually with beautiful logging, was a total failure,
> killing only about 4 processes per second...
> ---
> Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: sending signal TERM to: mail        4226    4909  0 09:43 ?        S      0:00 dovecot/imap 
> Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: sending signal TERM to: mail        4229    4909  0 09:43 ?        S      0:00 dovecot/imap [idling]
> Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: sending signal TERM to: mail        4238    4909  0 09:43 ?        S      0:00 dovecot/imap 
> Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: sending signal TERM to: mail        4239    4909  0 09:43 ?        S      0:00 dovecot/imap 
> ---
> 
> So my questions are:
> 
> 1. Am I the only one with more than a handful of processes per FS who
> can't afford to wait hours for the new routine to finish?

The change was introduced about five years ago.

> 2. Can we have the old FUSER (kill) mode back?

Yes. I'll make a pull request.
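
For reference, the difference between the two modes can be sketched
roughly as below. This is only an illustration, not the actual RA code:
the function names signal_each and signal_batch are made up, and the PID
list is assumed to have been collected already (for example with
"fuser -m" on the mount point). The old behaviour corresponds to
something like "fuser -k -TERM -m <mountpoint>" from psmisc, which
signals every process using the filesystem in one go.

```shell
# Hypothetical sketch, NOT the Filesystem RA source.
# Both functions take a signal name followed by a list of PIDs.

# Per-process mode: one log line and one kill call per PID.
signal_each() {
    sig=$1; shift
    for pid in "$@"; do
        echo "sending signal $sig to: $pid"
        # A process may already have exited; ignore that error.
        kill -s "$sig" "$pid" 2>/dev/null || true
    done
}

# Batch mode: a single kill call for the whole PID list, no per-PID logging.
signal_batch() {
    sig=$1; shift
    # Nothing to do if no PIDs were passed.
    [ $# -gt 0 ] || return 0
    kill -s "$sig" "$@" 2>/dev/null || true
}
```

With thousands of PIDs, the per-process variant pays for a log call and
a kill per process, which is presumably where the time goes.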

Sorry for the trouble.

Thanks,

Dejan

> Regards,
> 
> Christian
> -- 
> Christian Balzer        Network/Systems Engineer                
> chibi at gol.com   	Rakuten Communications