[ClusterLabs] Regression in Filesystem RA

Thu Oct 12 02:30:30 EDT 2017

Hello,

Second post in 10 years, let's see if this one gets an answer, unlike the
first one...

One of the main use cases for pacemaker here is DRBD replicated
active/active mailbox servers (dovecot/exim) on Debian machines.
We've been doing this for a long time, as evidenced by the oldest pair
still running Wheezy with heartbeat and pacemaker 1.1.7.

The majority of cluster pairs is on Jessie with corosync and backported
pacemaker 1.1.16.

Yesterday we had a hiccup, resulting in half the machines losing
their upstream router for 50 seconds, which in turn caused the pingd RA to
trigger a fail-over of the DRBD RA and associated resource group
(filesystem/IP) to the other node.

The old cluster performed flawlessly; the newer clusters all wound up with
the DRBD and FS resources being BLOCKED, as the processes holding the
filesystem open didn't get killed fast enough.

Comparing the 2 RAs (no versioning T_T) reveals a large change in the
"signal_processes" routine.

So with the old Filesystem RA using fuser we get something like this and
thousands of processes killed per second:
---
Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: (res_Filesystem_mb07:stop:stdout)   3478  3593  3597  3618  3654  3705  3708  3716  3736  3781  3792  3804  3963  3964  3972  3974  3978  3980  3981  3982  3985  3987  3991  3996  4002  4008  4013  4030
Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: (res_Filesystem_mb07:stop:stderr) cmccmccmccmcmcmcmcmccmccmcmcmcmcmcmcmcmcmcmcmcmccmcm
Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: (res_Filesystem_mb07:stop:stdout)   4032  4058  4086  4107  4199  4230  4320  4336  4362  4420  4429  4432  4435  4450  4468  4470  4471  4498  4510  4519  4584  4592  4604  4607  4632  4638  4640  4649  4676  4722  4765
---

Whereas the new RA (newer isn't better), which goes around killing processes
individually with beautiful logging, was a total fail at about 4 processes
killed per second...
---
Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: sending signal TERM to: mail        4226    4909  0 09:43 ?        S      0:00 dovecot/imap 
Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: sending signal TERM to: mail        4229    4909  0 09:43 ?        S      0:00 dovecot/imap [idling]
Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: sending signal TERM to: mail        4238    4909  0 09:43 ?        S      0:00 dovecot/imap 
Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: sending signal TERM to: mail        4239    4909  0 09:43 ?        S      0:00 dovecot/imap 
---
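
To make the difference concrete, here is a rough sketch of the two styles as
I understand them. It is not the code from either RA version; the function
names signal_batch and signal_each are made up for illustration, and the PID
list is assumed to have been collected already. The batch variant hands all
PIDs to a single kill call, while the per-process variant does one ps lookup,
one log line and one kill for every PID, which is where I assume the time goes.

```shell
#!/bin/sh
# Illustrative sketch only. Function names are hypothetical and this is
# NOT the code of either Filesystem RA version.

# Batch style: pass every PID to one kill invocation, no per-process logging.
signal_batch() {
    sig=$1; shift
    [ $# -gt 0 ] || return 0
    kill -s "$sig" "$@" 2>/dev/null
    return 0
}

# Per-process style: one ps lookup, one log line and one kill per PID.
signal_each() {
    sig=$1; shift
    for pid in "$@"; do
        echo "sending signal $sig to: $(ps -o pid=,args= -p "$pid" 2>/dev/null)"
        kill -s "$sig" "$pid" 2>/dev/null
    done
    return 0
}
```

With psmisc fuser, the lookup and the kill can even be a single command,
along the lines of "fuser -k -TERM -m <mountpoint>", so no PID list has to
be built in the shell at all.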

So my questions are:

1. Am I the only one with more than a handful of processes per FS who
can't afford to wait hours for the new routine to finish?
2. Can we have the old FUSER (kill) mode back?

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Rakuten Communications