[ClusterLabs] Very long timeout shutting down a server with systemd resource

Tue Jan 24 04:40:45 EST 2023

On 1/23/23 19:05, Reid Wahl wrote:
> On Mon, Jan 23, 2023 at 9:59 AM Roberto Ferrari <rferrari at mbigroup.it> wrote:
>>
>> On 23/01/23 18:25, Reid Wahl wrote:
>>> On Mon, Jan 23, 2023 at 7:51 AM Roberto Ferrari <rferrari at mbigroup.it> wrote:
>>>>
>>>> Hello everybody,
>>>> I'd like to understand a strange behavior of one of my clusters, which
>>>> basically has a couple of IPaddr2 resources and a systemd resource that
>>>> manages netfilter-persistent.
>>>> Here is the configuration:
>>>>
>>>> primitive FW-VIP-Outside IPaddr2 \
>>>>            params ip=192.168.26.74 cidr_netmask=24 nic=outside arp_bg=true \
>>>>            op monitor interval=20s timeout=20s
>>>> primitive FW-VIP-Private IPaddr2 \
>>>>            params ip=192.168.104.100 cidr_netmask=24 nic=private arp_bg=true \
>>>>            op monitor interval=20s timeout=20s
>>>> primitive Netfilter systemd:netfilter-persistent \
>>>>            op start interval=0 timeout=60 \
>>>>            op stop interval=0 timeout=60
>>>> group FW-VIPs FW-VIP-Private FW-VIP-Outside Netfilter
>>>>
>>>> When I reboot the server, the active node hangs during shutdown for
>>>> many minutes, displaying:
>>>>
>>>> A stop job is running for Pacemaker High Availability Cluster Manager (11 s / 30 min)
>>>>
>>>> (where 11 is the number of seconds already elapsed)
>>>>
>>>> Obviously, switching to another master is immediate, and running
>>>> systemctl stop netfilter-persistent by hand is immediate too.
>>>>
>>>> Do you have any hint on what goes wrong with this? I cannot find
>>>> anything strange in the logs.
>>>>
>>>> Thanks a lot,
>>>>
>>>> Roberto.
>>>
>>> Is the netfilter systemd unit enabled outside pacemaker? Run
>>> `systemctl is-enabled netfilter-persistent` to find out, and run
>>> `systemctl disable netfilter-persistent` to disable it if it's
>>> enabled. Only Pacemaker should start or stop netfilter.
>>>
>>>>
>>>> --
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Reid Wahl (He/Him)
>>> Senior Software Engineer, Red Hat
>>> RHEL High Availability - Pacemaker
>>>
>>
>> Thanks a lot, Reid.
>> Unfortunately that is not the case here: netfilter-persistent appears
>> to be already disabled at boot.
>> Cheers,
>>
>> R.
> 
> Can you share the pacemaker logs from the shutdown period? That will
> probably give some idea of what it's waiting on.
> 

Here you are:

Jan 23 09:21:33 usab-fe2 pacemaker-controld[1327]:  notice: Result of start operation for Netfilter on usab-fe2: 0 (ok)
Jan 23 09:27:45 usab-fe2 pacemakerd[1296]:  notice: Caught 'Terminated' signal
Jan 23 09:27:45 usab-fe2 pacemakerd[1296]:  notice: Shutting down Pacemaker
Jan 23 09:27:45 usab-fe2 systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
Jan 23 09:27:45 usab-fe2 pacemakerd[1296]:  notice: Stopping pacemaker-controld
Jan 23 09:27:45 usab-fe2 pacemaker-controld[1327]:  notice: Caught 'Terminated' signal
Jan 23 09:27:45 usab-fe2 pacemaker-controld[1327]:  notice: Shutting down cluster resource manager
Jan 23 09:27:45 usab-fe2 pacemaker-attrd[1325]:  notice: Setting shutdown[usab-fe2]: (unset) -> 1674466065
Jan 23 09:28:45 usab-fe2 pacemaker-execd[1324]:  notice: Giving up on Netfilter stop (rc=0): timeout (elapsed=59991ms, remaining=9ms)
Jan 23 09:28:45 usab-fe2 pacemaker-controld[1327]:  error: Result of stop operation for Netfilter on usab-fe2: Timed Out
Jan 23 09:28:45 usab-fe2 pacemaker-attrd[1325]:  notice: Setting fail-count-Netfilter#stop_0[usab-fe2]: (unset) -> INFINITY
Jan 23 09:28:45 usab-fe2 pacemaker-attrd[1325]:  notice: Setting last-failure-Netfilter#stop_0[usab-fe2]: (unset) -> 1674466125
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]:  error: Shutdown Escalation just popped in state S_NOT_DC!
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]:  notice: State transition S_NOT_DC -> S_STOPPING
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]:  notice: Stopped 0 recurring operations at shutdown... waiting (2 remaining)
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]:  notice: Recurring action FW-VIP-Private:64 (FW-VIP-Private_monitor_20000) incomplete at shutdown
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]:  notice: Recurring action FW-VIP-Outside:66 (FW-VIP-Outside_monitor_20000) incomplete at shutdown
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]:  error: 3 resources were active at shutdown
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]:  notice: Disconnected from the executor
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]:  notice: Disconnected from Corosync
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]:  notice: Disconnected from the CIB manager
Jan 23 09:47:45 usab-fe2 systemd[1]: pacemaker.service: Succeeded.
Jan 23 09:47:45 usab-fe2 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
-- Reboot --
Jan 23 09:49:34 usab-fe2 systemd[1]: Started Pacemaker High Availability Cluster Manager.
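
Reading the timestamps in the excerpt above, the delay seems to break down into two parts. The small Python sketch below only does the arithmetic on times copied from the log; the event labels are my own wording, and the helper names (`seconds_between`, `elapsed_since_first`) are made up for illustration. The 60 s gap matches the `timeout=60` on the Netfilter stop operation. The 1200 s gap would be consistent with Pacemaker's `shutdown-escalation` cluster option if it is at its default of 20 minutes, but that is an assumption, since the option does not appear in the configuration shown.

```python
from datetime import datetime

# Timestamps copied from the log excerpt (all on Jan 23, so the date is omitted).
# The labels are informal descriptions, not Pacemaker terminology.
EVENTS = [
    ("pacemakerd caught 'Terminated' signal", "09:27:45"),
    ("Netfilter stop timed out", "09:28:45"),
    ("shutdown escalation popped", "09:47:45"),
    ("pacemaker started again after reboot", "09:49:34"),
]


def seconds_between(start: str, end: str) -> int:
    """Return the number of seconds from start to end (HH:MM:SS, same day)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def elapsed_since_first(events):
    """Return (label, seconds since the first event) for every event."""
    first_ts = events[0][1]
    return [(label, seconds_between(first_ts, ts)) for label, ts in events]


if __name__ == "__main__":
    for label, secs in elapsed_since_first(EVENTS):
        print(f"{secs:>5d} s  {label}")
```

The two epoch values written by pacemaker-attrd (1674466065 for `shutdown` and 1674466125 for `last-failure-Netfilter#stop_0`) are also exactly 60 apart, which agrees with the syslog timestamps.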

Thanks,

R.