[ClusterLabs] Very long timeout shutting down a server with systemd resource
Roberto Ferrari
rferrari at mbigroup.it
Tue Jan 24 04:40:45 EST 2023
On 1/23/23 19:05, Reid Wahl wrote:
> On Mon, Jan 23, 2023 at 9:59 AM Roberto Ferrari <rferrari at mbigroup.it> wrote:
>>
>> On 23/01/23 18:25, Reid Wahl wrote:
>>> On Mon, Jan 23, 2023 at 7:51 AM Roberto Ferrari <rferrari at mbigroup.it> wrote:
>>>>
>>>> Hello everybody,
>>>> I'd like to understand a strange behavior of one of my clusters, which
>>>> basically has some IPaddr2 resources and a systemd resource that manages
>>>> netfilter-persistent.
>>>> Here is the configuration:
>>>>
>>>> primitive FW-VIP-Outside IPaddr2 \
>>>> params ip=192.168.26.74 cidr_netmask=24 nic=outside arp_bg=true \
>>>> op monitor interval=20s timeout=20s
>>>> primitive FW-VIP-Private IPaddr2 \
>>>> params ip=192.168.104.100 cidr_netmask=24 nic=private arp_bg=true \
>>>> op monitor interval=20s timeout=20s
>>>> primitive Netfilter systemd:netfilter-persistent \
>>>> op start interval=0 timeout=60 \
>>>> op stop interval=0 timeout=60
>>>> group FW-VIPs FW-VIP-Private FW-VIP-Outside Netfilter
>>>> When I reboot the active node, it hangs during shutdown for many
>>>> minutes, printing:
>>>>
>>>> A stop job is running for Pacemaker High Availability Cluster Manager (11 s / 30 min)
>>>>
>>>> (where 11 is the number of seconds already elapsed).
>>>>
>>>> Switching to another master is immediate, and running
>>>> systemctl stop netfilter-persistent is immediate too.
>>>>
>>>> Do you have any hint on what goes wrong with this? I cannot find
>>>> anything strange in the logs.
>>>>
>>>> Thanks a lot,
>>>>
>>>> Roberto.
>>>
>>> Is the netfilter systemd unit enabled outside pacemaker? Run
>>> `systemctl is-enabled netfilter-persistent` to find out, and run
>>> `systemctl disable netfilter-persistent` to disable it if it's
>>> enabled. Only Pacemaker should start or stop netfilter.
>>>
>>>>
>>>> --
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/
>>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Reid Wahl (He/Him)
>>> Senior Software Engineer, Red Hat
>>> RHEL High Availability - Pacemaker
>>>
>>
>> Thanks a lot, Reid.
>> Unfortunately that wasn't the case here: netfilter-persistent appears
>> to be disabled at boot.
>> Cheers,
>>
>> R.
>
> Can you share the pacemaker logs from the shutdown period? That will
> probably give some idea of what it's waiting on.
>
Here you are:

Jan 23 09:21:33 usab-fe2 pacemaker-controld[1327]: notice: Result of start operation for Netfilter on usab-fe2: 0 (ok)
Jan 23 09:27:45 usab-fe2 pacemakerd[1296]: notice: Caught 'Terminated' signal
Jan 23 09:27:45 usab-fe2 pacemakerd[1296]: notice: Shutting down Pacemaker
Jan 23 09:27:45 usab-fe2 systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
Jan 23 09:27:45 usab-fe2 pacemakerd[1296]: notice: Stopping pacemaker-controld
Jan 23 09:27:45 usab-fe2 pacemaker-controld[1327]: notice: Caught 'Terminated' signal
Jan 23 09:27:45 usab-fe2 pacemaker-controld[1327]: notice: Shutting down cluster resource manager
Jan 23 09:27:45 usab-fe2 pacemaker-attrd[1325]: notice: Setting shutdown[usab-fe2]: (unset) -> 1674466065
Jan 23 09:28:45 usab-fe2 pacemaker-execd[1324]: notice: Giving up on Netfilter stop (rc=0): timeout (elapsed=59991ms, remaining=9ms)
Jan 23 09:28:45 usab-fe2 pacemaker-controld[1327]: error: Result of stop operation for Netfilter on usab-fe2: Timed Out
Jan 23 09:28:45 usab-fe2 pacemaker-attrd[1325]: notice: Setting fail-count-Netfilter#stop_0[usab-fe2]: (unset) -> INFINITY
Jan 23 09:28:45 usab-fe2 pacemaker-attrd[1325]: notice: Setting last-failure-Netfilter#stop_0[usab-fe2]: (unset) -> 1674466125
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: error: Shutdown Escalation just popped in state S_NOT_DC!
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: State transition S_NOT_DC -> S_STOPPING
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Stopped 0 recurring operations at shutdown... waiting (2 remaining)
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Recurring action FW-VIP-Private:64 (FW-VIP-Private_monitor_20000) incomplete at shutdown
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Recurring action FW-VIP-Outside:66 (FW-VIP-Outside_monitor_20000) incomplete at shutdown
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: error: 3 resources were active at shutdown
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Disconnected from the executor
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Disconnected from Corosync
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Disconnected from the CIB manager
Jan 23 09:47:45 usab-fe2 systemd[1]: pacemaker.service: Succeeded.
Jan 23 09:47:45 usab-fe2 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
-- Reboot --
Jan 23 09:49:34 usab-fe2 systemd[1]: Started Pacemaker High Availability Cluster Manager.

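
To make the two waits in the log above easier to see, here is a minimal sketch in Python that measures them from the timestamps. It assumes the syslog-style prefix shown above and the year 2023 (taken from the message date, since the log lines carry no year). The helper names `parse_ts` and `seconds_between` are hypothetical and exist only for this illustration.

```python
from datetime import datetime

# The log lines carry no year; 2023 is assumed from the message date.
YEAR = 2023


def parse_ts(stamp: str) -> datetime:
    """Parse a syslog-style stamp such as 'Jan 23 09:27:45'."""
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_between(start: str, end: str) -> int:
    """Whole seconds elapsed between two syslog-style stamps."""
    return int((parse_ts(end) - parse_ts(start)).total_seconds())


# Timestamps copied from the log excerpt above.
shutdown_requested = "Jan 23 09:27:45"  # pacemakerd: Caught 'Terminated' signal
stop_timed_out = "Jan 23 09:28:45"      # Netfilter stop: Timed Out
escalation_fired = "Jan 23 09:47:45"    # Shutdown Escalation just popped

stop_wait = seconds_between(shutdown_requested, stop_timed_out)
escalation_wait = seconds_between(shutdown_requested, escalation_fired)

print(f"stop gave up after {stop_wait} s")
print(f"shutdown escalated after {escalation_wait} s")
```

The first gap (60 s) matches the `op stop interval=0 timeout=60` configured on the Netfilter primitive. The second (1200 s, i.e. 20 minutes) is the time that passes before the "Shutdown Escalation" message. That would be consistent with Pacemaker's `shutdown-escalation` cluster property at its default of 20 minutes, though nothing in this thread confirms the value set on this cluster.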
Thanks,
R.