[ClusterLabs] IPaddr2 resource times out and can't be killed

Fri Jul 29 16:02:57 EDT 2022

On Fri, Jul 29, 2022 at 12:52 PM Ross Sponholtz <rsponholtz at hotmail.com> wrote:
>
> I’m running a RHEL pacemaker cluster on Azure, and I’ve gotten a failure & fencing where I get these messages in the log file:
>
>
> warning: vip_ABC_30_monitor_10000 process (PID 1779737) timed out
> crit: vip_ABC_30_monitor_10000 process (PID 1779737) will not die!
>
>
>
> This resource uses the IPaddr2 resource agent. I’ve looked at the agent code, and I can’t pinpoint any reason it would hang, and since the node gets fenced, I can’t tell why this happens – any ideas on what kinds of failures could cause this problem?
>
>
>
> Thanks,
>
> Ross
>

Are you able to reproduce this? I suggest adding `trace_ra=1` to the
resource configuration in order to determine where it's hanging.

# pcs resource update vip_ABC trace_ra=1

This will produce a shell trace of each operation in
/var/lib/heartbeat/trace_ra/IPaddr2. This is naturally quite a lot of
logging, so remove the option when you've gotten what you need.

# pcs resource update vip_ABC trace_ra=
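
Once a timeout has been captured, the trace file to read is the newest one written for the monitor operation, since its last lines show the command the agent was running when it stopped making progress. A minimal sketch for finding it follows. The helper name `latest_trace` is hypothetical, and the file naming pattern (`<resource>.<action>.<timestamp>`) is an assumption about how the agent's shell functions name trace files, so check the directory listing on your system first.

```shell
# Hypothetical helper: print the most recently modified monitor trace file
# for a resource. Assumes files are named <resource>.monitor.<timestamp>.
latest_trace() {
    trace_dir=$1   # e.g. /var/lib/heartbeat/trace_ra/IPaddr2
    resource=$2    # e.g. vip_ABC
    # ls -t sorts newest first; head keeps only the newest match.
    ls -t "$trace_dir"/"$resource".monitor.* 2>/dev/null | head -n 1
}

# Example use (run as root on the affected node):
#   tail -n 20 "$(latest_trace /var/lib/heartbeat/trace_ra/IPaddr2 vip_ABC)"
```

The last traced command in that file is the likeliest place where the monitor blocked.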

Also discussed in this article (you should have access if you're on RHEL):
- How can I determine exactly what is happening with every operation
on a resource in Pacemaker?
(https://access.redhat.com/solutions/3182931)


-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker