[ClusterLabs] Q: "crmd: warning: new_event_notification (7281-97955-15): Broken pipe (32)" as response to resource cleanup
kgaillot at redhat.com
Mon Aug 12 19:03:51 EDT 2019
On Mon, 2019-08-12 at 17:46 +0200, Ulrich Windl wrote:
> I just noticed that a "crm resource cleanup <rsc>" caused some
> unexpected behavior and the syslog message:
> crmd: warning: new_event_notification (7281-97955-15): Broken
> pipe (32)
> It's SLES14 SP4 last updated Sept. 2018 (up since then, pacemaker-
> The cleanup was due to a failed monitor. As an unexpected consequence
> of this cleanup, CRM seemed to restart the complete resource (and
> dependencies), even though it was running.
I assume the monitor failure was old, and recovery had already
completed? If not, recovery might have been initiated before the clean-
up was recorded.
> I noticed that a manual "crm_resource -C -r <rsc> -N <node>" command
> has the same effect (multiple resources are "Cleaned up", resources
> are restarted seemingly before the "probe" is done).
Can you verify whether the probes were done? The DC should log a
message when each <rsc>_monitor_0 result comes in.
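One quick way to check is to grep the DC's logs for the `_monitor_0` operations. A minimal sketch, assuming a resource named `myrsc`; the sample lines in the here-document are illustrative placeholders, not verbatim pacemaker output — on a real node you would grep /var/log/messages or the pacemaker log instead:

```shell
# Sketch: confirm the DC logged a probe result for the cleaned-up
# resource. "myrsc", "node1", and the log lines are placeholders; on a
# real system, grep the DC's syslog rather than this here-document.
grep 'myrsc_monitor_0' <<'EOF'
crmd: info: Result of probe: myrsc_monitor_0 on node1: 7 (not running)
crmd: info: Result of monitor: otherrsc_monitor_10000 on node1: 0 (ok)
EOF
```

If nothing matching `<rsc>_monitor_0` appears after the cleanup, the reprobe likely never ran (or ran on a different node than expected).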
> Actually the manual says when cleaning up a single primitive, the
> whole group is cleaned up, unless using --force. Well, I don't like
> this default, as I expect any status change from probe would
> propagate to the group anyway...
In 1.1, clean-up always wipes the history of the affected resources,
regardless of whether the history is for success or failure. That means
all the cleaned resources will be reprobed. In 2.0, clean-up by default
wipes the history only if there's a failed action (--refresh/-R is
required to get the 1.1 behavior). That lessens the impact of the
"default to whole group" behavior.
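The 2.0 distinction can be sketched with the two crm_resource invocations below; the resource and node names are placeholders, and these need a live cluster to actually run:

```shell
# 2.0 default: wipe history only for failed actions on this resource.
crm_resource --cleanup --resource myrsc --node node1

# 1.1-style behavior in 2.0: wipe all history, forcing a full reprobe.
crm_resource --refresh --resource myrsc --node node1
```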
I think the original idea was that a group indicates that the resources
are closely related, so changing the status of one member might affect
what status the others report.
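Per the manual behavior quoted above, limiting a clean-up to one group member takes --force; a sketch, with `grp_member1` and `node1` as placeholder names (again, this needs a live cluster):

```shell
# Sketch: clean up a single group member without touching its siblings.
# Without --force, cleaning "grp_member1" would clean the whole group.
crm_resource --cleanup --resource grp_member1 --node node1 --force
```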
Ken Gaillot <kgaillot at redhat.com>