[Pacemaker] WARN: ..... unmanaged failed resources cannot prevent clone shutdown

Mon Jul 4 18:24:06 EDT 2011

On Fri, Jul 1, 2011 at 9:23 PM, Andreas Kurz <andreas.kurz at linbit.com> wrote:
> Hello,
>
> In a cluster without stonith enabled (yes I know ....) the monitor
> failure of one resource followed by the stop failure of a dependent
> resource led to a cascade of errors, especially because the cluster did
> not stop the shutdown sequence on stop (timeout) failures:
>
> WARN: should_dump_input: Ignoring requirement that
> resource_fs_home_stop_0 comeplete before ms_drbd_home_demote_0:
> unmanaged failed resources cannot prevent clone shutdown
>
> ... and that is really ugly in a DRBD environment, because demote/stop
> will not work when the DRBD device is in use -- so in this case this
> order requirement on stop must not be ignored.

Did you ask the cluster to shut down before or after the first resource failed?

> The result was a lot of unmanaged resources, and the cluster even tried
> to promote the MS resource on the other node although the second
> instance was neither demoted nor stopped.

We seem to lose either way.
If we have the cluster block, people complain that shutdown takes too long.

Basically, once a resource fails and stonith is not configured,
shutdown is best-effort.

>
> Is there any possibility to tune this behavior?
>
> Even with stonith enabled the cluster would first migrate all resources
> that don't depend on the unmanaged (failed) resource away before
> executing the stonith, am I right?
>
> thx & Regards,
> Andreas