[Pacemaker] Orphaned resources

Sun Jul 29 22:52:25 EDT 2012

On Thu, Jul 12, 2012 at 12:48 AM, Thilo Uttendorfer
<t.uttendorfer at linux-ag.com> wrote:
> Hi,
>
> On several Pacemaker clusters I sometimes see ORPHANED (DRBD) resources. In
> some cases they exist only for a short time and are removed automatically,
> but in other cases the resource fails. "crm status" then looks like this:
>
>  Master/Slave Set: ms-res1 [drbd-res1]
>      drbd-res1:0     (ocf::linbit:drbd):     Master server-v2.linux-ag.com (unmanaged) FAILED
>      drbd-res1:2     (ocf::linbit:drbd):      ORPHANED Master server-v2.linux-ag.com (unmanaged) FAILED
>      Slaves: [ server-v1.linux-ag.com ]
>
> This happens although DRBD is running fine without any problems.
> The pacemaker log shows entries like this:
>
> Apr 25 20:07:46 server-v2 pengine: [21702]: notice: LogActions: Leave
> drbd-res1:0#011(Master server-v2.linux-ag.com)
> Apr 25 20:07:46 server-v2 pengine: [21702]: notice: LogActions: Start
> drbd-res1:1#011(server-v1.linux-ag.com)
> Apr 25 20:07:46 server-v2 pengine: [21702]: notice: LogActions: Stop
> drbd-res1:2#011(server-v2.linux-ag.com)
>
> Apr 25 20:07:48 server-v2 pengine: [21702]: notice: LogActions: Leave
> drbd-res1:0#011(Master server-v2.linux-ag.com)
> Apr 25 20:07:48 server-v2 pengine: [21702]: notice: LogActions: Leave
> drbd-res1:1#011(Slave server-v1.linux-ag.com)
> Apr 25 20:07:48 server-v2 pengine: [21702]: notice: LogActions: Stop
> drbd-res1:2#011(server-v2.linux-ag.com)
>
>
> In what cases are ORPHANS created? How are they usually handled, and how can I
> get rid of them? How can I debug this situation, e.g. which log entries are
> of interest?

Orphans are created when more instances are found than are configured.
I.e. if clone-max=3 but we found the resource active on 4 nodes, we would
create an "orphan" for the extra instance, which would then be stopped.
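
As a rough illustration of that rule, here is a simplified sketch in Python. It is not Pacemaker's actual policy engine code: the function name, the instance names and the node names are made up for the example, and which instances end up as orphans (here simply the last ones found) is an assumption.

```python
def split_orphans(active_instances, clone_max):
    """Split active clone instances into (kept, orphans).

    active_instances: list of (instance_name, node) pairs found active.
    clone_max: the configured clone-max value.
    Assumption for this sketch: instances beyond clone_max, in the order
    they were found, are the ones treated as orphans and stopped.
    """
    kept = active_instances[:clone_max]
    orphans = active_instances[clone_max:]
    return kept, orphans


# Hypothetical cluster state: clone-max=3, but 4 active instances found.
found = [
    ("xxx:0", "node1"),
    ("xxx:1", "node2"),
    ("xxx:2", "node3"),
    ("xxx:3", "node4"),
]
kept, orphans = split_orphans(found, clone_max=3)
print("kept:", kept)
print("orphans to stop:", orphans)
```

With clone-max=3 and four active instances, exactly one instance falls outside the configured count and is reported as the orphan.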

It happens more often for anonymous clones, which, due to some internal
processing, can be known by many different names on each node
(i.e. xxx:0, xxx:1, ... xxx:N).
For this reason we have dropped the :N suffix from anonymous clones
in 1.1.8.
This should make things more reliable/obvious in the future.
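
To show why the suffix matters, here is a small Python sketch. It is illustrative only and does not reflect how Pacemaker is implemented: the helper names are hypothetical, and the entries are modelled on the "crm status" output quoted above. It counts how many distinct names a node reports, with and without the :N suffix.

```python
def base_name(instance_id):
    """Strip a trailing ':N' suffix, e.g. 'drbd-res1:2' -> 'drbd-res1'."""
    name, sep, suffix = instance_id.rpartition(":")
    if sep and suffix.isdigit():
        return name
    return instance_id


def names_per_node(status_entries, strip_suffix):
    """Count distinct resource names seen on each node.

    status_entries: iterable of (instance_id, node) pairs.
    strip_suffix: if True, compare names without the ':N' suffix.
    """
    seen = {}
    for instance_id, node in status_entries:
        key = base_name(instance_id) if strip_suffix else instance_id
        seen.setdefault(node, set()).add(key)
    return {node: len(names) for node, names in seen.items()}


# Entries modelled on the status output in the original question.
entries = [
    ("drbd-res1:0", "server-v2"),
    ("drbd-res1:2", "server-v2"),
    ("drbd-res1:1", "server-v1"),
]
print(names_per_node(entries, strip_suffix=False))
print(names_per_node(entries, strip_suffix=True))
```

With the suffix kept, server-v2 appears to run two different instances; with the suffix stripped, both entries collapse to the single name drbd-res1.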

>
> I suppose that these ORPHANS (sometimes) appear after a cluster node was
> offline (or in maintenance mode) and then rejoins the cluster. But not too
> sure about that...
>
>
> The clusters are based on Ubuntu 10.04 LTS with
>  - heartbeat 3.0.5
>  - pacemaker 1.1.6
>
>
>
> Thanks for any hint,
> Thilo
>
>
> --
> Thilo Uttendorfer
> Linux Information Systems AG
> Putzbrunner Str. 71, 81739 München
>
> Fon: +49 89 993412-11, Fax: +49 89 993412-99
> t.uttendorfer at linux-ag.com, http://www.linux-ag.com