[Pacemaker] continue starting chain with failed group resources

Wed Dec 15 10:56:02 EST 2010

On Tue, Dec 14, 2010 at 06:18:16PM -0700, Patrick H. wrote:
> 
> 
> Sent: Tue Dec 14 2010 11:37:06 GMT-0700 (Mountain Standard Time)
> From: Dejan Muhamedagic <dejanmm at fastmail.fm>
> To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> Subject: Re: [Pacemaker] continue starting chain with failed group
> resources
> >Hi,
> >
> >On Mon, Dec 13, 2010 at 10:43:36PM -0700, Patrick H. wrote:
> >>After tinkering with this for a few hours I finally have something working.
> >>
> >>colocation co-raid inf: ( md_raid iscsi_1 iscsi_2 iscsi_3 )
> >
> >This should be noop. You'd want something like this, I think:
> >
> >colocation co-raid inf: md_raid ( iscsi_1 iscsi_2 iscsi_3 )
> >
> No, that makes the md_raid service depend on all the iscsi services
> being started, which I don't want.

Yes, of course. It's just that in the given context, that seems
to be the only sensible relation between the resources.

> >>order or-raid 0: ( iscsi_1 iscsi_2 iscsi_3 ) md_raid
> >>
> >>Got rid of the group, changed the score on the order to 0, and
> >>changed the grouping of both the colocation and order. This
> >>*appears* to function as intended, but if anyone can point out any
> >>pitfalls I'd appreciate it
> >>
> >>-Patrick
> >>
> >>Sent: Mon Dec 13 2010 21:12:04 GMT-0700 (Mountain Standard Time)
> >>From: Patrick H. <pacemaker at feystorm.net>
> >>To: The Pacemaker cluster resource manager <pacemaker at oss.clusterlabs.org>
> >>Subject: [Pacemaker] continue starting chain with failed group resources
> >>>Is there a way to continue down a chain of starting resources once
> >>>a previous resource has tried to start, no matter if the try was
> >>>successful or not?
> >
> >No, that's currently not possible to express. I think that you
> >should take the iSCSI resources out of the cluster and let them
> >start on boot _before_ the cluster manager. If there are not
> >enough disks, then the md_raid resource is going to fail.
> Can't do that either. When the node that is currently using the iscsi
> services fails, they have to be migrated over to another host so it
> can assemble them into a raid array. If they're not being managed by
> pacemaker, they won't migrate.

Perhaps you can then set on-fail=fence for, say, a filesystem which
is on top of this md_raid.
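
A minimal sketch of what that could look like in crm shell syntax. The
resource name fs_raid, the device /dev/md0, the mount point /mnt/raid,
the filesystem type and the constraint ids are all made up for
illustration, and fencing only happens if STONITH is configured and
enabled in the cluster:

```
# Hypothetical filesystem on top of the array; names and paths are
# placeholders, not taken from the configuration in this thread.
primitive fs_raid ocf:heartbeat:Filesystem \
        params device="/dev/md0" directory="/mnt/raid" fstype="ext3" \
        op monitor interval="20s" on-fail="fence"
# Keep the filesystem with the array and start it after the array.
colocation co-fs inf: fs_raid md_raid
order or-fs inf: md_raid fs_raid
```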

> I made a few more tweaks to the configuration I posted earlier and
> it seems to work pretty well, with only one exception.
> colocation co-raid inf: ( md_raid iscsi_1 iscsi_2 iscsi_3 )

If this colocation makes a difference, then I really don't know
what that difference is.

> order or-raid_start 0: ( iscsi_1:start iscsi_2:start iscsi_3:start ) md_raid:start
> order or-raid_stop inf: md_raid:stop ( iscsi_1:stop iscsi_2:stop iscsi_3:stop )
> 
> That makes it so that when they start up, they start in order, but
> it isn't required that every iscsi start before md_raid, just that
> they try to start.

That's not how an advisory order is defined: it has an effect
only when both resources are to be started or stopped. For
instance, if all iscsi resources fail, the md_raid one would
continue to run. See the Configuration Explained or Ordering
Explained documents.
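
To illustrate the difference, here is a rough sketch with just two of
the resources. The constraint ids or-advisory and or-mandatory are made
up, and the two lines are alternatives, not meant to be loaded
together:

```
# Advisory (score 0): sequences the two actions only when both
# resources are to be started (or stopped); if iscsi_1 fails
# later, md_raid is left running.
order or-advisory 0: iscsi_1 md_raid

# Mandatory (score inf): md_raid is started only after iscsi_1
# has started, and md_raid is stopped when iscsi_1 has to stop.
order or-mandatory inf: iscsi_1 md_raid
```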

> Then when they stop, it's mandatory that they stop in that order so
> that no iscsi service will stop while md_raid is still running.
> 
> The exception I mentioned is a bug in the policy engine. Bug 2435.
> The policy engine allows resources within a colocation set to start
> on other nodes. So if I were to stop one of the iscsi services, and
> then start it again, it might start on a different node. Unless this
> bug gets fixed soon, I'll probably modify the iscsi script so that

That bug is marked as fixed. If you think it's not fixed, then
you should reopen it.

> all the iscsi devices are under 1 resource.

Yes, that may be one option. Probably not too difficult to modify
the RA.

Thanks,

Dejan

> >Thanks,
> >
> >Dejan
> >
> >>>I've got 3 iSCSI resources which are in a group, and then an md
> >>>raid-5 array as another resource. I have the raid array resource
> >>>set to start after the group with a colocation rule, but it will
> >>>only start if the whole group comes up. Since this is raid-5, we
> >>>can obviously handle some disk failure and start up anyway. So how
> >>>do I get it to try to start it up once all the iSCSI resources
> >>>have tried to start? Went looking through the docs and didn't find
> >>>anything.
> >>>
> >>>Note: there will be other resources in the chain (like mounting
> >>>the filesystem) that I don't want to try and start if the raid
> >>>array resource didn't start.
> >>>------------------------------------------------------------------------
> >>>
> >>>_______________________________________________
> >>>Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> >>>http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>
> >>>Project Home: http://www.clusterlabs.org
> >>>Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker