[Pacemaker] Convenience Groups - WAS Re: [Linux-HA] Unordered groups (was Re: Is 'resource_set' still experimental?)

Vladislav Bogdanov bubble at hoster-ok.com
Fri Apr 20 03:15:14 EDT 2012


20.04.2012 03:21, Andrew Beekhof wrote:
> On Fri, Apr 20, 2012 at 7:41 AM, Vladislav Bogdanov
> <bubble at hoster-ok.com> wrote:
>> 19.04.2012 20:48, David Vossel wrote:
>>> ----- Original Message -----
>>>> From: "Alan Robertson" <alanr at unix.sh>
>>>> To: pacemaker at oss.clusterlabs.org, "Andrew Beekhof" <andrew at beekhof.net>
>>>> Cc: "Dejan Muhamedagic" <dejan at hello-penguin.com>
>>>> Sent: Thursday, April 19, 2012 10:22:48 AM
>>>> Subject: [Pacemaker] Convenience Groups - WAS Re: [Linux-HA] Unordered groups (was Re: Is 'resource_set' still
>>>> experimental?)
>>>>
>>>> Hi Andrew,
>>>>
>>>> I'm currently working on a fairly large cluster with lots of
>>>> resources
>>>> related to attached hardware.  There are 59 of these things and 24 of
>>>> those things and so on and each of them has its own resource to deal
>>>> with the "things".  They are not clones, and can't easily be made
>>>> clones.
>>>>
>>>> I would like to be able to easily say "shut down all the resources
>>>> that
>>>> manage this kind of thing".    The solution that occurs to me most
>>>> obviously is one you would likely call a "double abomination" ;-) -
>>>> an
>>>> unordered and un-colocated group.  It seems a safe assumption that
>>>> this
>>>> would not be a good path to pursue given your statements from last
>>>> year...
>>>>
>>>> What would you suggest instead?
>>>>
>>>
>>> This might be a terrible idea, but this is the first thing that came to mind.
>>>
>>> What if you made a Dummy resource as a sort of control switch for starting/stopping each "group" of resources that control a "thing".  The resource groups wouldn't actually be defined as resource groups, but instead would be defined by order constraints that force a set of resources to start or stop when the Dummy control resource starts/stops.
>>>
>>> So, something like this...
>>>
>>> Dummy resource D1
>>> thing resource T1
>>> thing resource T2
>>>
>>> - If you start D1 then T1 and T2 can start.
>>> - If you stop D1, then T1 and T2 have to stop.
>>> - If you flip D1 back on, then T1 and T2 start again.
>>> order set start (D1) then start (T1 and T2)
>>
>> But, when pacemaker decides to move Dummy to another node, the whole
>> stack will be restarted, even if Dummy is configured with allow_migration.
>>
>> I solved this problem for myself with RA which manages cluster ticket,
>> and other resources depend on that ticket, exploiting it as a cluster
>> attribute.
> 
> I like this approach.

One more sign that I'm going the right way ;)

> Why the resource for granting/revoking the ticket though?

Initially, just to grant the ticket at cluster start. This goal will
become obsolete once we have persistent tickets (or cluster-wide
attributes). But see below.

> I'd have thought it would be just as easy to manually grant/revoke the
> ticket as it would be to start/stop the fake resource.

My implementation of the RA (which was designed before the standby/activate
feature appeared) just uses pseudo-resource functionality to report
status in the monitor op, so it tolerates manual intervention on tickets.

I'm about to rewrite it a bit so that it checks for ticket existence in
monitor and returns OCF_SUCCESS if the ticket exists, leaving it to the
admin to play with standby/activate (roughly as sketched below). This is
actually a very low priority for me though.
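
Something like this (an untested sketch; it assumes crm_ticket can report
the granted state via -G granted, the same way it reports standby):

Ticket_Monitor()
{
    local granted

    # report "running" only while the ticket is actually granted;
    # standby/activate is left entirely to the admin
    granted=$( crm_ticket -t "${OCF_RESKEY_name}" -G granted 2>/dev/null )
    if [ "${granted}" = "true" ] ; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}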

I initially wrote that RA to solve one issue with the HA Lustre
proof-of-concept I'm currently working on, but then I realized that it
can be very useful for easily managing "stacks" of resources. E.g. I have
several tickets: drbd-local, drbd-stacked, drbd-testfs-local,
drbd-testfs-stacked, lustre and testfs. They are granted by the Ticketer RA
at cluster start, and other resources depend on those tickets via
rsc_ticket constraints (including, e.g., the drbd-stacked resource on the
drbd-local ticket and so on), allowing me to:
* easily unmount all Lustre parts for a given fs (testfs), still leaving
stacked drbd resources in Master state, so I can do something with them
* unmount all Lustre filesystems in one shot
* do the above plus demote all stacked resources if I need to manage
"local" drbd resources
* stop everything drbd-related in one shot
and much more. A rough sketch of this wiring follows.
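
(Constraint and resource names below are only illustrative, not my exact
configuration; crm shell syntax. ticket-drbd-local and ticket-lustre are
the Ticketer primitives holding the respective tickets.)

rsc_ticket drbd-stacked-needs-local drbd-local: ms-drbd-stacked:Master loss-policy=demote
rsc_ticket lustre-needs-ticket lustre: testfs-lustre-client loss-policy=stop

With that in place, "crm resource stop ticket-lustre" takes the whole
Lustre part down in one shot, and starting it again brings it back.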

I was asked by my employer to present my results with Lustre at LinuxCon
in Barcelona this November, so hopefully I'll give an interesting
presentation (including this trick with tickets) if everything goes
smoothly. Everybody is welcome.

One more interesting "feature" (actually a side effect) of this RA: when
you stop a Ticketer resource, the current transition is aborted because
the ticket is revoked. This allows me to guarantee that advisory ordering
between "higher" resources always works (with one more trick: extra
advisory ordering constraints between the Ticketer resources themselves,
sketched below; this will also be included in the presentation).
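
(Again, names are illustrative only; score-0 orderings are advisory in
crm shell syntax.)

order tickets-local-then-stacked 0: ticket-drbd-local ticket-drbd-stacked
order tickets-stacked-then-lustre 0: ticket-drbd-stacked ticket-lustre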

I'm attaching that RA here, so you're free to include it in Pacemaker
after review.
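
For reference, a Ticketer instance is configured roughly like this (the
provider and resource name are placeholders, not from my actual config;
crm shell syntax):

primitive ticket-lustre ocf:heartbeat:Ticketer \
        params name="lustre" \
        meta allow-migrate="true" \
        op monitor interval="10" timeout="10"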

Best,
Vladislav

-------------- next part --------------
#!/bin/bash
#
# Resource agent which manages a named cluster ticket.
#
# Copyright 2011 Vladislav Bogdanov
#
#       usage: $0 {start|stop|monitor|migrate_from|migrate_to|meta-data|validate-all}
#
#######################################################################
# Initialization:
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

LC_ALL="C"
LANG="C"

#######################################################################

usage() {
  echo "usage: $0 {start|stop|monitor|migrate_from|migrate_to|meta-data|validate-all}"
}

meta_data() {
        cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="Ticketer">
<version>1.0</version>

<longdesc lang="en">
The Ticketer resource agent grants the named cluster ticket in the Pacemaker
CIB on start and revokes it on stop.
As some resources may depend on that ticket via rsc_ticket constraints (what
else could it be for?), the expected use case is for resources managed by
this RA to be migratable (meta allow-migrate="true").
This way dependent resources stay started even if the Ticketer resource
migrates between cluster nodes.
It is a design decision that this RA does not check the ticket state in the
monitor operation: its sole goal is to fire up resources which depend on the
granted ticket and which then manage that ticket by themselves.
</longdesc>
<shortdesc lang="en">Manages cluster tickets</shortdesc>

<parameters>

<parameter name="name" unique="1" required="1">
<longdesc lang="en">
Name of cluster ticket.
</longdesc>
<shortdesc lang="en">Ticket name</shortdesc>
<content type="string" default="" />
</parameter>

</parameters>

<actions>
<action name="start" timeout="30" />
<action name="stop" timeout="30" />
<action name="monitor" depth="0" timeout="10" interval="10" />
<action name="migrate_from" timeout="30" />
<action name="migrate_to" timeout="30" />
<action name="meta-data" timeout="5" />
<action name="validate-all" timeout="5" />
</actions>
</resource-agent>
EOF
}

Ticket_Grant()
{
    local rc
    local standby

    ocf_log info "Starting ticket ${OCF_RESKEY_name} management"
    ha_pseudo_resource ${ha_pseudo_resource_name} start
    rc=$?

    if [ ${rc} -eq $OCF_SUCCESS ] ; then
        ocf_log info "Granting ticket ${OCF_RESKEY_name}"
        crm_ticket -t "${OCF_RESKEY_name}" -g --force
        rc=$?
        if [ ${rc} -eq 0 ] ; then
            # if the ticket is in standby, activate it so dependent resources may start
            standby=$( crm_ticket -t "${OCF_RESKEY_name}" -G standby )
            if [ "${standby}" = "true" ] ; then
                ocf_log info "Activating ticket ${OCF_RESKEY_name}"
                crm_ticket -t "${OCF_RESKEY_name}" -a
                rc=$?
            fi
        fi
        if [ ${rc} -eq 0 ] ; then
            rc=$OCF_SUCCESS
        else
            rc=$OCF_ERR_GENERIC
        fi
    fi

    return ${rc}
}

Ticket_Delete()
{
    local rc

    ha_pseudo_resource ${ha_pseudo_resource_name} monitor
    rc=$?
    if [ ${rc} -eq $OCF_SUCCESS ] ; then
        ocf_log info "Stopping ticket ${OCF_RESKEY_name} management"
        ha_pseudo_resource ${ha_pseudo_resource_name} stop
        ocf_log info "Revoking ticket ${OCF_RESKEY_name}"
        crm_ticket -t "${OCF_RESKEY_name}" -r --force >/dev/null 2>&1
        rc=$?
        if [ ${rc} -eq 0 ] ; then
            rc=$OCF_SUCCESS
        else
            rc=$OCF_ERR_GENERIC
        fi
    else
        rc=$OCF_SUCCESS
    fi

    return ${rc}
}

Ticket_Migrate_From()
{
    local rc

    ha_pseudo_resource ${ha_pseudo_resource_name} monitor
    rc=$?

    if [ ${rc} -eq $OCF_SUCCESS ] ; then
        ocf_log info "Unable to migrate ticket ${OCF_RESKEY_name} management from ${OCF_RESKEY_CRM_meta_migrate_source}: already active"
        rc=$OCF_ERR_GENERIC
    else
        ocf_log info "Starting ticket ${OCF_RESKEY_name} management due to migration from ${OCF_RESKEY_CRM_meta_migrate_source}"
        ha_pseudo_resource ${ha_pseudo_resource_name} start
        rc=$?
    fi

    return ${rc}
}

Ticket_Migrate_To()
{
    local rc

    ha_pseudo_resource ${ha_pseudo_resource_name} monitor
    rc=$?
    if [ ${rc} -eq $OCF_SUCCESS ] ; then
        ocf_log info "Stopping ticket ${OCF_RESKEY_name} management due to migration to ${OCF_RESKEY_CRM_meta_migrate_target}"
        ha_pseudo_resource ${ha_pseudo_resource_name} stop
        rc=$?
    else
        ocf_log info "Unable to migrate ticket ${OCF_RESKEY_name} management to ${OCF_RESKEY_CRM_meta_migrate_target}: not active locally"
        rc=$OCF_ERR_GENERIC
    fi

    return ${rc}
}

Ticket_Monitor()
{
    local rc

    ha_pseudo_resource ${ha_pseudo_resource_name} monitor
    rc=$?

    return ${rc}
}

Ticket_Validate_All()
{
    if [ -z "$OCF_RESKEY_name" ]; then
        ocf_log err "Missing configuration parameter \"name\"."
        return $OCF_ERR_CONFIGURED
    fi

    check_binary crm_ticket

    return $OCF_SUCCESS
}

if [ $# -ne 1 ] ; then
    usage
    exit $OCF_ERR_ARGS
fi

: ${ha_pseudo_resource_name:=Ticketer-${OCF_RESOURCE_INSTANCE}}

case $1 in
  meta-data)
      meta_data
      exit $OCF_SUCCESS
      ;;
  usage)
      usage
      exit $OCF_SUCCESS
      ;;
esac

# Everything except usage and meta-data must pass the validate test
Ticket_Validate_All || exit $?

case $1 in
    start)
        Ticket_Grant
        ret=$?
        ;;
    stop)
        Ticket_Delete
        ret=$?
        ;;
    migrate_from)
        Ticket_Migrate_From
        ret=$?
        ;;
    migrate_to)
        Ticket_Migrate_To
        ret=$?
        ;;
    monitor)
        Ticket_Monitor
        ret=$?
        ;;
    validate-all)
        ret=$OCF_SUCCESS
        ;;
    *)
        usage
        exit $OCF_ERR_UNIMPLEMENTED
        ;;
esac
exit $ret

