[ClusterLabs] [rgmanager] Recovering a failed (but running) server in rgmanager

Jan Pokorný jpokorny at redhat.com
Mon Sep 19 14:30:36 EDT 2016

On 18/09/16 15:37 -0400, Digimer wrote:
>   If, for example, a server's definition file is corrupted while the
> server is running, rgmanager will put the server into a 'failed' state.
> That's fine and fair.

Please be more precise. Is it the "vm" resource agent you are talking
about, i.e., is the server the particular virtual machine being managed?
Is the agent in the role of a service (defined at the top level) or of
a standard resource (without special treatment, possibly with
dependent services further down in the group)?

>   The problem is that, once the file is fixed, there appears to be no
> way to go failed -> started without disabling (and thus powering off)
> the VM. This is troublesome because it forces an interruption when the
> service could have been placed under resource management without a reboot.
>   For example, when I run 'clusvcadm -e <server>' on a service that was
> 'disabled' (say, because the server was booted manually), rgmanager
> detects that the server is running fine and simply marks it as
> 'started'. Is there no way to do something similar to go 'failed' ->
> 'started' without the 'disable' step?

If it's a VM defined as a service, this could possibly be "exploited"
(I have never tested it, though):

# MANWIDTH=72 man rgmanager | col -b \
  | sed -n '/^VIRTUAL MACHINE/{:a;p;n;/^\s*$/d;ba}'
>        Apart from what is noted in the VM resource agent, rgmanager
>        provides a few convenience features when dealing with
>        virtual machines.
>         * it will use live migration when transferring a virtual
>         machine to a more-preferred host in the cluster as a
>         consequence of failover domain operation
>         * it will search the other instances of rgmanager in the
>         cluster in the case that a user accidentally moves a
>         virtual machine using other management tools
>         * unlike services, adding a virtual machine to rgmanager’s
>         configuration will not cause the virtual machine
>         to be restarted
>         * removing a virtual machine from rgmanager’s
>         configuration will leave the virtual machine running.

(see the last two items).
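For anyone unfamiliar with the sed loop in the command above, here is
the same filter wrapped in a commented shell function. It is a sketch
that assumes GNU sed; the function name man_section_filter is made up
for illustration, and it reads the already-rendered man page on stdin,
so it can be tried without rgmanager installed.

```shell
#!/bin/sh
# Hypothetical helper: print the section of a rendered man page (on stdin)
# whose heading starts with the given text, i.e. the heading line plus every
# following line up to, but not including, the first blank line.
# Assumes GNU sed (labels and ';' as command separator).
man_section_filter() {
  heading=$1
  # :a                   label to loop back to
  # p                    print the current line
  # n                    fetch the next line (no auto-print because of -n)
  # /^[[:space:]]*$/d    a blank line ends the section: drop it and start a
  #                      new cycle, where the heading match is tried again
  # ba                   otherwise jump back to :a and print this line too
  sed -n "/^$heading/{:a;p;n;/^[[:space:]]*\$/d;ba;}"
}
```

It would be used as
MANWIDTH=72 man rgmanager | col -b | man_section_filter 'VIRTUAL MACHINE'.
Apart from spelling the blank-line test as [[:space:]] and terminating
the branch with ';' before the closing brace, the sed script is the one
from the command above.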

>   I tried freezing the service, no luck. I also tried convalescing via
> '-c', but that didn't help either.

If I am not mistaken, any path out of "failed" in the resource (group)
life-cycle goes through either "disabled" or "stopped". As a possible
workaround, I would therefore experiment with adding a new service and
dropping the old one, per the description above. Perhaps do it in the
reverse order (drop first, then re-add) so as to keep the same name for
the service, unless rgmanager actively prevents that anyway; no idea.

Jan (Poki)