[ClusterLabs] Re: Locate resource with functioning member of clone set?
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Fri Nov 18 07:22:44 UTC 2016
>>> Israel Brewster <israel at ravnalaska.net> wrote on 17.11.2016 at 18:37 in
message <751F1BD6-8434-4AD9-B77F-10EDDFE28E31 at ravnalaska.net>:
> I have a resource that is set up as a clone set across my cluster, partly for
> pseudo-load balancing (if someone wants to perform an action that will take a
> lot of resources, I can have them do it on a different node than the primary
> one), but also simply because the resource can take several seconds to start.
> By having it already running as a clone set, I can fail over in the time it
> takes to move an IP resource - essentially zero downtime.
>
> This is all well and good, but I ran into a problem the other day where the
> process on one of the nodes stopped working properly. Pacemaker caught the
> issue and tried to fix it by restarting the resource, but it was unable to,
> because the old instance hadn't actually exited completely and was still
> tying up the TCP port, thereby preventing the new instance that Pacemaker
> launched from starting.
>
> So this leaves me with two questions:
>
> 1) Is there a way to set up a "kill script", such that before trying to
> launch a new copy of a process, Pacemaker will run this script, which would
> be responsible for making sure that there are no other instances of the
> process running?
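As for a "kill script": Pacemaker has no separate pre-start hook for this;
such cleanup conventionally belongs in the resource agent's stop action,
which Pacemaker runs as part of recovering a failed resource before it
starts a new instance. A minimal sketch of a stop function that escalates
to SIGKILL and waits for the port to be released (the pidfile path and
port number are invented for illustration):

    PIDFILE=/var/run/myapp.pid   # assumption: the agent writes a pidfile
    PORT=8080                    # assumption: the listener's TCP port

    myapp_stop() {
        local pid=""
        [ -f "$PIDFILE" ] && pid=$(cat "$PIDFILE")
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            kill -TERM "$pid"
            for i in 1 2 3 4 5; do               # short grace period
                kill -0 "$pid" 2>/dev/null || break
                sleep 1
            done
            kill -0 "$pid" 2>/dev/null && kill -KILL "$pid"
        fi
        # Don't report success until the port is actually free, so the
        # next start cannot collide with a lingering socket; Pacemaker's
        # stop timeout bounds this loop.
        while ss -ltn | grep -q ":$PORT "; do
            sleep 1
        done
        rm -f "$PIDFILE"
        return $OCF_SUCCESS    # from ocf-shellfuncs
    }
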
> 2) Even in the above situation, where Pacemaker couldn't launch a good copy
> of the resource on the one node, the situation could have been easily
> "resolved" by Pacemaker moving the virtual IP resource to another node where
> the cloned resource was running correctly, and notifying me of the problem. I
> know how to make colocation constraints in general, but how do I do a
> colocation constraint with a cloned resource where I just need the virtual IP
> running on *any* node where the clone is working properly? Or is it the
> same as any other colocation constraint, and Pacemaker is simply smart enough
> to both try to restart the failed resource and move the virtual IP resource
> at the same time?
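Regarding the constraint itself: it is written like any other colocation,
just against the clone's id, which means "on any node running an active
instance of the clone". A sketch in crm shell syntax (resource names are
invented):

    primitive virtual-ip ocf:heartbeat:IPaddr2 params ip=192.168.1.10
    primitive myapp ...                 # the cloned service (details elided)
    clone myapp-clone myapp
    # Keep the IP on a node with a working clone instance, and start
    # it only after that instance is confirmed up:
    colocation ip-with-app inf: virtual-ip myapp-clone
    order app-before-ip inf: myapp-clone virtual-ip

With this in place, if the instance on the IP's current node fails and
cannot be recovered there, Pacemaker moves the IP to a node where an
instance is still healthy.
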
I wonder: Wouldn't a monitor operation that reports the resource as running as long as the port is occupied resolve both issues?
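One literal reading of that suggestion, as a monitor keyed on the TCP port
rather than a pid (port number again invented):

    myapp_monitor() {
        if ss -ltn | grep -q ":8080 "; then
            # Something still holds the port. Reporting "running" here
            # means Pacemaker will run stop - and its cleanup/escalation
            # logic - before any start, instead of launching a new
            # instance straight into a busy port.
            return $OCF_SUCCESS
        fi
        return $OCF_NOT_RUNNING
    }
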
>
> As an addendum to question 2, I'd be interested in any methods there may be
> to be notified of changes in the cluster state, specifically things like when
> a resource fails on a node. My current Nagios/Icinga setup doesn't catch that
> when Pacemaker properly moves the resource to a different node, because the
> resource remains up (which, of course, is the whole point), but it would
> still be good to know something happened so I could look into it and see if
> something needs to be fixed on the failed node to allow the resource to run
> there properly.
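Regarding event notification: one long-standing option is the
ocf:pacemaker:ClusterMon resource, which runs crm_mon in the background and
invokes an external program (crm_mon's -E option) on every cluster event;
recent Pacemaker releases (1.1.15 and later) also ship a built-in alerts
mechanism. A sketch of the ClusterMon route, with the script path and mail
address invented:

    # Cluster side (crm shell):
    primitive cluster-mon ocf:pacemaker:ClusterMon \
        params user=root extra_options="-E /usr/local/bin/pcmk_notify.sh"
    clone cluster-mon-clone cluster-mon

    # /usr/local/bin/pcmk_notify.sh (crm_mon sets the CRM_notify_* vars):
    #!/bin/sh
    echo "$(date): ${CRM_notify_task} of ${CRM_notify_rsc} on \
    ${CRM_notify_node}: ${CRM_notify_desc} (rc=${CRM_notify_rc})" |
        mail -s "Pacemaker event" admin@example.com

Every failed monitor or recovery then produces a mail even when the
cluster heals itself, which is exactly the case Nagios/Icinga misses.
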
>
> Thanks!
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------