[ClusterLabs] Re: Locate resource with functioning member of clone set?

Fri Nov 18 07:22:44 UTC 2016

>>> Israel Brewster <israel at ravnalaska.net> wrote on 17.11.2016 at 18:37 in
message <751F1BD6-8434-4AD9-B77F-10EDDFE28E31 at ravnalaska.net>:
> I have a resource that is set up as a clone set across my cluster, partly for 
> pseudo-load balancing (if someone wants to perform an action that will take a 
> lot of resources, I can have them do it on a different node than the primary 
> one), but also simply because the resource can take several seconds to start, 
> and by having it already running as a clone set, I can fail over in the time 
> it takes to move an IP resource - essentially zero downtime.
> 
> This is all well and good, but I ran into a problem the other day where the 
> process on one of the nodes stopped working properly. Pacemaker caught the 
> issue, and tried to fix it by restarting the resource, but was unable to 
> because the old instance hadn't actually exited completely and was still 
> tying up the TCP port, thereby preventing the new instance that pacemaker 
> launched from being able to start.
> 
> So this leaves me with two questions: 
> 
> 1) Is there a way to set up a "kill script", such that before trying to 
> launch a new copy of a process, pacemaker will run this script, which would 
> be responsible for making sure that there are no other instances of the 
> process running?
> 2) Even in the above situation, where pacemaker couldn't launch a good copy 
> of the resource on the one node, the situation could have been easily 
> "resolved" by pacemaker moving the virtual IP resource to another node where 
> the cloned resource was running correctly, and notifying me of the problem. I 
> know how to make colocation constraints in general, but how do I do a 
> colocation constraint with a cloned resource where I just need the virtual IP 
> running on *any* node where the clone is working properly? Or is it the 
> same as any other colocation resource, and pacemaker is simply smart enough 
> to both try to restart the failed resource and move the virtual IP resource 
> at the same time?

I wonder: Wouldn't a monitor operation that reports the resource as running as long as the port is occupied resolve both issues?
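
As a rough illustration of the kind of check I mean, here is a minimal sketch in Python. It assumes the monitor action can be scripted and that the service's TCP port is known. The function names are made up for this example. Only the return codes (0 for OCF_SUCCESS, 7 for OCF_NOT_RUNNING) come from the OCF resource agent conventions.

```python
import errno
import socket

# Standard OCF return codes for a monitor action.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def port_is_occupied(port, host="0.0.0.0"):
    """Return True if something still holds the TCP port (hypothetical helper)."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # A plain bind (no SO_REUSEADDR) fails with EADDRINUSE while
        # another socket is still bound to the same address and port.
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. permission problems are not "occupied"; surface them
    finally:
        probe.close()
    return False


def monitor(port, host="0.0.0.0"):
    """Report 'running' for as long as the port is occupied."""
    return OCF_SUCCESS if port_is_occupied(port, host) else OCF_NOT_RUNNING
```

A bind probe like this may also report the port as occupied while old connections linger in TIME_WAIT, so treat it as a starting point rather than a finished agent.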

> 
> As an addendum to question 2, I'd be interested in any methods there may be 
> to be notified of changes in the cluster state, specifically things like when 
> a resource fails on a node - my current nagios/icinga setup doesn't catch that 
> when pacemaker properly moves the resource to a different node, because the 
> resource remains up (which, of course, is the whole point), but it would 
> still be good to know something happened so I could look into it and see if 
> something needs to be fixed on the failed node to allow the resource to run there 
> properly.
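
One possible approach, assuming Pacemaker 1.1.15 or later: its alerts feature runs a configured script on cluster events and passes the details in CRM_alert_* environment variables. Below is a minimal sketch of such a script in Python. The function name and the message format are made up for this example, and the variable names should be checked against the documentation for your version.

```python
import os


def format_alert(env):
    """Build a one-line summary from CRM_alert_* variables (hypothetical helper)."""
    kind = env.get("CRM_alert_kind", "unknown")
    node = env.get("CRM_alert_node", "unknown-node")
    desc = env.get("CRM_alert_desc", "")
    if kind == "resource":
        # Resource events also carry the resource name, the operation
        # that ran, and its return code.
        return "resource {}: {} on {} returned rc={} ({})".format(
            env.get("CRM_alert_rsc", "?"),
            env.get("CRM_alert_task", "?"),
            node,
            env.get("CRM_alert_rc", "?"),
            desc,
        )
    return "{} event on {}: {}".format(kind, node, desc)


if __name__ == "__main__":
    # Assumption: the alert's recipient is configured as a file path,
    # so each event is appended there as one line.
    recipient = os.environ.get("CRM_alert_recipient")
    if recipient:
        with open(recipient, "a") as log:
            log.write(format_alert(os.environ) + "\n")
```

The file written here could then be watched by nagios/icinga, or the write could be replaced with whatever notification mechanism is already in place.
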
> 
> Thanks!
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------