[Pacemaker] stonith q

Alex Samad - Yieldbroker Alex.Samad at yieldbroker.com
Tue Nov 4 14:45:41 EST 2014


{snip}
> >> Any pointers to a framework somewhere?
> >
> > I do not think there is any formal stonith agent developer's guide;
> > take a look at any existing agent like external/ipmi and modify it to
> > suit your needs.
> >
> >> Does fenced have any handlers? I notice it logs a message in syslog and
> >> the cluster log; is there a chance to capture the event there?
> >
> > I do not have experience with RH CMAN, sorry. But from what I
> > understand, fenced and stonithd agents are compatible.
> 
> https://fedorahosted.org/cluster/wiki/FenceAgentAPI


Thanks

> 
> Note the return codes. Also, not listed there, is the requirement that an
> agent print its XML validation data. You can see an example of what this
> looks like by calling 'fence_ipmilan -o metadata' (or any other
> fence_* agent).
> 
> For the record, I think this is a bad idea.

So lots of people have said this is a bad idea, and maybe I am misunderstanding something.

From my observation of my 2-node cluster, when inter-cluster comms has an issue, one node kills the other node.
Let's say A + B.
A is currently running the resources; B gets elected to die.
A signal is sent: cman -> PK -> stonithd

From the logs on server B I see fenced trying to kill server B, but I don't use any cman/stonith agents. I would like to capture that event and use an OS reboot.
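
To make that concrete, here is a rough sketch of what such an agent might look like, going by my reading of the FenceAgentAPI page linked above. I am assuming the options arrive on stdin as name=value lines with "#" comment lines, that the action key is "action" (or "option" in older setups), and that 0 means success and 1 means failure; all of that should be checked against the page. The function names and the shutdown command are my own choices, and the metadata output mentioned above is left out entirely.

```python
import subprocess


def parse_options(lines):
    """Parse name=value lines as an agent would (assumed) receive them on stdin."""
    opts = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            opts[name.strip()] = value.strip()
    return opts


def run_action(opts, runner=subprocess.call):
    """Return 0 on success and 1 on failure (assumed exit code convention)."""
    # Assumption: the requested action is under "action" or the older "option".
    action = opts.get("action", opts.get("option"))
    if action in ("reboot", "off"):
        # Assumption: a plain OS reboot of the local machine is what is wanted.
        return 0 if runner(["/sbin/shutdown", "-r", "now"]) == 0 else 1
    if action in ("status", "monitor"):
        # If this code is running at all, the local node is up.
        return 0
    # Unknown or missing action: refuse to act rather than guess.
    return 1
```

A wrapper script would end with something like sys.exit(run_action(parse_options(sys.stdin))). Note that this only works while the node is healthy enough to run the script, which is exactly the limitation people have pointed out.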

So the problem I perceive is when server B is in a state where it can't run (OS locked up or crashed). I believe VMware will look after that; from experience, I have seen it deal with that.

The issue is if B is running well enough to still hold a VIP (one of the resources that PK looks after), so the VIP is on both A and B, and B can't or will not shut down via the OS. I understand that, but I would still like to attempt a reboot at that time.

I have found a simpler solution: I actively poll to check if the cluster is okay. I would prefer to fire a script on an event, but ..
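
The polling approach can be sketched roughly as below. This is only an illustration: the function names, the thresholds, and the idea that a zero exit status from a one-shot "crm_mon -1" means the cluster is okay are all assumptions on my part, and a real check would probably need to look at the output as well. It waits for several consecutive failures before acting, so that a single slow poll does not trigger a reboot.

```python
import subprocess
import time


def cluster_ok(cmd=("crm_mon", "-1")):
    """Assumption: exit status 0 from a one-shot crm_mon means the cluster is okay."""
    try:
        return subprocess.call(
            list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ) == 0
    except OSError:
        # The command is missing or could not be started.
        return False


def watch(check, on_failure, max_failures=3, interval=10.0,
          sleep=time.sleep, max_polls=None):
    """Poll check(); call on_failure() after max_failures consecutive failures.

    Returns True if on_failure() was fired, False if max_polls ran out first.
    """
    failures = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if check():
            failures = 0  # any success resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                on_failure()
                return True
        sleep(interval)
    return False
```

It could be started with watch(cluster_ok, some_reboot_function), where some_reboot_function is whatever is meant to happen when the cluster looks broken.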

I'm also looking into why there is a comms problem, as it's 2 VMs on the same host on the same network. I think it's starvation of CPU cycles, as it's a dev setup.


> 
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is
> trapped in the mind of a person without access to education?