[Pacemaker] RFC: What part of the XML configuration do you hate the most?

Andrew Beekhof beekhof at gmail.com
Mon Jul 28 02:48:32 EDT 2008


On Jul 25, 2008, at 9:03 AM, Satomi Taniguchi wrote:

> Hi,
>
> Andrew Beekhof wrote:
> (snip)
>> One thing we used to do (but had to disable because we couldn't get  
>> it 100% right at the time) was move off the healthy resources  
>> before shooting the node.  I think resurrecting this feature is a  
>> better approach.
> (snip)
>
> Thank you for a wonderful idea.
> I'm trying to implement this approach.
>
> To realize this, I added a new setting, on_fail="standby",
> and a "standby" graph in pengine.
> When an operation whose setting is on_fail="standby" fails,
> crmd (tengine) tries to stop all resources and set the node's CIB
> information to standby.
> Please see the attached files.
> (I referred to the set_standby() function in Pacemaker.
> Thanks a lot, Andrew!)
>
> If the stop operation's setting is on_fail="fence",
> STONITH is done if setting the node to standby fails.
> STONITH is also done when a split-brain occurs,
> because this change only concerns "on_fail".
>
> I would like to hear your opinion about this.
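
A rough model of the proposed flow may help when following the objections below. This is a hypothetical Python sketch, not Pacemaker code: `Node`, `handle_failure` and the `stop_ok` callback are invented for illustration, and only the on_fail values mentioned in this thread are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a cluster node and the resources it runs.
    name: str
    resources: list = field(default_factory=list)
    standby: bool = False
    fenced: bool = False


def handle_failure(node, on_fail, stop_on_fail, stop_ok):
    """Return the actions taken after an operation on `node` fails.

    on_fail      -- on_fail value of the failed operation
    stop_on_fail -- on_fail value of the stop operation
    stop_ok      -- callable(resource) -> bool simulating each stop's result
    """
    actions = []
    if on_fail == "standby":
        # Try to stop everything on the node before marking it standby.
        for rsc in list(node.resources):
            actions.append(f"stop {rsc}")
            if stop_ok(rsc):
                node.resources.remove(rsc)
            elif stop_on_fail == "fence":
                # A failed stop with on_fail="fence" escalates to STONITH.
                actions.append(f"stonith {node.name}")
                node.fenced = True
                return actions
            # Assumption: any other stop failure leaves the resource in place.
        node.standby = True
        actions.append(f"standby {node.name}")
    elif on_fail == "fence":
        actions.append(f"stonith {node.name}")
        node.fenced = True
    return actions


node_x = Node("nodeX", ["rscY", "rscZ"])
print(handle_failure(node_x, "standby", "fence", lambda r: r != "rscY"))
# ['stop rscY', 'stonith nodeX']
```

Note that the sketch stops resources strictly one after another and ignores ordering constraints between them, which is exactly where the trouble described below comes from.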

Two problems...

The first is that standby happens after the fencing event, so it's not  
really doing anything to migrate the healthy resources.
The second is that it doesn't address the reason why the feature was  
disabled in the first place.

Consider NodeX, which needs to be fenced because of a failure of
RscY.

Everything works fine until RscY depends on a (healthy) RscZ. You
can't stop RscZ until RscY is stopped, and RscY can't be considered
stopped until NodeX is fenced, but you want to move healthy resources
(RscZ) away from NodeX before you fence it.

Thus you have a loop that cannot be resolved:

  Stop RscZ
    -(depends on)-> Stop RscY
    -(depends on)-> Stonith NodeX
    -(depends on)-> Stop RscZ
    -(depends on)-> ...
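
To make the loop concrete, treat each action as a node in a directed graph whose edges mean "depends on", and run an ordinary depth-first search for a cycle. This is a hypothetical Python sketch: `find_cycle` and the dict-of-lists layout are invented for illustration and say nothing about how the PE actually stores its ordering constraints.

```python
def find_cycle(edges):
    """Return one cycle as a list of actions, or None if the graph is acyclic.

    edges maps each action to the actions it must wait for ("depends on").
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in edges.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: slice out the cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(edges):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


ordering = {
    "stop RscZ": ["stop RscY"],
    "stop RscY": ["stonith NodeX"],
    "stonith NodeX": ["stop RscZ"],
}
print(" -> ".join(find_cycle(ordering)))
# stop RscZ -> stop RscY -> stonith NodeX -> stop RscZ
```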

One either needs to prevent the PE from creating the loop in the first  
place, or teach it where and how to break it.
See the extensive comment in native_stop_constraints()
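
One possible way to "prevent the PE from creating the loop" is to add each "fence after moving healthy resource R" ordering only if stopping R does not already (transitively) depend on the fence. The Python below is a hypothetical sketch of that policy under stated assumptions: `reaches`, `add_migrate_before_fence` and the extra resource RscW are invented for illustration, and this is not a description of what native_stop_constraints() actually does.

```python
from collections import deque


def reaches(edges, start, goal):
    """True if goal is reachable from start by following 'depends on' edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for dep in edges.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return False


def add_migrate_before_fence(edges, fence_action, healthy_stops):
    """Order fence_action after each healthy stop unless that closes a loop.

    Returns the stop actions whose pre-fence migration had to be skipped.
    """
    skipped = []
    for stop in healthy_stops:
        # The new edge would be fence_action -> stop; it closes a loop iff
        # stop already depends (directly or indirectly) on fence_action.
        if reaches(edges, stop, fence_action):
            skipped.append(stop)
        else:
            edges.setdefault(fence_action, []).append(stop)
    return skipped


ordering = {
    "stop RscZ": ["stop RscY"],      # RscY depends on RscZ, so RscY stops first
    "stop RscY": ["stonith NodeX"],  # failed RscY only counts as stopped once fenced
}
# RscW is a hypothetical healthy resource with no link to RscY.
skipped = add_migrate_before_fence(ordering, "stonith NodeX", ["stop RscZ", "stop RscW"])
print(skipped)                    # ['stop RscZ']
print(ordering["stonith NodeX"])  # ['stop RscW']
```

The trade-off is that RscZ simply does not get moved before the fence and is recovered afterwards, as it would be without the feature; the alternative named above is to allow the loop and teach the PE which edge to drop.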
