[ClusterLabs] Help required for N+1 redundancy setup

Ken Gaillot kgaillot at redhat.com
Fri Jan 8 11:00:55 EST 2016


On 01/08/2016 06:55 AM, Nikhil Utane wrote:
> I would like to validate my final config.
> 
> As I mentioned earlier, I will have (up to) 5 active servers and 1
> standby server.
> The standby server should take over the role of whichever active server
> goes down. Each active server has some unique configuration that needs
> to be preserved.
> 
> 1) I will create 5 groups in total. Each group has an ocf:heartbeat:IPaddr2
> resource (for the virtual IP) and my custom resource.
> 2) The virtual IP needs to be read inside my custom OCF agent, so I will
> use an attribute reference pointing to the IPaddr2 value inside my custom
> resource to avoid duplicating it.
> 3) I will then configure location constraints to run each group resource
> on its active node with a higher score and on the standby node with a
> lower score (see the sketch below). For example:
> Group              Node            Score
> ---------------------------------------------
> MyGroup1        node1           500
> MyGroup1        node6           0
> 
> MyGroup2        node2           500
> MyGroup2        node6           0
> ..
> MyGroup5        node5           500
> MyGroup5        node6           0
> 
> 4) Now if, say, node1 were to go down, the stop action will first be
> called on node1. I haven't decided if I need to do anything specific here.
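
For reference, steps 1-3 above might look roughly like this in crm shell
(only a sketch: the agent name ocf:mycompany:myapp, the IP address and the
constraint ids are placeholders, and the $id/$id-ref syntax for sharing the
ip parameter is worth double-checking against your crmsh version):

    # Virtual IP for group 1; name its parameter set so it can be reused
    primitive vip1 ocf:heartbeat:IPaddr2 \
        params $id=vip1-params ip=192.168.1.101 cidr_netmask=24
    # The custom resource references the same parameter set instead of
    # duplicating the address (it sees the same ip/cidr_netmask values)
    primitive app1 ocf:mycompany:myapp \
        params $id-ref=vip1-params
    group MyGroup1 vip1 app1
    # Prefer node1; node6 is the shared standby
    location MyGroup1-pref-node1 MyGroup1 500: node1
    location MyGroup1-pref-node6 MyGroup1 0: node6

The same pattern would be repeated for MyGroup2..MyGroup5 on node2..node5.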

To clarify, if node1 goes down intentionally (e.g. standby or stop),
then all resources on it will be stopped first. But if node1 becomes
unavailable (e.g. crash or communication outage), it will get fenced.
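
For an intentional switchover, the node can be put into standby and brought
back later, e.g. with crm shell:

    crm node standby node1   # resources are stopped and moved off node1
    crm node online node1    # node1 may host resources again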

> 5) But when the start action gets called on node6, I will use the crm
> command-line interface to modify the above config and swap node1 and
> node6, i.e.:
> MyGroup1        node6           500
> MyGroup1        node1           0
> 
> MyGroup2        node2           500
> MyGroup2        node1           0
> 
> 6) To do the above, I need the names of the newly active and newly
> standby nodes to be passed to my start action. What's the best way to
> get this information inside my OCF agent?

Modifying the configuration from within an agent is dangerous -- too
much potential for feedback loops between pacemaker and the agent.

I think stickiness will do what you want here. Set a stickiness higher
than the original node's preference, and the resource will want to stay
where it is.
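
In crm shell that could be as simple as the following (a sketch; any value
larger than the 500-point location preferences above will do):

    # Running resources prefer to stay where they are, overriding the
    # location scores when a failed node comes back
    crm configure rsc_defaults resource-stickiness=1000

With that in place, when node1 returns, MyGroup1 stays on node6 (stickiness
beats node1's preference of 500) and node1 simply becomes the new standby,
so nothing has to be rewritten from inside the agent.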

> 7) Apart from the node name, there will be other information which I
> plan to pass using node attributes. What's the best way to read this
> information inside my OCF agent? Use a crm command to query?

Any of the command-line interfaces for doing so should be fine, but I'd
recommend using one of the lower-level tools (crm_attribute or
attrd_updater) so you don't have a dependency on a higher-level tool
that may not always be installed.
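
Inside the agent, that could look something like this (a sketch;
"site_config" is a made-up attribute name, assumed to have been set
beforehand with crm_attribute --update):

    # Name of the node this agent instance is running on
    node=$(crm_node -n)
    # Read a permanent node attribute for that node
    site_config=$(crm_attribute --type nodes --node "$node" \
                  --name site_config --query --quiet)

attrd_updater can be used in a similar way for transient (reboot-lifetime)
attributes.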

> Thank You.
> 
> On Tue, Dec 22, 2015 at 9:44 PM, Nikhil Utane <nikhil.subscribed at gmail.com>
> wrote:
> 
>> Thanks to you Ken for giving all the pointers.
>> Yes, I can use service start/stop which should be a lot simpler. Thanks
>> again. :)
>>
>> On Tue, Dec 22, 2015 at 9:29 PM, Ken Gaillot <kgaillot at redhat.com> wrote:
>>
>>> On 12/22/2015 12:17 AM, Nikhil Utane wrote:
>>>> I have prepared a write-up explaining my requirements and current
>>>> solution that I am proposing based on my understanding so far.
>>>> Kindly let me know if what I am proposing is good or there is a better
>>>> way to achieve the same.
>>>>
>>>> https://drive.google.com/file/d/0B0zPvL-Tp-JSTEJpcUFTanhsNzQ/view?usp=sharing
>>>>
>>>> Let me know if you face any issue in accessing the above link. Thanks.
>>>
>>> This looks great. Very well thought-out.
>>>
>>> One comment:
>>>
>>> "8. In the event of any failover, the standby node will get notified
>>> through an event and it will execute a script that will read the
>>> configuration specific to the node that went down (again using
>>> crm_attribute) and become active."
>>>
>>> It may not be necessary to use the notifications for this. Pacemaker
>>> will call your resource agent with the "start" action on the standby
>>> node, after ensuring it is stopped on the previous node. Hopefully the
>>> resource agent's start action has (or can have, with configuration
>>> options) all the information you need.
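
Resource parameters arrive in the agent's environment as OCF_RESKEY_*
variables, so the start action can pick them up directly. A sketch, with
"ip" and "config_id" as example parameter names:

    my_start() {
        # Pacemaker exports each configured parameter as OCF_RESKEY_<name>
        local ip="${OCF_RESKEY_ip}"
        local cfg="${OCF_RESKEY_config_id}"
        # ... bring the service up using the node-specific configuration ...
        return $OCF_SUCCESS   # from ocf-shellfuncs
    }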
>>>
>>> If you do end up needing notifications, be aware that the feature will
>>> be disabled by default in the 1.1.14 release, because changes in syntax
>>> are expected in further development. You can define a compile-time
>>> constant to enable them.