[ClusterLabs] Master-Slaver resource Restarted after configuration change
kgaillot at redhat.com
Tue Jun 28 18:14:40 EDT 2016
On 06/28/2016 04:10 PM, Ilia Sokolinski wrote:
> I have a follow-up question:
> We are trying to achieve an NDU for our Master/Slave resource - this is
> why the resource parameter is being updated.
> The desired sequence of events is as follows:
> 1. Resource parameter is updated in pacemaker.
> 2. Slave instance is reloaded (or restarted) to pick up the new
> parameter. This may take several minutes with our resource.
> 3. Once Slave is back online, Master is demoted and Slave is promoted.
> 4. Former master is reloaded (or restarted) to pick up the new parameter.
> This seems to me like a fairly standard sequence for updating any
> Master/Slave resource - restarting both instances at the same time is
> never a good thing.
> I can't come up with a way to create such a sequence.
> The best I can come up with is to implement a reload action in our
> resource agent, which
> for the slave instance would restart it immediately, and
> for the master instance would wait a fixed time (3 min) to give the
> Slave time to reload, and then return OCF_ERR_GENERIC to force a failover.
> This kind of works, but is quite hacky - there is no guarantee that the
> Slave will be done reloading in 3 min, and we are also returning an
> error for something that is not really a failure.
> I thought of using notify action, but there is no notify about the
> reload action, so it does not work.
> Any other suggestions?
I'm not sure there's a way to do this.
If a (non-reloadable) parameter changes, the entire clone does need a
restart, so the cluster will want all instances to be stopped before
proceeding to start them all again.
Your desired behavior couldn't be the default, because not all services
would be able to function correctly with a running master using
different configuration options than running slaves. In fact, I think it
would be rare; consider a typical option for a TCP port -- changing the
port in only the slaves would break communication with the master and
potentially lead to data inconsistency.
Can you give an example of an option that could be handled this way
without causing problems?
Reload could be a way around this, but not in the way you suggest. If
your service really does need to restart after the option change, then
reload is not appropriate. However, if you can approach the problem on
the application side, and make it able to accept the change without
restarting, then you could implement it as a reload in the agent.
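For illustration, if the application can be taught to re-read its
configuration on a signal, the agent-side reload action becomes
trivial. A minimal sketch, assuming a hypothetical daemon that
re-reads its config on SIGHUP (the daemon name, pid file path, and
function names are all illustrative, not from this thread):

```shell
#!/bin/sh
# Sketch of a reload action for an OCF resource agent.
# Assumes a hypothetical daemon "mydaemon" that re-reads its
# configuration when it receives SIGHUP.

# Relevant OCF exit codes (from the OCF resource agent API)
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

PIDFILE="${PIDFILE:-/var/run/mydaemon.pid}"

my_reload() {
    # Read the daemon's pid; fail generically if it is not running.
    pid=$(cat "$PIDFILE" 2>/dev/null) || return $OCF_ERR_GENERIC
    # Ask the daemon to re-read its configuration in place.
    kill -HUP "$pid" 2>/dev/null || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

# Demonstration: with no pid file present, reload fails generically.
PIDFILE=/nonexistent/mydaemon.pid
my_reload
echo "reload exit code: $?"
# prints: reload exit code: 1
```

In a real agent this function would be dispatched from the usual
`case "$1" in ... esac` action switch alongside start/stop/monitor,
and the agent would advertise `<action name="reload" .../>` in its
metadata so Pacemaker knows reload is available.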
> Thanks a lot
>> On Jun 10, 2016, at 3:13 AM, Ferenc Wágner <wferi at niif.hu
>> <mailto:wferi at niif.hu>> wrote:
>> Ilia Sokolinski <ilia at clearskydata.com <mailto:ilia at clearskydata.com>>
>>> We have a custom Master-Slave resource running on a 3-node pcs
>>> cluster on CentOS 7.1
>>> As part of what is supposed to be an NDU we update some properties
>>> of the resource.
>>> For some reason this causes both Master and Slave instances of the
>>> resource to be restarted.
>>> Since restart takes a fairly long time for us, the update becomes
>>> very disruptive.
>>> Is this expected?
>> Yes, if you changed a parameter declared with unique="1" in your resource
>> agent metadata.
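For reference, the `unique="1"` declaration lives in the XML the agent
prints for its meta-data action. A minimal sketch, using a
hypothetical agent and parameter name (not from this thread) to show
where the attribute goes:

```shell
#!/bin/sh
# Sketch: fragment of an OCF agent's meta-data output.
# "myagent" and the parameter "config" are hypothetical.
metadata='<?xml version="1.0"?>
<resource-agent name="myagent">
  <parameters>
    <!-- unique="1": changing this parameter forces a restart -->
    <parameter name="config" unique="1">
      <content type="string"/>
    </parameter>
  </parameters>
</resource-agent>'
echo "$metadata"
```

Changing the attribute to `unique="0"` (together with an advertised
reload action) is how agents of this era signal to Pacemaker that a
parameter change can be handled by reload instead of restart.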
>>> We have not seen this behavior with the previous release of pacemaker.
>> I'm surprised...