[ClusterLabs] pcmk_remote evaluation (continued)

Vladislav Bogdanov bubble at hoster-ok.com
Mon Dec 11 15:43:18 EST 2017


11.12.2017 23:06, Ken Gaillot wrote:
[...]
>> =====
>>
>> * The first issue I found (and I expect it to be the cause of some
>> other issues) is that pacemaker_remote does not drop an old crmd's
>> connection after a new crmd connects.
>> As IPC proxy connections are kept in a hash table, there is a 50% chance
>> that remoted tries to reach the old crmd, e.g. to proxy node attribute
>> checks when resources are reprobed.
>> That leads to timeouts of those resources' probes, with a consequent
>> reaction from the cluster.
>> A solution here could be to drop the old IPC proxy connection as soon as
>> a new one is established.
> 
> We can't drop connections from the pacemaker_remoted side because it
> doesn't know anything about the cluster state (e.g. whether the cluster
> connection resource is live-migrating).

Well, ok. But what happens when the fenced cluster node comes back and 
receives a TCP packet from the old connection? It sends an RST, which 
terminates the connection on the peer side, and pcmk_remoted should 
then shut it down on the resulting socket event.

> 
> However we can simply always use the most recently connected provider,
> which I think solves the issue. See commit e9a7e3bb, one of a few
> recent bugfixes in the master branch for pacemaker_remoted. It will
> most likely not make it into 2.0 (which I'm trying to keep focused on
> deprecated syntax removals), but it should be in the next release after that.

I will definitely try it; all stakeholders have already been notified that 
we need another round of testing on all available hardware :) We will test 
as soon as it becomes free.

I will return to this as soon as I have some results.

Thank you,
Vladislav



