[ClusterLabs] pcmk_remote evaluation (continued)
Vladislav Bogdanov
bubble at hoster-ok.com
Mon Dec 11 15:43:18 EST 2017
11.12.2017 23:06, Ken Gaillot wrote:
[...]
>> =====
>>
>> * The first issue I found (and I expect it to be the cause of some
>> other issues) is that pacemaker_remote does not drop an old crmd's
>> connection after a new crmd connects.
>> As IPC proxy connections are kept in a hash table, there is a 50%
>> chance that remoted tries to reach an old crmd to, e.g., proxy checks
>> of node attributes when resources are reprobed.
>> That leads to timeouts of those resources' probes, with the
>> consequent reaction from the cluster.
>> A solution here could be to drop the old IPC proxy connection as soon
>> as a new one is established.
>
> We can't drop connections from the pacemaker_remoted side because it
> doesn't know anything about the cluster state (e.g. whether the cluster
> connection resource is live-migrating).
Well, OK. But what happens when the fenced cluster node comes back and
receives a TCP packet from the old connection? It sends an RST, which
terminates the connection on the peer side, and then pcmk_remoted
should shut it down on the socket event.
>
> However we can simply always use the most recently connected provider,
> which I think solves the issue. See commit e9a7e3bb, one of a few
> recent bugfixes in the master branch for pacemaker_remoted. It will
> most likely not make it into 2.0 (which I'm trying to focus on
> deprecated syntax removals), but the next release after that.
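
To illustrate the idea of always using the most recently connected
provider: the sketch below is a toy Python model, not the actual C code
from pacemaker_remoted or from commit e9a7e3bb. The class and method names
(ProxyProviders, connect, disconnect, newest) and the provider names are
hypothetical. It assumes each connection is stamped with a sequence number
when it is accepted, and that stale entries stay in the table until their
socket is actually closed.

```python
from typing import Dict, Optional


class ProxyProviders:
    """Toy model of the crmd connections able to proxy IPC for a remote node."""

    def __init__(self) -> None:
        # provider name -> sequence number assigned at connect time
        self._order: Dict[str, int] = {}
        self._next_seq = 0

    def connect(self, name: str) -> None:
        # A reconnect gets a fresh sequence number, so it counts as newest.
        self._order[name] = self._next_seq
        self._next_seq += 1

    def disconnect(self, name: str) -> None:
        # Called only when the socket is really closed or reset.
        self._order.pop(name, None)

    def newest(self) -> Optional[str]:
        # Stale entries may still be present; they simply never win
        # while a more recent connection exists.
        if not self._order:
            return None
        return max(self._order, key=self._order.__getitem__)


providers = ProxyProviders()
providers.connect("old-crmd")  # node that later gets fenced
providers.connect("new-crmd")  # node that took over the connection resource
chosen = providers.newest()    # requests are proxied through this provider
```

With this selection rule the remote side never has to decide whether an
old connection is dead; it only has to prefer the newest one.
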
Will definitely try it; all stakeholders are already notified that we
need another round on all available hardware :) We will test as soon as
the hardware becomes free.
I will return to this as soon as I have some results.
Thank you,
Vladislav