[ClusterLabs] pcmk_remote evaluation (continued)

Ken Gaillot kgaillot at redhat.com
Mon Dec 11 22:11:33 EST 2017


On Mon, 2017-12-11 at 23:43 +0300, Vladislav Bogdanov wrote:
> 11.12.2017 23:06, Ken Gaillot wrote:
> [...]
> > > =====
> > > 
> > > * The first issue I found (and I expect it to be the reason for
> > > some other issues) is that pacemaker_remote does not drop an old
> > > crmd's connection after a new crmd connects.
> > > As IPC proxy connections are kept in a hash table, there is a 50%
> > > chance that remoted tries to reach the old crmd, e.g. to proxy
> > > checks of node attributes when resources are reprobed.
> > > That leads to timeouts of those resources' probes, with a
> > > consequent reaction from the cluster.
> > > A solution here could be to drop the old IPC proxy connection as
> > > soon as a new one is established.
> > 
> > We can't drop connections from the pacemaker_remoted side because it
> > doesn't know anything about the cluster state (e.g. whether the
> > cluster connection resource is live-migrating).
> 
> Well, ok. But what happens when the fenced cluster node comes back
> and receives a TCP packet from the old connection? Yes, it sends an
> RST, which would terminate the connection on the peer side, and then
> pcmk_remoted should shut it down on a socket event.
> 
> > 
> > However, we can simply always use the most recently connected
> > provider, which I think solves the issue. See commit e9a7e3bb, one
> > of a few recent bugfixes in the master branch for pacemaker_remoted.
> > It will most likely not make it into 2.0 (where I'm trying to focus
> > on deprecated syntax removals), but rather into the next release
> > after that.
> 
> Will definitely try it; all stakeholders have already been notified
> that we need another round on all available hardware :) We will test
> as soon as it becomes free.
> 
> I will return to this as soon as I have some results.
> 
> Thank you,
> Vladislav

Great, thanks. See my latest post about 1.1.18 -- you can compile the
latest 1.1 branch to have this and other known bug fixes.
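
As a rough illustration of the "always use the most recently connected
provider" idea quoted above, here is a minimal Python sketch. It is not
the code from commit e9a7e3bb (Pacemaker itself is written in C), and
the class, method, and connection names are hypothetical. It only shows
why tracking connection order avoids picking a stale crmd connection
from an unordered table, without remoted having to drop anything itself.

```python
from typing import Dict, Optional


class ProviderRegistry:
    """Toy registry of IPC proxy providers (hypothetical names throughout)."""

    def __init__(self) -> None:
        # Python dicts preserve insertion order, so the last entry is
        # always the most recently connected provider.
        self._providers: Dict[str, str] = {}

    def connect(self, conn_id: str, conn: str) -> None:
        # Re-inserting moves a reconnecting provider to the end.
        self._providers.pop(conn_id, None)
        self._providers[conn_id] = conn

    def disconnect(self, conn_id: str) -> None:
        # Called only when the connection itself goes away (for example
        # on a socket error); the registry never evicts old entries.
        self._providers.pop(conn_id, None)

    def current(self) -> Optional[str]:
        # Always answer with the newest provider, never an arbitrary one.
        if not self._providers:
            return None
        return list(self._providers.values())[-1]


registry = ProviderRegistry()
registry.connect("old-crmd", "conn-A")
registry.connect("new-crmd", "conn-B")
chosen = registry.current()        # the newer connection wins
registry.disconnect("new-crmd")
fallback = registry.current()      # falls back to the older connection
```

Keeping the older connection around, rather than dropping it, matches
the constraint above that remoted cannot know whether the cluster
connection resource is being live-migrated.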
-- 
Ken Gaillot <kgaillot at redhat.com>



