[Pacemaker] Call cib_query failed (-41): Remote node did not respond

Brian J. Murrell brian at interlinx.bc.ca
Tue Jul 3 20:06:31 EDT 2012


On 12-07-03 04:26 PM, David Vossel wrote:
> 
> This is not a definite.  Perhaps you are experiencing this given the pacemaker version you are running

Yes, that is absolutely possible, and it has certainly been under
consideration throughout this process.  I also recognized, however,
that I am running the latest stable release (1.1.6), and while I might
be able to experiment with a development branch in the lab, I could
not use it in production.  So while it would be an interesting
experiment, my primary goal had to be getting 1.1.6 to run stably.

> and the torture test you are running with all those parallel commands,

It is worth keeping in mind that all of those parallel commands are just
as parallel with the 4 node cluster as they are with the 8 (4 nodes
actively modifying the CIB + 4 completely idle nodes) and 16 node
clusters -- both of which failed.

Just because I reduced the number of nodes doesn't mean that I reduced
the parallelism any.  The commands being run on each node are not
serialized and are all launched in parallel on the 4 node cluster as
much as they were with the 16 node cluster.

So strictly speaking, it doesn't seem that the parallelism of the CIB
modifications is as much of a factor as simply the number of nodes in
the cluster, even when some of the nodes (e.g. in the 8 node test I
did) are entirely passive and not modifying the CIB at all.

> but I wouldn't go as far as to say pacemaker cannot scale to more than a handful of nodes.

I'd totally welcome being shown the error of my ways.

> I'm sure you know this, I just wanted to be explicit about this so there is no confusion caused by people who may use your example as a concrete metric.

But of course.  In my experiments, it was clear that the cib process
could, at times, peg a single core on my 12 core Xeons with just 4
nodes in the cluster.

So, assuming CPU is the limiting factor here, it is easy to see how
a faster CPU core, or a multithreaded cib, would allow for better
scaling some time down the road.  My point was simply that at the
current time, and again assuming CPU is the limiting factor (I don't
know for sure what the real limiting factor is), the limit is somewhere
between 4 and 8 nodes with more or less default tunings.

> From the deployments I've seen on the mailing list and bug reports, the most common clusters appear to be around the 2-6 node mark.

Which seems consistent with what I have seen.

> The messaging involved with keeping the all the local resource operations in the CIB synced across that many nodes is pretty insane.

Indeed, and I most certainly had considered that.  What really threw a
curve in that train of thought for me though was that even idle,
non-CIB-modifying nodes (i.e. turning a working 4 node cluster into a
non-working 8 node cluster by adding 4 nodes that do nothing with the
CIB) can tip a working configuration over into non-working.

I could most certainly see how the contention of 8 nodes all trying to
jam stuff into the CIB might be taxing, with all of the locking that
needs to go on, etc., but that 4 added idle nodes could add enough
complexity to make a working 4 node cluster stop working is puzzling.
Puzzling enough (granted, to somebody who knows zilch about the
messaging that goes on with CIB operations) to make it smell more like
a bug than simple contention.
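
A deliberately naive way to make this puzzle concrete is to count
messages, assuming (purely for illustration; this is not how the cib is
actually implemented) that every CIB change is broadcast to, and
applied by, every member, idle or not.  The function and field names
below are hypothetical.  In that model, adding 4 idle nodes to a
cluster of 4 writers doubles the cluster-wide deliveries but leaves the
number of updates each node has to apply unchanged, so any extra cost
on the existing nodes would have to come from something that grows with
membership size rather than with the number of writers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BroadcastCost:
    updates: int           # CIB changes originated cluster-wide
    deliveries: int        # copies of those changes delivered to all members
    applies_per_node: int  # changes each member has to process


def naive_broadcast_cost(total_nodes: int, writers: int,
                         updates_per_writer: int) -> BroadcastCost:
    """Hypothetical toy model: every change from a writer is delivered to,
    and applied by, every member of the cluster, including idle ones."""
    if not 0 < writers <= total_nodes:
        raise ValueError("writers must be between 1 and total_nodes")
    updates = writers * updates_per_writer
    return BroadcastCost(
        updates=updates,
        deliveries=updates * total_nodes,  # grows with membership size
        applies_per_node=updates,          # depends only on the writers
    )


# 4 writers alone vs. the same 4 writers plus 4 idle nodes.
four = naive_broadcast_cost(total_nodes=4, writers=4, updates_per_writer=10)
eight = naive_broadcast_cost(total_nodes=8, writers=4, updates_per_writer=10)
print(four)
print(eight)
```

With 10 updates per writer this reports 160 deliveries for the 4 node
cluster and 320 for the 8 node one, with 40 applies per node in both
cases.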

> If you are set on using pacemaker,

Well, I am not necessarily married to it.  It just seemed like the
tool with the critical mass behind it.  As sketchy as it might seem to
ask here (and I only do because you seem to be hinting that there might
be a better tool for the job), is there a tool better suited to it?

> the best approach for scaling for your situation would probably be to try and figure out how to break nodes into smaller clusters that are easier to manage.

Indeed, that is what I ended up doing.  Now my 16 node cluster is four
4 node clusters.  The problem with that, though, is that when a node in
a cluster fails, there are only 3 other nodes to spread its resources
onto, and if 2 should fail, the 2 remaining nodes are trying to service
twice their normal load.  The benefit of larger clusters is clear: they
give pacemaker more nodes to distribute resources across evenly, so
when one or more nodes do fail, the impact on the load of the other
nodes is minimal.
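
The failover arithmetic is easy to sketch, assuming the total load is
spread evenly across the nodes both before and after a failure (real
resource placement is lumpier than this, and the function name is
hypothetical): with N nodes and f failures, each of the N - f survivors
carries N / (N - f) times its normal load.

```python
from fractions import Fraction


def survivor_load_factor(cluster_size: int, failed: int) -> Fraction:
    """Load on each surviving node relative to normal, assuming the
    cluster's total load is spread evenly before and after the failure."""
    if not 0 <= failed < cluster_size:
        raise ValueError("need at least one surviving node")
    survivors = cluster_size - failed
    return Fraction(cluster_size, survivors)


# Compare one of the four small clusters with the original 16 node one.
for size in (4, 16):
    for failed in (1, 2):
        factor = survivor_load_factor(size, failed)
        print(f"{size:2d} nodes, {failed} failed: "
              f"{float(factor):.2f}x normal load")
```

For two failures this gives 2.00x normal load in a 4 node cluster but
only about 1.14x in a 16 node cluster.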

> I have not heard of a single deployment as large as you are thinking of.

Heh.  Not atypical of me to push the envelope I'm afraid.  :-/

Cheers, and many thanks for your input.  It is valuable to this discussion.

b.
