[Pacemaker] concurrent uses of cibadmin: Signon to CIB failed: connection failed

Lars Ellenberg lars.ellenberg at linbit.com
Mon Oct 3 15:29:46 EDT 2011


On Thu, Sep 29, 2011 at 03:45:32PM -0400, Brian J. Murrell wrote:
> So, in another thread there was a discussion of using cibadmin to
> mitigate possible concurrency issue of crm shell.  I have written a test
> program to test that theory and unfortunately cibadmin falls down in the
> face of heavy concurrency also with errors such as:
> 
> Signon to CIB failed: connection failed
> Init failed, could not perform requested operations
> Signon to CIB failed: connection failed
> Init failed, could not perform requested operations
> Signon to CIB failed: connection failed
> Init failed, could not perform requested operations

The cib calls "listen(sock_fd, 10)" implicitly, via glue:
clplumbing ipcsocket.c, socket_wait_conn_new().

That gives you a connection request backlog of 10. Usually that is
enough to give the server time to accept pending connections "in time".
If you concurrently create many new client sessions, the backlog can
fill up, and some client connect() calls may fail.

Those would then need to be retried.

My feeling is, any retry logic for concurrency issues should go in some
shell wrapper, though. If you really expect to run into too many
connect attempts to cib at the same time regularly,
"You are doing it wrong" ;-)
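If a job really does need many cibadmin calls, one way to avoid the burst
is simply not to start them all at once. Below is a minimal bash sketch,
not anything shipped with Pacemaker: run_throttled is a hypothetical
helper, and the batch size of 5 is an arbitrary value chosen to stay
below the backlog of 10. It starts at most BATCH jobs, then waits for
that whole batch before starting the next one.

```shell
#!/bin/bash
# run_throttled is a hypothetical helper, not part of Pacemaker.
# It runs "$cmd arg" once per argument, starting at most BATCH jobs
# (default 5, an arbitrary value below the listen backlog of 10),
# then waits for that whole batch before starting the next one.
# Returns 0 if every job succeeded, 1 otherwise.
run_throttled() {
    local batch=${BATCH:-5}
    local cmd=$1
    shift
    local -a pids=()
    local failed=0 pid arg

    for arg in "$@"; do
        "$cmd" "$arg" &
        pids+=("$!")
        # Batch is full: wait for all of it before launching more clients.
        if [ "${#pids[@]}" -ge "$batch" ]; then
            for pid in "${pids[@]}"; do
                wait "$pid" || failed=1
            done
            pids=()
        fi
    done

    # Wait for the final, possibly partial, batch.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

# Example, assuming the resource-N.xml files from the test quoted below:
# create_resource() { cibadmin -o resources -C -x "resource-$1.xml"; }
# run_throttled create_resource $(seq 1 50)
```

Batching is cruder than a true sliding window (one slow cibadmin holds
up its whole batch), but it only needs plain "wait", so it also works
with the older bash found on el5.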

cibadmin seems to have consistent error codes; this particular problem
should show up as exit code 10.
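A minimal sketch of such a shell wrapper, assuming that exit code 10 is
what a failed CIB signon produces (taken from the note above, not checked
against every cibadmin version). cib_retry, MAX_TRIES and RETRY_DELAY are
hypothetical names with illustrative defaults. Any other exit status,
including success, is returned immediately, so real errors in the XML are
not retried.

```shell
#!/bin/bash
# cib_retry is a hypothetical wrapper, not part of Pacemaker.
# It runs the given command and retries only while it exits with
# status 10 (assumed here to mean "signon to CIB failed").
# MAX_TRIES (default 5) and RETRY_DELAY (default 1 second, multiplied
# by the attempt number; integers only) are illustrative values.
cib_retry() {
    local max_tries=${MAX_TRIES:-5}
    local delay=${RETRY_DELAY:-1}
    local tries=0 rc

    while :; do
        "$@"
        rc=$?
        # Success, or an error other than a connection failure: stop here.
        if [ "$rc" -ne 10 ]; then
            return "$rc"
        fi
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "cib_retry: giving up after $tries attempts: $*" >&2
            return "$rc"
        fi
        # Back off a bit longer each round so the pending connections drain.
        sleep $((tries * delay))
    done
}

# Example, applied to the loop from the original test:
# for x in $(seq 1 50); do
#     cib_retry cibadmin -o resources -C -x resource-$x.xml &
# done
# wait
```

The growing delay matters: if all 50 clients retried after the same fixed
interval, they would just hit the backlog together again.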


> Effectively my test runs:
> 
> for x in $(seq 1 50); do
>     cibadmin -o resources -C -x resource-$x.xml &
> done
> 
> My complete test program is attached for review/experimentation if you wish.
> 
> Am I doing something wrong or is this a bug?  I'm using pacemaker
> 1.0.10-1.4.el5 for what it's worth.
> 
> Cheers,
> b.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



