[Pacemaker] Problems with corosync while forking processes during node startup.

Lars Marowsky-Bree lmb at suse.com
Thu Feb 28 13:23:00 EST 2013


On 2013-02-25T11:42:40, Andrew Beekhof <andrew at beekhof.net> wrote:

> > Or we fix the corosync problem with forking from a multi-threaded
> > program.  ;-)
> That was essentially my point, Steve and I have already tried - for
> quite a long time too.
> I know some people think I just like changing things for the fun of
> it, but this is actually not true.

Hence the ";-)".

(I have opinions on threads in C.)
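The general hazard behind "forking from a multi-threaded program" can be shown with a generic POSIX sketch. This is not code from corosync or Pacemaker; `log_lock`, `busy_logger` and `spawn_and_wait` are hypothetical names. After fork() only the calling thread exists in the child, so any lock another thread held at that instant (a logging lock, the malloc arena lock, stdio locks) stays locked forever in the child. The usual rule is therefore that the child calls only async-signal-safe functions until it execs; posix_spawn() is a common alternative that avoids the question entirely.

```c
/* Sketch only: generic fork()+exec pattern for a multi-threaded process.
 * Names are hypothetical; not taken from corosync or Pacemaker. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Stands in for a lock some other thread may hold when we fork. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Background thread: holds log_lock for ~200 ms, like a slow log write. */
static void *busy_logger(void *arg)
{
    struct timespec ts = { 0, 200L * 1000 * 1000 };

    (void)arg;
    pthread_mutex_lock(&log_lock);
    nanosleep(&ts, NULL);
    pthread_mutex_unlock(&log_lock);
    return NULL;
}

/*
 * Start a second thread (so the process really is multi-threaded), then
 * fork and exec argv[0]. Between fork() and execv() the child calls only
 * async-signal-safe functions (execv, _exit). It must not touch log_lock,
 * malloc or printf: their owning thread does not exist in the child, so
 * a lock held at fork time would never be released.
 * Returns the child's exit status, or -1 on error.
 */
static int spawn_and_wait(char *const argv[])
{
    pthread_t tid;
    pid_t pid, r;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    if (pthread_create(&tid, NULL, busy_logger, NULL) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        pthread_join(tid, NULL);
        return -1;
    }
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);             /* exec failed: skip atexit handlers */
    }

    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    pthread_join(tid, NULL);

    if (r < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling `spawn_and_wait` with `{ "/bin/sh", "-c", "exit 3", NULL }` returns 3; with a path that does not exist, the child's execv() fails and it exits with 127. The program needs to be linked with `-pthread`.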

> > We've never really had customers report problems with this
> > either, but I'm not sure why that is, honestly. I know the problem
> > theoretically exists, but it has never hit us.
> I also never hit this on openSUSE-based distros, or if I did it
> was extremely rare.
> But on Fedora it was so regular as to make the cluster unusable.
> 
> I don't know what makes one of them so special.  Maybe it's just some
> compile flags.

I think this may be the crash that setting "timestamp: off" in
corosync.conf hides. With timestamps turned on, corosync doesn't like
me much at all.
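For reference, the "timestamp" directive belongs to the logging section of corosync.conf. A minimal fragment with every other logging directive and section omitted (so not a complete file) would be:

```
logging {
        timestamp: off
}
```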

Don't get me wrong. I'd love to migrate forward and drop the plugin code
if there were a wire-compatible way of doing so. But I can't authorize
breaking rolling upgrades.

(Supporting both, and switching from one to the other live when the last
node has both online, was an option I briefly considered. But then I
woke up screaming at night and decided it perhaps wasn't a good idea.)


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde