[Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

Steven Dake sdake at redhat.com
Mon Aug 15 13:30:06 EDT 2011


On 08/12/2011 03:19 AM, Vladislav Bogdanov wrote:
> ...
>>> I would really like someone that has these process pause problems to
>>> test a patch I have posted to see if it rectifies the situation.  Our
>>> significant QE team at Red Hat doesn't see these problems and I can't
>>> generate them in engineering.  It is possible your device drivers are
>>> taking spinlocks for extended periods or some other kernel problem is
>>> occurring.
>>>
>>> If you feel up to the task of building your own corosync, try out this
>>> patch:
>>>
>>> http://marc.info/?l=openais&m=130989380207300&w=2
> 

Vladislav,

> I have not seen any corosync pauses since applying it (right after it was
> posted). Although I was on vacation for two weeks, the rest of the time I
> tested the cluster under really high CPU load (frankly speaking, I lowered it
> a lot because of optimizations) and did not catch any pause (yet). One
> more thing I did was update the igb driver and return its buffers to the
> original 256 (bearing in mind that I originally had the pause problem after
> I increased those buffers to 4096). I do not know whether that has any influence.
> 

Thanks for the feedback (I did read your original response on this).
Unfortunately it is difficult to tell if the other changes you made
fixed the problem, or the patch fixes the problem.

Regards
-steve

>> I'd love to test this, but it'll take a few weeks.
>> The machines are already in production and we don't have comparable test machines.
>> I'm currently (actually ;) having a few days off, and when I'm back at the office,
>> I'll update the Corosync version to v1.4.1 (because of the retransmit list
>> problem) -- does the patch apply cleanly to v1.4.1?
> 
> yes
> 
> Best,
> Vladislav
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker at oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




