[Pacemaker] Need help with resolving very long election cycle

Thu Feb 2 17:34:40 EST 2012

On Fri, Feb 3, 2012 at 9:31 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Thu, Feb 2, 2012 at 9:55 PM, Shyam <shyam.kaushik at gmail.com> wrote:
>> Hi Andrew,
>>
>> Here are more logs covering a longer period that show several of these
>> election cycles. Please note that in the case below I had set dc-deadtime to
>> 5 secs, and I_DC_TIMEOUT pops up every 5 secs. When I raised dc-deadtime to
>> 10 secs, the long election cycle problem disappeared; it no longer happens.
>> I suspect that before a single election cycle completes, the next
>> I_DC_TIMEOUT kicks in. Could this be the reason?
>
> Yes.  The question is why the cycle is taking so long :-/
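
A toy model of the hypothesis above, for illustration only. It assumes each election attempt needs some fixed time, and that an I_DC_TIMEOUT firing first simply discards the attempt and starts a new one. This is an assumption, not Pacemaker's actual state machine, and the durations are made up. It shows why raising dc-deadtime can make the symptom vanish without explaining why an attempt is slow in the first place.

```python
def elections_until_dc(durations, dc_deadtime):
    """Toy model (hypothetical, not Pacemaker code).

    durations[i] is how long attempt i would need to finish. If the
    dc-deadtime timer fires first, that attempt is abandoned and the next
    one begins. Returns (attempts, total_seconds) for the first attempt
    that finishes before the timer, or None if none ever does.
    """
    elapsed = 0.0
    for attempt, needed in enumerate(durations, start=1):
        # Strictly less than: a tie is treated as the timer winning.
        if needed < dc_deadtime:
            return attempt, elapsed + needed
        # The timer fired first; the whole deadtime was spent on this attempt.
        elapsed += dc_deadtime
    return None


if __name__ == "__main__":
    made_up = [6.0, 5.5, 7.0, 4.5]  # hypothetical per-attempt durations
    print(elections_until_dc(made_up, 5.0))   # several wasted attempts
    print(elections_until_dc(made_up, 10.0))  # first attempt completes
```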

Could you reproduce with debug on please?
It would be nice to know what the cluster is doing for the 4 seconds
between these two messages:

Jan 17 12:00:04 vsa-0000003ca-vc-0 crmd: [1120]: WARN: start_subsystem: Client pengine already running as pid 4243
Jan 17 12:00:08 vsa-0000003ca-vc-0 crmd: [1120]: info: do_dc_takeover: Taking over DC status for this partition
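
Pauses like this 4-second one can be hard to spot in a long debug log. A minimal sketch for flagging them, assuming classic syslog-style prefixes like the lines quoted here ("Jan 17 12:00:04 host daemon: ..."), an English/C locale for month names, and an assumed year (these prefixes carry none; year rollovers are not handled). The helper names are hypothetical.

```python
import sys
from datetime import datetime


def parse_ts(line, year=2012):
    """Parse a syslog prefix such as 'Jan 17 12:00:04' (year is assumed)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, threshold_s=2.0, year=2012):
    """Return (gap_seconds, previous_line, next_line) for each gap >= threshold."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        try:
            ts = parse_ts(line, year)
        except ValueError:
            # Blank or wrapped continuation lines carry no timestamp; skip them.
            continue
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta >= threshold_s:
                gaps.append((delta, prev_line, line))
        prev_ts, prev_line = ts, line
    return gaps


SAMPLE = [
    "Jan 17 12:00:04 vsa-0000003ca-vc-0 crmd: [1120]: WARN: start_subsystem: "
    "Client pengine already running as pid 4243",
    "Jan 17 12:00:08 vsa-0000003ca-vc-0 crmd: [1120]: info: do_dc_takeover: "
    "Taking over DC status for this partition",
]

if __name__ == "__main__":
    # Usage: python find_gaps.py /path/to/logfile  (falls back to SAMPLE)
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            lines = fh.readlines()
    else:
        lines = SAMPLE
    for delta, before, after in find_gaps(lines):
        print(f"{delta:.0f}s gap:\n  {before.rstrip()}\n  {after.rstrip()}")
```

Run against a debug-level log, the lines immediately before each reported gap are the natural place to start looking.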

What version of pacemaker is this btw?