[Pacemaker] election trigger

Thu Oct 30 09:49:04 EDT 2008

On Thursday 30 October 2008 12:45:58 Bernd Schubert wrote:
> Hello,
>
> earlier this year I complained on the heartbeat mailing list about huge
> startup times when deadtime is large (due to initdead >= deadtime):
>
> http://www.mail-archive.com/linux-ha%40lists.linux-ha.org/msg07801.html
>
> I finally found the time to look into this issue in more detail. It is
> rather easy to convince heartbeat to go online: basically just remove
> this condition in check_comm_isup():
>
> if (config->rtjoinconfig != HB_JOIN_NONE
> && !init_deadtime_passed){
> 	return;
> }

I was wrong here: with a fixed node configuration we already have
HB_JOIN_NONE set, so the only culprit is crm / pacemaker.

>
> But then the trouble is with crm: it still refuses to select any of the
> nodes as designated coordinator (DC), so nothing goes online after a
> system-wide heartbeat shutdown. The reason is quite simple: crm uses a
> plain timer for the initial selection, and as its timeout it uses
> getenv(ENV_PREFIX "initdead"), which is set by heartbeat. See the setting
> and usage of election_trigger->period_ms in do_startup() and
> config_query_callback().
>
> IMHO using such a simple timer is plain wrong. Heartbeat should tell crm
> when all cluster nodes have been found, and the DC should then be
> selected immediately.
> Well, actually we could keep the timer, but crm would additionally need
> to be informed by heartbeat when all cluster nodes are already online.
> Then the timer could be stopped and the DC selection could be done
> immediately. Is there already a callback from heartbeat when all nodes
> are online?
>
>
> Thanks,
> Bernd
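
The idea from the quoted message (keep the timer as a fallback, but start
the DC selection as soon as every configured node is online) could be
sketched roughly as below. This is a standalone illustration, not heartbeat
or crm code: the struct and the functions trigger_on_membership() and
trigger_on_tick() are hypothetical names, and only the field name period_ms
is borrowed from election_trigger->period_ms. How heartbeat would actually
deliver the membership change is exactly the open question above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the election trigger; not the real crm structure. */
struct election_trigger {
    unsigned period_ms;   /* fallback timeout, e.g. derived from initdead */
    unsigned elapsed_ms;  /* time waited so far */
    bool     fired;       /* true once the DC selection has been started */
};

/* Count how many of the configured nodes are currently reported online. */
static size_t count_online(const bool *online, size_t n_nodes)
{
    size_t n = 0;
    for (size_t i = 0; i < n_nodes; i++) {
        if (online[i]) {
            n++;
        }
    }
    return n;
}

/* Assumed to be called on every membership change.
 * Fires early, without waiting for the timer, once all nodes are online. */
static bool trigger_on_membership(struct election_trigger *t,
                                  const bool *online, size_t n_nodes)
{
    if (!t->fired && n_nodes > 0 && count_online(online, n_nodes) == n_nodes) {
        t->fired = true;
    }
    return t->fired;
}

/* Assumed to be called as time passes.
 * Fires when the fallback timeout expires, as the plain timer does today. */
static bool trigger_on_tick(struct election_trigger *t, unsigned delta_ms)
{
    if (t->fired) {
        return true;
    }
    t->elapsed_ms += delta_ms;
    if (t->elapsed_ms >= t->period_ms) {
        t->fired = true;
    }
    return t->fired;
}
```

With a fixed node list the early path only needs the per-node online state;
the timer still covers the case where a node never comes up.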

-- 
Bernd Schubert
Q-Leap Networks GmbH