[Pacemaker] election trigger

Bernd Schubert bs at q-leap.de
Thu Oct 30 07:45:58 EDT 2008


Hello,

Earlier this year I complained on the heartbeat mailing list about huge startup
times when deadtime is large (since initdead >= deadtime):

http://www.mail-archive.com/linux-ha%40lists.linux-ha.org/msg07801.html

Finally I found the time to look into this issue in more detail. It is
rather easy to convince heartbeat to go online: basically, just remove this
condition from check_comm_isup():

if (config->rtjoinconfig != HB_JOIN_NONE
    && !init_deadtime_passed) {
	return;
}
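To make the effect of that condition concrete, here is a minimal, self-contained model of the gate. It is not heartbeat's code: the enum values JOIN_NONE (standing in for HB_JOIN_NONE) and JOIN_OTHER (any other rtjoinconfig setting), the function comm_isup_may_proceed() and its remove_gate flag (modelling the deletion of the condition) are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model: JOIN_NONE stands for heartbeat's HB_JOIN_NONE,
 * JOIN_OTHER for any other rtjoinconfig setting. */
enum join_mode { JOIN_NONE, JOIN_OTHER };

/* Returns true if check_comm_isup() would go on past the quoted check,
 * false if it would return early. remove_gate models deleting the
 * condition from the source. */
static bool comm_isup_may_proceed(enum join_mode rtjoinconfig,
                                  bool init_deadtime_passed,
                                  bool remove_gate)
{
    if (!remove_gate && rtjoinconfig != JOIN_NONE && !init_deadtime_passed) {
        return false; /* still waiting for initdead to expire */
    }
    return true;
}
```

For example, comm_isup_may_proceed(JOIN_OTHER, false, false) is false (heartbeat keeps waiting for initdead), while the same call with remove_gate set to true is true.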

But then the trouble is with crm: it still refuses to elect any of the nodes
as the designated controller (DC), so nothing will go online after a
cluster-wide heartbeat shutdown. The reason is quite simple: crm uses a plain
timer for the initial election, and as its timeout it uses
getenv(ENV_PREFIX "initdead"), which is set by heartbeat. See how
election_trigger->period_ms is set and used in do_startup() and
config_query_callback().
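The current behaviour can be approximated as follows: the election timer's period is derived from the initdead value heartbeat exports into crm's environment, so a long initdead directly delays the first election. This is a hypothetical sketch, not the crmd code. The variable name HA_initdead (assuming ENV_PREFIX expands to "HA_"), the assumption that the value is a plain number of seconds, the fallback default and both function names are illustrative only.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical: assumes ENV_PREFIX expands to "HA_" and that heartbeat
 * exports initdead as a plain number of seconds. */
#define MODEL_ENV_PREFIX "HA_"
#define DEFAULT_PERIOD_MS 30000L  /* illustrative fallback, not crmd's */

/* Convert an initdead string (seconds) to a timer period in ms,
 * falling back to the default if it is missing or unparsable. */
static long period_ms_from_initdead(const char *value)
{
    char *end = NULL;
    long seconds;

    if (value == NULL || *value == '\0') {
        return DEFAULT_PERIOD_MS;
    }
    errno = 0;
    seconds = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || seconds < 0 || seconds > LONG_MAX / 1000) {
        return DEFAULT_PERIOD_MS;
    }
    return seconds * 1000L;
}

/* Period for the (hypothetical) election trigger timer. */
static long election_period_ms(void)
{
    return period_ms_from_initdead(getenv(MODEL_ENV_PREFIX "initdead"));
}
```

With initdead set to 120, the election cannot start before 120000 ms have elapsed, no matter how quickly all nodes actually come up.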

IMHO using such a simple timer is plain wrong. Instead, heartbeat should
tell crm when all cluster nodes have been found, and the DC should then be
elected immediately.
Alternatively, we could keep the timer, but crm would additionally need to be
informed by heartbeat once all cluster nodes are online. The timer could then
be stopped and the DC election started immediately. Is there already a
heartbeat callback for when all nodes are online?
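The proposed alternative could look roughly like this: keep the initdead timer as an upper bound, but start the election as soon as every configured node has been reported online. Everything here is hypothetical: struct election_gate, gate_node_up() (standing in for the heartbeat "node is up" callback being asked about), gate_should_elect() and the millisecond clock passed to it are illustrative, not heartbeat or crm APIs.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical election gate: elect a DC when either all configured
 * nodes are up or the initdead-derived timeout has expired. */
struct election_gate {
    size_t configured_nodes;   /* nodes in the cluster config (capped) */
    bool   seen[MAX_NODES];    /* per-node "reported online" flag */
    size_t online_nodes;       /* number of distinct nodes seen */
    long   start_ms;           /* when the timer was started */
    long   period_ms;          /* timeout, e.g. derived from initdead */
};

static void gate_init(struct election_gate *g, size_t configured_nodes,
                      long start_ms, long period_ms)
{
    size_t i;

    /* Node counts above MAX_NODES are clamped in this sketch. */
    g->configured_nodes = configured_nodes < MAX_NODES ? configured_nodes : MAX_NODES;
    for (i = 0; i < MAX_NODES; i++) {
        g->seen[i] = false;
    }
    g->online_nodes = 0;
    g->start_ms = start_ms;
    g->period_ms = period_ms;
}

/* Called for every "node is up" notification; duplicates are ignored. */
static void gate_node_up(struct election_gate *g, size_t node_index)
{
    if (node_index < g->configured_nodes && !g->seen[node_index]) {
        g->seen[node_index] = true;
        g->online_nodes++;
    }
}

/* True once the election should start: either all nodes are online
 * (the timer can be stopped early) or the timeout has been reached
 * (fallback for nodes that never show up). */
static bool gate_should_elect(const struct election_gate *g, long now_ms)
{
    if (g->configured_nodes > 0 && g->online_nodes == g->configured_nodes) {
        return true;
    }
    return now_ms - g->start_ms >= g->period_ms;
}
```

Keeping the timer means a node that never comes up still cannot block the election forever; counting nodes only lets the election fire earlier in the common case of a full-cluster restart.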


Thanks,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH