[Pacemaker] pacemaker 1.1.4 problem in a 4-node cluster

Wed Dec 1 10:30:44 UTC 2010

Hello Andrew, hello all,

I'm setting up a new 4-node cluster (which will grow to 16 nodes in the
future) with stonith enabled, no-quorum-policy=freeze, and the
pacemaker-1.1 schema. Sometimes, after a node starts, crmd loops over the
following messages (sorry for the line breaks):

Dec  1 10:09:00 v02-b crmd: [1857]: info: crm_timer_popped: Election
Trigger (I_DC_TIMEOUT) just popped!
Dec  1 10:09:00 v02-b crmd: [1857]: WARN: do_log: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Dec  1 10:09:00 v02-b crmd: [1857]: info: do_state_transition: State
transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT
cause=C_TIMER_POPPED origin=crm_timer_popped ]
Dec  1 10:09:00 v02-b crmd: [1857]: info: do_state_transition: State
transition S_ELECTION -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Dec  1 10:09:00 v02-b crmd: [1857]: info: do_dc_release: DC role released
Dec  1 10:09:00 v02-b crmd: [1857]: info: do_te_control: Transitioner is
now inactive
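
For anyone who wants to check their own logs for the same pattern, here is
a minimal sketch in Python. It only counts the do_state_transition
messages shown above and flags a log when both S_PENDING -> S_ELECTION and
S_ELECTION -> S_PENDING repeat. The function names and the threshold of 3
are my own illustration, not anything provided by pacemaker, and it
assumes each syslog message is on a single line (not wrapped as in this
mail).

```python
import re
from collections import Counter

# Matches the crmd state-transition messages quoted above.
# Assumes one syslog message per line (no mail-style wrapping).
TRANSITION = re.compile(
    r"do_state_transition: State transition (\S+) -> (\S+)"
)


def count_transitions(lines):
    """Count (old_state, new_state) pairs seen in the given log lines."""
    counts = Counter()
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def looks_like_election_loop(lines, threshold=3):
    """Heuristic: both directions of the S_PENDING/S_ELECTION bounce
    occur at least `threshold` times (threshold is an arbitrary choice)."""
    counts = count_transitions(lines)
    return (
        counts[("S_PENDING", "S_ELECTION")] >= threshold
        and counts[("S_ELECTION", "S_PENDING")] >= threshold
    )
```

It could be fed the lines of the relevant syslog file, for example
looks_like_election_loop(open(path)) where path is wherever crmd logs on
your system.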

At the same time, this node has the full CIB and the 'crm' utility works
there (showing the parent node as OFFLINE).

It is impossible to shut down pacemaker gracefully; only kill -9 on all of
its processes helps.

If I start pacemaker again after killing it forcibly, everything goes
smoothly: the node goes 'online' and starts resources.

Could it be something trivial to fix? I can supply hb_report if it is not.

Versions are:
corosync-1.2.8
openais-1.1.4
pacemaker-1.1.4 (1.1.4-ac608e3491c7dfc3b3e3c36d966ae9b016f77065)

pacemaker is run as MCP.

Best regards,
Vladislav