[Pacemaker] pre_notify_demote is issued twice
Keisuke MORI
keisuke.mori+ha at gmail.com
Thu Feb 6 08:45:49 UTC 2014
Hi,
I observed that pre_notify_demote is issued twice when a master
resource is migrating.
I'm wondering if this is the correct behavior.
Steps to reproduce:
- Start up 2 nodes cluster configured for the PostgreSQL streaming
replication using pgsql RA as a master/slave resource.
- kill the postgresql process on the master node to induce a fail-over.
- The fail-over succeeds as expected, but pre_notify_demote was
executed twice on each node before demoting on the master resource.
100% reproducible on my cluster.
Pacemaker version: 1.1.11-rc4 (source build from the repo)
OS: RHEL6.4
I have never seen this on Pacemaker-1.0.* cluster with the same configuration.
The relevant logs and pe-inputs are attached.
Diagnostics:
(1) The first transition caused by the process failure (pe-input-160)
initiates pre_notify_demote on both nodes and cancelling slave monitor
on the slave node.
{{{
171 Jan 30 16:08:59 rhel64-1 crmd[8143]: notice: te_rsc_command:
Initiating action 9: cancel prmPostgresql_cancel_10000 on rhel64-2
172 Jan 30 16:08:59 rhel64-1 crmd[8143]: notice: te_rsc_command:
Initiating action 79: notify prmPostgresql_pre_notify_demote_0 on
rhel64-1 (local)
175 Jan 30 16:08:59 rhel64-1 crmd[8143]: notice: te_rsc_command:
Initiating action 81: notify prmPostgresql_pre_notify_demote_0 on
rhel64-2
}}}
(2) When cancelling slave monitor completes, the transition is aborted
by Resource op removal.
{{{
176 Jan 30 16:08:59 rhel64-1 crmd[8143]: info: match_graph_event:
Action prmPostgresql_monitor_10000 (9) confirmed on rhel64-2 (rc=0)
177 Jan 30 16:08:59 rhel64-1 cib[8138]: info: cib_process_request:
Completed cib_delete operation for section status: OK (rc=0,
origin=rhel64-2/crmd/21, version=0.37.9)
178 Jan 30 16:08:59 rhel64-1 crmd[8143]: info:
abort_transition_graph: te_update_diff:258 - Triggered transition
abort (complete=0, node=rhel64-2, tag=lrm_rsc_op,
id=prmPostgresql_monitor_10000,
magic=0:0;26:12:0:acf9a2a3-307c-460b-b786-fc20e6b8aad5, cib=0.37.9) :
Resource op removal
}}}
(3) The second transition is calculated by the abort (pe-input-161)
which results initiating pre_notify_demote again.
{{{
227 Jan 30 16:09:01 rhel64-1 pengine[8142]: notice:
process_pe_message: Calculated Transition 15:
/var/lib/pacemaker/pengine/pe-input-161.bz2
229 Jan 30 16:09:01 rhel64-1 crmd[8143]: notice: te_rsc_command:
Initiating action 78: notify prmPostgresql_pre_notify_demote_0 on
rhel64-1 (local)
232 Jan 30 16:09:01 rhel64-1 crmd[8143]: notice: te_rsc_command:
Initiating action 80: notify prmPostgresql_pre_notify_demote_0 on
rhel64-2
}}}
I think that the transition abort at (2) should not happen.
Regards,
--
Keisuke MORI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs-pre-notify-20140206.tar.bz2
Type: application/x-bzip2
Size: 45972 bytes
Desc: not available
URL: <https://lists.clusterlabs.org/pipermail/pacemaker/attachments/20140206/63abe1d1/attachment-0003.bz2>
More information about the Pacemaker
mailing list