[Pacemaker] crm_resource -L not trustable right after restart

Brian J. Murrell (brian) brian at interlinx.bc.ca
Tue Jan 21 18:54:29 EST 2014


On Thu, 2014-01-16 at 14:49 +1100, Andrew Beekhof wrote:
> 
> What crm_mon are you looking at?
> I see stuff like:
> 
>  virt-fencing	(stonith:fence_xvm):	Started rhos4-node3 
>  Resource Group: mysql-group
>      mysql-vip	(ocf::heartbeat:IPaddr2):	Started rhos4-node3 
>      mysql-fs	(ocf::heartbeat:Filesystem):	Started rhos4-node3 
>      mysql-db	(ocf::heartbeat:mysql):	Started rhos4-node3 

Yes, you are right.  I couldn't see the forest for the trees.

I was initially optimistic that crm_mon would be more truthful than
crm_resource, but it turns out it is not.

Take, for example, these commands, which set two location constraints
and then start a resource (already defined at this point).  The
constraints prefer node5 (score 20) over node6 (score 10), so res1
should end up on node5:

[21/Jan/2014:13:46:40] cibadmin -o constraints -C -X '<rsc_location id="res1-primary" node="node5" rsc="res1" score="20"/>'
[21/Jan/2014:13:46:41] cibadmin -o constraints -C -X '<rsc_location id="res1-secondary" node="node6" rsc="res1" score="10"/>'
[21/Jan/2014:13:46:42] crm_resource -r 'res1' -p target-role -m -v 'Started'
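
(As an aside, the placement these constraints imply can be
sanity-checked against the live CIB with crm_simulate; a quick check,
assuming the tool is installed on the node -- the grep is just
illustrative:)

# dump the policy engine's allocation scores from the live cluster
# and pick out the lines for res1 (should show node5: 20, node6: 10)
crm_simulate -L -s 2>/dev/null | grep res1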

and then these repeated calls to crm_mon -1 on node5:

[21/Jan/2014:13:46:42] crm_mon -1
Last updated: Tue Jan 21 13:46:42 2014
Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
Stack: openais
Current DC: node5 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured


Online: [ node5 node6 ]

 st-fencing	(stonith:fence_product):	Started node5 
 res1	(ocf::product:Target):	Started node6 

[21/Jan/2014:13:46:42] crm_mon -1
Last updated: Tue Jan 21 13:46:42 2014
Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
Stack: openais
Current DC: node5 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured


Online: [ node5 node6 ]

 st-fencing	(stonith:fence_product):	Started node5 
 res1	(ocf::product:Target):	Started node6 

[21/Jan/2014:13:46:49] crm_mon -1 -r
Last updated: Tue Jan 21 13:46:49 2014
Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
Stack: openais
Current DC: node5 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured


Online: [ node5 node6 ]

Full list of resources:

 st-fencing	(stonith:fence_product):	Started node5 
 res1	(ocf::product:Target):	Started node5 

The first two reports are incorrect: they show the resource started on
node6 when it was actually started on node5.  Only 7 seconds later, on
the third call, does crm_mon report it correctly.  The logs on
node{5,6} bear this out: the resource was only ever started on node5,
never on node6.
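
For a test harness that needs an authoritative answer, the only
workaround I can see is to poll with a timeout rather than trusting the
first read.  A rough sketch, assuming you already know which node the
constraints should place the resource on (names, timeout and interval
are illustrative):

expected=node5
for attempt in $(seq 1 30); do
    # crm_resource -W prints the node a resource is running on
    if crm_resource -r res1 -W 2>/dev/null | grep -q "$expected"; then
        echo "res1 confirmed on $expected"
        break
    fi
    sleep 1
done

Of course this only helps when the expected placement is known in
advance; it does nothing for the general "is what I'm reading
current?" problem.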

Here's the log for node5:

Jan 21 13:42:00 node5 pacemaker: Starting Pacemaker Cluster Manager
Jan 21 13:42:00 node5 pacemakerd[8684]:   notice: main: Starting Pacemaker 1.1.10-14.el6_5.1 (Build: 368c726):  generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios  corosync-plugin cman
Jan 21 13:42:00 node5 pacemakerd[8684]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 stonith-ng[8691]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:00 node5 cib[8690]:   notice: main: Using new config location: /var/lib/pacemaker/cib
Jan 21 13:42:00 node5 cib[8690]:  warning: retrieveCib: Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
Jan 21 13:42:00 node5 cib[8690]:  warning: readCibXmlFile: Primary configuration corrupt or unusable, trying backups
Jan 21 13:42:00 node5 cib[8690]:  warning: readCibXmlFile: Continuing with an empty configuration.
Jan 21 13:42:00 node5 attrd[8693]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:00 node5 crmd[8695]:   notice: main: CRM Git Version: 368c726
Jan 21 13:42:00 node5 attrd[8693]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 corosync[8646]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x1cbc3c0 for attrd/0
Jan 21 13:42:00 node5 stonith-ng[8691]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 corosync[8646]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x1cb8040 for stonith-ng/0
Jan 21 13:42:00 node5 attrd[8693]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 stonith-ng[8691]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 attrd[8693]:   notice: main: Starting mainloop...
Jan 21 13:42:00 node5 cib[8690]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:00 node5 cib[8690]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 corosync[8646]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x1cc0740 for cib/0
Jan 21 13:42:00 node5 corosync[8646]:   [pcmk  ] info: pcmk_ipc: Sending membership update 4 to cib
Jan 21 13:42:00 node5 cib[8690]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 cib[8690]:   notice: crm_update_peer_state: plugin_handle_membership: Node node5[307367434] - state is now member (was (null))
Jan 21 13:42:01 node5 crmd[8695]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:01 node5 crmd[8695]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:01 node5 corosync[8646]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x1cc4c00 for crmd/0
Jan 21 13:42:01 node5 corosync[8646]:   [pcmk  ] info: pcmk_ipc: Sending membership update 4 to crmd
Jan 21 13:42:01 node5 crmd[8695]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:01 node5 stonith-ng[8691]:   notice: setup_cib: Watching for stonith topology changes
Jan 21 13:42:01 node5 crmd[8695]:   notice: crm_update_peer_state: plugin_handle_membership: Node node5[307367434] - state is now member (was (null))
Jan 21 13:42:01 node5 crmd[8695]:   notice: do_started: The local CRM is operational
Jan 21 13:42:01 node5 crmd[8695]:   notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jan 21 13:42:22 node5 crmd[8695]:  warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jan 21 13:42:22 node5 crmd[8695]:   notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 21 13:42:22 node5 cib[8690]:   notice: cib_perform_op: Configuration ordering change detected
Jan 21 13:42:22 node5 stonith-ng[8691]:   notice: log_cib_diff: cib_process_diff: Local-only Change: 0.1.1
Jan 21 13:42:22 node5 stonith-ng[8691]:   notice: cib_process_diff: -- <cib num_updates="1"/>
Jan 21 13:42:22 node5 stonith-ng[8691]:   notice: cib_process_diff: ++ <cib epoch="1" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Tue Jan 21 13:42:01 2014" update-origin="node5" update-client="crmd"/>
Jan 21 13:42:22 node5 stonith-ng[8691]:   notice: cib_process_diff: Diff 0.1.1 -> 0.1.1 from local not applied to 0.1.1: + and - versions in the diff did not change
Jan 21 13:42:22 node5 stonith-ng[8691]:   notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 stonith-ng[8691]:   notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: Diff: --- 0.1.2
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: Diff: +++ 0.2.1 5ea115fd649f5fa3419b7f1c9abf8992
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="1" num_updates="2"/>
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: ++       <cluster_property_set id="cib-bootstrap-options">
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-14.el6_5.1-368c726"/>
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: ++       </cluster_property_set>
Jan 21 13:42:22 node5 stonith-ng[8691]:   notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 cib[8690]:   notice: log_cib_diff: cib:diff: Local-only Change: 0.3.1
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="2" num_updates="1"/>
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
Jan 21 13:42:22 node5 cib[8690]:   notice: log_cib_diff: cib:diff: Local-only Change: 0.4.1
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="3" num_updates="1"/>
Jan 21 13:42:22 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
Jan 21 13:42:22 node5 stonith-ng[8691]:   notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed, requesting a full refresh (-207)
Jan 21 13:42:22 node5 attrd[8693]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jan 21 13:42:22 node5 pengine[8694]:  warning: cluster_status: We do not have quorum - fencing and resource management disabled
Jan 21 13:42:22 node5 pengine[8694]:    error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
Jan 21 13:42:22 node5 pengine[8694]:    error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
Jan 21 13:42:22 node5 pengine[8694]:    error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Jan 21 13:42:22 node5 pengine[8694]:   notice: stage6: Delaying fencing operations until there are resources to manage
Jan 21 13:42:22 node5 pengine[8694]:   notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-71.bz2
Jan 21 13:42:22 node5 pengine[8694]:   notice: process_pe_message: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
Jan 21 13:42:22 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 2: probe_complete probe_complete on node5 (local) - no waiting
Jan 21 13:42:22 node5 crmd[8695]:   notice: run_graph: Transition 0 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-71.bz2): Complete
Jan 21 13:42:22 node5 crmd[8695]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 21 13:42:22 node5 attrd[8693]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jan 21 13:42:22 node5 stonith-ng[8691]:   notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 attrd[8693]:   notice: attrd_perform_update: Sent update 4: probe_complete=true
Jan 21 13:42:36 node5 cibadmin[8745]:   notice: crm_log_args: Invoked: cibadmin --delete --obj_type resources -X <primitive class="stonith" id="st-fencing" type="fence_product"/> 
Jan 21 13:42:36 node5 cibadmin[8746]:   notice: crm_log_args: Invoked: cibadmin --create --obj_type resources -X <primitive class="stonith" id="st-fencing" type="fence_product"/> 
Jan 21 13:42:36 node5 crmd[8695]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 21 13:42:36 node5 cib[8690]:   notice: cib:diff: Diff: --- 0.4.4
Jan 21 13:42:36 node5 cib[8690]:   notice: cib:diff: Diff: +++ 0.5.1 e1637d693af73b0a1ff6beeabaa16dbe
Jan 21 13:42:36 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="4" num_updates="4"/>
Jan 21 13:42:36 node5 cib[8690]:   notice: cib:diff: ++       <primitive class="stonith" id="st-fencing" type="fence_product"/>
Jan 21 13:42:36 node5 pengine[8694]:  warning: cluster_status: We do not have quorum - fencing and resource management disabled
Jan 21 13:42:36 node5 pengine[8694]:   notice: LogActions: Start   st-fencing#011(node5 - blocked)
Jan 21 13:42:36 node5 pengine[8694]:   notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-72.bz2
Jan 21 13:42:36 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 4: monitor st-fencing_monitor_0 on node5 (local)
Jan 21 13:42:37 node5 cibadmin[8748]:   notice: crm_log_args: Invoked: cibadmin --modify --allow-create -o crm_config -X <cluster_property_set id="cib-bootstrap-options">#012<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>#012<nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>#012<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>#012<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"
Jan 21 13:42:37 node5 cib[8690]:   notice: log_cib_diff: cib:diff: Local-only Change: 0.6.1
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: --         <nvpair value="classic openais (with plugin)" id="cib-bootstrap-options-cluster-infrastructure"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: log_cib_diff: cib:diff: Local-only Change: 0.7.1
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="6" num_updates="1"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++     <rsc_defaults>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++       <meta_attributes id="rsc_defaults-options">
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="rsc_defaults-options-resource-stickiness" name="resource-stickiness" value="1000"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++       </meta_attributes>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++     </rsc_defaults>
Jan 21 13:42:37 node5 cib[8690]:   notice: log_cib_diff: cib:diff: Local-only Change: 0.8.1
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="7" num_updates="1"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="rsc_defaults-options-failure-timeout" name="failure-timeout" value="20m"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: log_cib_diff: cib:diff: Local-only Change: 0.9.1
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="8" num_updates="1"/>
Jan 21 13:42:37 node5 cib[8690]:   notice: cib:diff: ++         <nvpair id="rsc_defaults-options-migration-threshold" name="migration-threshold" value="3"/>
Jan 21 13:42:37 node5 stonith-ng[8691]:   notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:42:39 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on node5 (local) - no waiting
Jan 21 13:42:39 node5 crmd[8695]:   notice: run_graph: Transition 1 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-72.bz2): Stopped
Jan 21 13:42:39 node5 pengine[8694]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:42:39 node5 pengine[8694]:   notice: LogActions: Start   st-fencing#011(node5)
Jan 21 13:42:39 node5 pengine[8694]:   notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-73.bz2
Jan 21 13:42:39 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 4: start st-fencing_start_0 on node5 (local)
Jan 21 13:42:40 node5 stonith-ng[8691]:   notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:42:40 node5 crmd[8695]:   notice: process_lrm_event: LRM operation st-fencing_start_0 (call=8, rc=0, cib-update=42, confirmed=true) ok
Jan 21 13:42:40 node5 crmd[8695]:   notice: run_graph: Transition 2 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-73.bz2): Complete
Jan 21 13:42:40 node5 crmd[8695]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 21 13:45:02 node5 ntpd[8470]: 0.0.0.0 c614 04 freq_mode
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 8: memb=1, new=0, lost=0
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: pcmk_peer_update: memb: node5 307367434
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 8: memb=2, new=1, lost=0
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: update_member: Creating entry for node 290590218 born on 8
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: update_member: Node 290590218/unknown is now: member
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: pcmk_peer_update: NEW:  .pending. 290590218
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: pcmk_peer_update: MEMB: .pending. 290590218
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: pcmk_peer_update: MEMB: node5 307367434
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: send_member_notification: Sending membership update 8 to 2 children
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: update_member: 0x1ca5c80 Node 307367434 ((null)) born on: 4
Jan 21 13:46:23 node5 corosync[8646]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 21 13:46:23 node5 cib[8690]:   notice: crm_update_peer_state: plugin_handle_membership: Node (null)[290590218] - state is now member (was (null))
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: update_member: 0x1cc9e50 Node 290590218 (node6) born on: 8
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: update_member: 0x1cc9e50 Node 290590218 now known as node6 (was: (null))
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: update_member: Node node6 now has 1 quorum votes (was 0)
Jan 21 13:46:23 node5 corosync[8646]:   [pcmk  ] info: send_member_notification: Sending membership update 8 to 2 children
Jan 21 13:46:23 node5 crmd[8695]:   notice: crm_update_peer_state: plugin_handle_membership: Node (null)[290590218] - state is now member (was (null))
Jan 21 13:46:23 node5 cib[8690]:   notice: plugin_handle_membership: Membership 8: quorum acquired
Jan 21 13:46:23 node5 corosync[8646]:   [CPG   ] chosen downlist: sender r(0) ip(10.14.82.17) r(1) ip(10.128.2.17) ; members(old:1 left:0)
Jan 21 13:46:23 node5 corosync[8646]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jan 21 13:46:23 node5 crmd[8695]:   notice: plugin_handle_membership: Membership 8: quorum acquired
Jan 21 13:46:23 node5 cib[8690]:   notice: cib:diff: Diff: --- 0.9.3
Jan 21 13:46:23 node5 cib[8690]:   notice: cib:diff: Diff: +++ 0.10.1 7be5e06f43347759896f381ff0e09476
Jan 21 13:46:23 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="9" num_updates="3"/>
Jan 21 13:46:23 node5 cib[8690]:   notice: cib:diff: ++       <node id="node6" uname="node6"/>
Jan 21 13:46:23 node5 crmd[8695]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 21 13:46:30 node5 attrd[8693]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jan 21 13:46:30 node5 attrd[8693]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jan 21 13:46:31 node5 pengine[8694]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:31 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 5: monitor st-fencing_monitor_0 on node6
Jan 21 13:46:31 node5 pengine[8694]:   notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-74.bz2
Jan 21 13:46:32 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 4: probe_complete probe_complete on node6 - no waiting
Jan 21 13:46:32 node5 crmd[8695]:   notice: run_graph: Transition 3 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-74.bz2): Complete
Jan 21 13:46:32 node5 crmd[8695]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 21 13:46:40 node5 cibadmin[9025]:   notice: crm_log_args: Invoked: cibadmin -o resources -C -x /tmp/tmp80fPna 
Jan 21 13:46:40 node5 stonith-ng[8691]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:40 node5 crmd[8695]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: Diff: --- 0.10.10
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: Diff: +++ 0.11.1 50c098c8e9b5f371c281ae77c05120e2
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="10" num_updates="10"/>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++       <primitive class="ocf" provider="product" type="Target" id="res1">
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++         <meta_attributes id="res1-meta_attributes">
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++           <nvpair name="target-role" id="res1-meta_attributes-target-role" value="Stopped"/>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++         </meta_attributes>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++         <operations id="res1-operations">
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++           <op id="res1-monitor-5" interval="5" name="monitor" timeout="60"/>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++           <op id="res1-start-0" interval="0" name="start" timeout="300"/>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++           <op id="res1-stop-0" interval="0" name="stop" timeout="300"/>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++         </operations>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++         <instance_attributes id="res1-instance_attributes">
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++           <nvpair id="res1-instance_attributes-target" name="target" value="63e7a581-08fd-4a6c-bcb8-153469bd36a2"/>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++         </instance_attributes>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++       </primitive>
Jan 21 13:46:40 node5 pengine[8694]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:40 node5 pengine[8694]:   notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-75.bz2
Jan 21 13:46:40 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 6: monitor res1_monitor_0 on node6
Jan 21 13:46:40 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 4: monitor res1_monitor_0 on node5 (local)
Jan 21 13:46:40 node5 cibadmin[9028]:   notice: crm_log_args: Invoked: cibadmin -o constraints -C -X <rsc_location id="res1-primary" node="node5" rsc="res1" score="20"/> 
Jan 21 13:46:40 node5 cib[8690]:   notice: log_cib_diff: cib:diff: Local-only Change: 0.12.1
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="11" num_updates="1"/>
Jan 21 13:46:40 node5 cib[8690]:   notice: cib:diff: ++       <rsc_location id="res1-primary" node="node5" rsc="res1" score="20"/>
Jan 21 13:46:41 node5 stonith-ng[8691]:   notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:41 node5 stonith-ng[8691]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:41 node5 kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. quota=on. Opts: 
Jan 21 13:46:41 node5 crmd[8695]:  warning: status_from_rc: Action 6 (res1_monitor_0) on node6 failed (target: 7 vs. rc: 0): Error
Jan 21 13:46:41 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 5: probe_complete probe_complete on node6 - no waiting
Jan 21 13:46:41 node5 stonith-ng[8691]:   notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:41 node5 stonith-ng[8691]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:41 node5 cib[8690]:   notice: cib:diff: Diff: --- 0.12.2
Jan 21 13:46:41 node5 cib[8690]:   notice: cib:diff: Diff: +++ 0.13.1 48a53642acc94d5f541c3297baf08880
Jan 21 13:46:41 node5 cib[8690]:   notice: cib:diff: -- <cib admin_epoch="0" epoch="12" num_updates="2"/>
Jan 21 13:46:41 node5 cib[8690]:   notice: cib:diff: ++       <rsc_location id="res1-secondary" node="node6" rsc="res1" score="10"/>
Jan 21 13:46:41 node5 stonith-ng[8691]:   notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:41 node5 crmd[8695]:   notice: process_lrm_event: LRM operation res1_monitor_0 (call=13, rc=7, cib-update=68, confirmed=true) not running
Jan 21 13:46:41 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on node5 (local) - no waiting
Jan 21 13:46:41 node5 crmd[8695]:   notice: run_graph: Transition 4 (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-75.bz2): Stopped
Jan 21 13:46:41 node5 pengine[8694]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:41 node5 pengine[8694]:   notice: LogActions: Stop    res1#011(node6)
Jan 21 13:46:41 node5 pengine[8694]:   notice: process_pe_message: Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-76.bz2
Jan 21 13:46:41 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 7: stop res1_stop_0 on node6
Jan 21 13:46:42 node5 stonith-ng[8691]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:42 node5 cib[8690]:   notice: cib:diff: Diff: --- 0.13.2
Jan 21 13:46:42 node5 cib[8690]:   notice: cib:diff: Diff: +++ 0.14.1 36575d506c0ed5769e4655c88ef3510d
Jan 21 13:46:42 node5 cib[8690]:   notice: cib:diff: --           <nvpair value="Stopped" id="res1-meta_attributes-target-role"/>
Jan 21 13:46:42 node5 cib[8690]:   notice: cib:diff: ++           <nvpair name="target-role" id="res1-meta_attributes-target-role" value="Started"/>
Jan 21 13:46:42 node5 crmd[8695]:   notice: run_graph: Transition 5 (Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-76.bz2): Stopped
Jan 21 13:46:42 node5 pengine[8694]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:42 node5 pengine[8694]:   notice: LogActions: Start   res1#011(node5)
Jan 21 13:46:42 node5 pengine[8694]:   notice: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-77.bz2
Jan 21 13:46:42 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 7: start res1_start_0 on node5 (local)
Jan 21 13:46:42 node5 stonith-ng[8691]:   notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [ [ ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [   { ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [     "args": [ ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [       "mount",  ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [       "-t",  ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [       "ext4",  ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [       "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1",  ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [       "/mnt/res1" ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [     ],  ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [     "rc": 0,  ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [     "stderr": "",  ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [     "stdout": "" ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [   } ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [ ] ]
Jan 21 13:46:43 node5 lrmd[8692]:   notice: operation_finished: res1_start_0:9059:stderr [  ]
Jan 21 13:46:43 node5 crmd[8695]:   notice: process_lrm_event: LRM operation res1_start_0 (call=16, rc=0, cib-update=71, confirmed=true) ok
Jan 21 13:46:43 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 8: monitor res1_monitor_5000 on node5 (local)
Jan 21 13:46:43 node5 crmd[8695]:   notice: process_lrm_event: LRM operation res1_monitor_5000 (call=19, rc=0, cib-update=72, confirmed=false) ok
Jan 21 13:46:43 node5 crmd[8695]:   notice: run_graph: Transition 6 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-77.bz2): Complete
Jan 21 13:46:43 node5 crmd[8695]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

and here's the log for node6:

Jan 21 13:46:26 node6 pacemaker: Starting Pacemaker Cluster Manager
Jan 21 13:46:26 node6 pacemakerd[9322]:   notice: main: Starting Pacemaker 1.1.10-14.el6_5.1 (Build: 368c726):  generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios  corosync-plugin cman
Jan 21 13:46:26 node6 pacemakerd[9322]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 stonith-ng[9329]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:26 node6 attrd[9331]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:26 node6 crmd[9333]:   notice: main: CRM Git Version: 368c726
Jan 21 13:46:26 node6 cib[9328]:   notice: main: Using new config location: /var/lib/pacemaker/cib
Jan 21 13:46:26 node6 cib[9328]:  warning: retrieveCib: Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
Jan 21 13:46:26 node6 cib[9328]:  warning: readCibXmlFile: Primary configuration corrupt or unusable, trying backups
Jan 21 13:46:26 node6 cib[9328]:  warning: readCibXmlFile: Continuing with an empty configuration.
Jan 21 13:46:26 node6 attrd[9331]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x1b1fe50 for attrd/0
Jan 21 13:46:26 node6 stonith-ng[9329]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x1b1bad0 for stonith-ng/0
Jan 21 13:46:26 node6 attrd[9331]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 stonith-ng[9329]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 attrd[9331]:   notice: main: Starting mainloop...
Jan 21 13:46:26 node6 cib[9328]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:26 node6 cib[9328]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x1b241d0 for cib/0
Jan 21 13:46:26 node6 corosync[9288]:   [pcmk  ] info: pcmk_ipc: Sending membership update 8 to cib
Jan 21 13:46:26 node6 cib[9328]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 cib[9328]:   notice: plugin_handle_membership: Membership 8: quorum acquired
Jan 21 13:46:26 node6 cib[9328]:   notice: crm_update_peer_state: plugin_handle_membership: Node node6[290590218] - state is now member (was (null))
Jan 21 13:46:26 node6 cib[9328]:   notice: crm_update_peer_state: plugin_handle_membership: Node node5[307367434] - state is now member (was (null))
Jan 21 13:46:27 node6 crmd[9333]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:27 node6 crmd[9333]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:27 node6 corosync[9288]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x1b28550 for crmd/0
Jan 21 13:46:27 node6 corosync[9288]:   [pcmk  ] info: pcmk_ipc: Sending membership update 8 to crmd
Jan 21 13:46:27 node6 crmd[9333]:   notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:27 node6 crmd[9333]:   notice: plugin_handle_membership: Membership 8: quorum acquired
Jan 21 13:46:27 node6 crmd[9333]:   notice: crm_update_peer_state: plugin_handle_membership: Node node6[290590218] - state is now member (was (null))
Jan 21 13:46:27 node6 crmd[9333]:   notice: crm_update_peer_state: plugin_handle_membership: Node node5[307367434] - state is now member (was (null))
Jan 21 13:46:27 node6 stonith-ng[9329]:   notice: setup_cib: Watching for stonith topology changes
Jan 21 13:46:27 node6 crmd[9333]:   notice: do_started: The local CRM is operational
Jan 21 13:46:27 node6 crmd[9333]:   notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jan 21 13:46:29 node6 stonith-ng[9329]:  warning: cib_process_diff: Diff 0.1.1 -> 0.10.4 from local not applied to 0.1.1: Failed application of an update diff
Jan 21 13:46:29 node6 stonith-ng[9329]:   notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:46:29 node6 crmd[9333]:   notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jan 21 13:46:29 node6 attrd[9331]:   notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jan 21 13:46:29 node6 stonith-ng[9329]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:30 node6 stonith-ng[9329]:   notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:46:32 node6 attrd[9331]:   notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jan 21 13:46:32 node6 attrd[9331]:   notice: attrd_perform_update: Sent update 7: probe_complete=true
Jan 21 13:46:40 node6 stonith-ng[9329]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:41 node6 crmd[9333]:   notice: process_lrm_event: LRM operation res1_monitor_0 (call=10, rc=0, cib-update=10, confirmed=true) ok
Jan 21 13:46:41 node6 cibadmin[9408]:   notice: crm_log_args: Invoked: cibadmin -o constraints -C -X <rsc_location id="res1-secondary" node="node6" rsc="res1" score="10"/> 
Jan 21 13:46:41 node6 stonith-ng[9329]:   notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:46:41 node6 stonith-ng[9329]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:42 node6 crmd[9333]:   notice: process_lrm_event: LRM operation res1_stop_0 (call=13, rc=0, cib-update=11, confirmed=true) ok
Jan 21 13:46:42 node6 stonith-ng[9329]:   notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:46:42 node6 stonith-ng[9329]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:43 node6 stonith-ng[9329]:   notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:46:43 node6 stonith-ng[9329]:   notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:44 node6 stonith-ng[9329]:   notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)

Thoughts?

b.
