[ClusterLabs] Pengine always trying to start the resource on the standby node.
Andrei Borzenkov
arvidjaar at gmail.com
Tue Jun 5 23:58:48 EDT 2018
On 06.06.2018 04:27, Albert Weng wrote:
> Hi All,
>
> I have created an active/passive pacemaker cluster on RHEL 7.
>
> Here are my environment:
> clustera : 192.168.11.1 (passive)
> clusterb : 192.168.11.2 (master)
> clustera-ilo4 : 192.168.11.10
> clusterb-ilo4 : 192.168.11.11
>
> cluster resource status :
> cluster_fs started on clusterb
> cluster_vip started on clusterb
> cluster_sid started on clusterb
> cluster_listnr started on clusterb
>
> Both cluster nodes are online.
>
> I found that my corosync.log contains many records like the ones below:
>
> clustera pengine: info: determine_online_status_fencing:
> Node clusterb is active
> clustera pengine: info: determine_online_status: Node
> clusterb is online
> clustera pengine: info: determine_online_status_fencing:
> Node clustera is active
> clustera pengine: info: determine_online_status: Node
> clustera is online
>
> *clustera pengine: warning: unpack_rsc_op_failure: Processing
> failed op start for cluster_sid on clustera: unknown error (1)*
> *=> Question: Why does pengine always try to start cluster_sid on the
> passive node? How can I fix it?*
>
Pacemaker does not have a concept of a "passive" or "master" node - it is
up to you to decide placement when you configure resources. By default,
pacemaker will attempt to spread resources across all eligible nodes.
You can influence node selection by using constraints. See
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html
for details.
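As an illustration only, a location constraint with pcs might look like
this (the score of 100 is an assumed value; "cluster" is the resource
group name from your status output):

```shell
# Sketch, not your exact config: prefer clusterb for the "cluster" group.
# A finite score (100) still allows failover to clustera if clusterb
# fails; an INFINITY score would pin the group to clusterb only.
pcs constraint location cluster prefers clusterb=100
```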
But in any case, all your resources MUST be capable of running on both
nodes, otherwise the cluster makes no sense. If one resource A depends on
something that another resource B provides, and can only be started
together with resource B (and after B is ready), you must tell pacemaker
this by using colocation and ordering constraints. See the same document
for details.
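A minimal sketch with pcs, assuming the listener must run on the same
node as the database and start only after it (note that putting resources
into one group, as you already do, implies this ordering and colocation
for the group's members):

```shell
# Sketch only: explicit ordering and colocation between two resources.
# Start cluster_sid before cluster_listnr...
pcs constraint order cluster_sid then cluster_listnr
# ...and keep cluster_listnr on whatever node runs cluster_sid.
pcs constraint colocation add cluster_listnr with cluster_sid INFINITY
```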
> clustera pengine: info: native_print: ipmi-fence-clustera
> (stonith:fence_ipmilan): Started clustera
> clustera pengine: info: native_print: ipmi-fence-clusterb
> (stonith:fence_ipmilan): Started clustera
> clustera pengine: info: group_print: Resource Group: cluster
> clustera pengine: info: native_print: cluster_fs
> (ocf::heartbeat:Filesystem): Started clusterb
> clustera pengine: info: native_print: cluster_vip
> (ocf::heartbeat:IPaddr2): Started clusterb
> clustera pengine: info: native_print: cluster_sid
> (ocf::heartbeat:oracle): Started clusterb
> clustera pengine: info: native_print:
> cluster_listnr (ocf::heartbeat:oralsnr): Started clusterb
> clustera pengine: info: get_failcount_full: cluster_sid has
> failed INFINITY times on clustera
>
>
> *clustera pengine: warning: common_apply_stickiness: Forcing
> cluster_sid away from clustera after 1000000 failures (max=1000000)*
> *=> Question: did too many failed attempts result in the resource being
> forbidden to start on clustera?*
>
Yes. A failed start is fatal by default (start-failure-is-fatal=true), so
the fail count for cluster_sid on clustera jumped to INFINITY and the
resource is banned from that node until the fail count is cleared.
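To inspect and clear the fail count behind such a ban, the standard pcs
commands are:

```shell
# Show the recorded failures for cluster_sid
pcs resource failcount show cluster_sid
# Clear the failure history; clustera becomes eligible again
pcs resource cleanup cluster_sid
```

Clearing the count does not fix the underlying start failure on clustera;
until that is diagnosed, the resource will simply fail and be banned again.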
> A couple of days ago, clusterb was fenced (STONITH) for an unknown reason,
> but only "cluster_fs" and "cluster_vip" moved to clustera successfully;
> "cluster_sid" and "cluster_listnr" went to "Stopped" status.
> Judging by the messages below, is this related to the failed "op start
> for cluster_sid on clustera..."?
>
Yes. Node clustera is now marked as incapable of running the resource, so
when node clusterb fails, the resource cannot be started anywhere.
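If you prefer such bans to expire on their own, failure-timeout is a
standard resource meta attribute (the 10-minute value below is only an
example):

```shell
# Sketch: let recorded failures of cluster_sid expire after 10 minutes,
# after which pacemaker may try the resource on clustera again.
pcs resource update cluster_sid meta failure-timeout=10min
```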
> clustera pengine: warning: unpack_rsc_op_failure: Processing failed op
> start for cluster_sid on clustera: unknown error (1)
> clustera pengine: info: native_print: ipmi-fence-clustera
> (stonith:fence_ipmilan): Started clustera
> clustera pengine: info: native_print: ipmi-fence-clusterb
> (stonith:fence_ipmilan): Started clustera
> clustera pengine: info: group_print: Resource Group: cluster
> clustera pengine: info: native_print: cluster_fs
> (ocf::heartbeat:Filesystem): Started clusterb (UNCLEAN)
> clustera pengine: info: native_print: cluster_vip
> (ocf::heartbeat:IPaddr2): Started clusterb (UNCLEAN)
> clustera pengine: info: native_print: cluster_sid
> (ocf::heartbeat:oracle): Started clusterb (UNCLEAN)
> clustera pengine: info: native_print: cluster_listnr
> (ocf::heartbeat:oralsnr): Started clusterb (UNCLEAN)
> clustera pengine: info: get_failcount_full: cluster_sid has
> failed INFINITY times on clustera
> clustera pengine: warning: common_apply_stickiness: Forcing
> cluster_sid away from clustera after 1000000 failures (max=1000000)
> clustera pengine: info: rsc_merge_weights: cluster_fs: Rolling
> back scores from cluster_sid
> clustera pengine: info: rsc_merge_weights: cluster_vip: Rolling
> back scores from cluster_sid
> clustera pengine: info: rsc_merge_weights: cluster_sid: Rolling
> back scores from cluster_listnr
> clustera pengine: info: native_color: Resource cluster_sid cannot
> run anywhere
> clustera pengine: info: native_color: Resource cluster_listnr
> cannot run anywhere
> clustera pengine: warning: custom_action: Action cluster_fs_stop_0 on
> clusterb is unrunnable (offline)
> clustera pengine: info: RecurringOp: Start recurring monitor
> (20s) for cluster_fs on clustera
> clustera pengine: warning: custom_action: Action cluster_vip_stop_0 on
> clusterb is unrunnable (offline)
> clustera pengine: info: RecurringOp: Start recurring monitor
> (10s) for cluster_vip on clustera
> clustera pengine: warning: custom_action: Action cluster_sid_stop_0 on
> clusterb is unrunnable (offline)
> clustera pengine: warning: custom_action: Action cluster_sid_stop_0 on
> clusterb is unrunnable (offline)
> clustera pengine: warning: custom_action: Action cluster_listnr_stop_0
> on clusterb is unrunnable (offline)
> clustera pengine: warning: custom_action: Action cluster_listnr_stop_0
> on clusterb is unrunnable (offline)
> clustera pengine: warning: stage6: Scheduling Node clusterb for STONITH
> clustera pengine: info: native_stop_constraints:
> cluster_fs_stop_0 is implicit after clusterb is fenced
> clustera pengine: info: native_stop_constraints:
> cluster_vip_stop_0 is implicit after clusterb is fenced
> clustera pengine: info: native_stop_constraints:
> cluster_sid_stop_0 is implicit after clusterb is fenced
> clustera pengine: info: native_stop_constraints:
> cluster_listnr_stop_0 is implicit after clusterb is fenced
> clustera pengine: info: LogActions: Leave ipmi-fence-db01
> (Started clustera)
> clustera pengine: info: LogActions: Leave ipmi-fence-db02
> (Started clustera)
> clustera pengine: notice: LogActions: Move cluster_fs
> (Started clusterb -> clustera)
> clustera pengine: notice: LogActions: Move cluster_vip
> (Started clusterb -> clustera)
> clustera pengine: notice: LogActions: Stop cluster_sid
> (clusterb)
> clustera pengine: notice: LogActions: Stop cluster_listnr
> (clusterb)
> clustera pengine: warning: process_pe_message: Calculated
> Transition 26821: /var/lib/pacemaker/pengine/pe-warn-7.bz2
> clustera crmd: info: do_state_transition: State transition
> S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> clustera crmd: info: do_te_invoke: Processing graph 26821
> (ref=pe_calc-dc-1526868653-26882) derived from
> /var/lib/pacemaker/pengine/pe-warn-7.bz2
> clustera crmd: notice: te_fence_node: Executing reboot fencing
> operation (23) on clusterb (timeout=60000)
>
>
> Thanks ~~~~
>
>
>
>
> _______________________________________________
> Users mailing list: Users at clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>