[ClusterLabs] Pengine always trying to start the resource on the standby node.

Albert Weng weng.albert at gmail.com
Tue Jun 5 21:27:29 EDT 2018


 Hi All,

I have created an active/passive Pacemaker cluster on RHEL 7.

Here is my environment:
clustera : 192.168.11.1 (passive)
clusterb : 192.168.11.2 (master)
clustera-ilo4 : 192.168.11.10
clusterb-ilo4 : 192.168.11.11

cluster resource status :
     cluster_fs        started on clusterb
     cluster_vip       started on clusterb
     cluster_sid       started on clusterb
     cluster_listnr    started on clusterb

Both cluster nodes are online.

I found that my corosync.log contains many records like the ones below:

clustera        pengine:     info: determine_online_status_fencing:
Node clusterb is active
clustera        pengine:     info: determine_online_status:        Node
clusterb is online
clustera        pengine:     info: determine_online_status_fencing:
Node clustera is active
clustera        pengine:     info: determine_online_status:        Node
clustera is online

*clustera        pengine:  warning: unpack_rsc_op_failure:  Processing
failed op start for cluster_sid on clustera: unknown error (1)*
*=> Question : Why is pengine always trying to start cluster_sid on the
passive node? How can I fix it?*

clustera        pengine:     info: native_print:   ipmi-fence-clustera
(stonith:fence_ipmilan):        Started clustera
clustera        pengine:     info: native_print:   ipmi-fence-clusterb
(stonith:fence_ipmilan):        Started clustera
clustera        pengine:     info: group_print:     Resource Group: cluster
clustera        pengine:     info: native_print:        cluster_fs
(ocf::heartbeat:Filesystem):    Started clusterb
clustera        pengine:     info: native_print:        cluster_vip
(ocf::heartbeat:IPaddr2):       Started clusterb
clustera        pengine:     info: native_print:        cluster_sid
(ocf::heartbeat:oracle):        Started clusterb
clustera        pengine:     info: native_print:
cluster_listnr       (ocf::heartbeat:oralsnr):       Started clusterb
clustera        pengine:     info: get_failcount_full:     cluster_sid has
failed INFINITY times on clustera


*clustera        pengine:  warning: common_apply_stickiness:        Forcing
cluster_sid away from clustera after 1000000 failures (max=1000000)*
*=> Question: Did too many failed start attempts cause the resource to be
forbidden from starting on clustera?*
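In case it helps, this is how I plan to inspect and clear the fail count (just a sketch; I am assuming the standard pcs tooling shipped with RHEL 7 pacemaker, please correct me if these are not the right commands):

```shell
# Show the current fail count for cluster_sid (per node)
pcs resource failcount show cluster_sid

# Clear the failure history so the policy engine stops
# forcing cluster_sid away from clustera
pcs resource cleanup cluster_sid
```

My understanding is that cleanup resets the failcount, so the "Forcing cluster_sid away from clustera after 1000000 failures" score should go away until the next start failure.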

A couple of days ago, clusterb was fenced (STONITH) for an unknown reason,
but only "cluster_fs" and "cluster_vip" moved to clustera successfully;
"cluster_sid" and "cluster_listnr" went to the "Stopped" state.
See the messages below. Is this related to the failed "op start for
cluster_sid on clustera..." ?

clustera    pengine:  warning: unpack_rsc_op_failure:  Processing failed op
start for cluster_sid on clustera: unknown error (1)
clustera    pengine:     info: native_print:   ipmi-fence-clustera
(stonith:fence_ipmilan):        Started clustera
clustera    pengine:     info: native_print:   ipmi-fence-clusterb
(stonith:fence_ipmilan):        Started clustera
clustera    pengine:     info: group_print:     Resource Group: cluster
clustera    pengine:     info: native_print:        cluster_fs
(ocf::heartbeat:Filesystem):    Started clusterb (UNCLEAN)
clustera    pengine:     info: native_print:        cluster_vip
(ocf::heartbeat:IPaddr2):       Started clusterb (UNCLEAN)
clustera    pengine:     info: native_print:        cluster_sid
(ocf::heartbeat:oracle):        Started clusterb (UNCLEAN)
clustera    pengine:     info: native_print:        cluster_listnr
(ocf::heartbeat:oralsnr):       Started clusterb (UNCLEAN)
clustera    pengine:     info: get_failcount_full:     cluster_sid has
failed INFINITY times on clustera
clustera    pengine:  warning: common_apply_stickiness:        Forcing
cluster_sid away from clustera after 1000000 failures (max=1000000)
clustera    pengine:     info: rsc_merge_weights:      cluster_fs: Rolling
back scores from cluster_sid
clustera    pengine:     info: rsc_merge_weights:      cluster_vip: Rolling
back scores from cluster_sid
clustera    pengine:     info: rsc_merge_weights:      cluster_sid: Rolling
back scores from cluster_listnr
clustera    pengine:     info: native_color:   Resource cluster_sid cannot
run anywhere
clustera    pengine:     info: native_color:   Resource cluster_listnr
cannot run anywhere
clustera    pengine:  warning: custom_action:  Action cluster_fs_stop_0 on
clusterb is unrunnable (offline)
clustera    pengine:     info: RecurringOp:     Start recurring monitor
(20s) for cluster_fs on clustera
clustera    pengine:  warning: custom_action:  Action cluster_vip_stop_0 on
clusterb is unrunnable (offline)
clustera    pengine:     info: RecurringOp:     Start recurring monitor
(10s) for cluster_vip on clustera
clustera    pengine:  warning: custom_action:  Action cluster_sid_stop_0 on
clusterb is unrunnable (offline)
clustera    pengine:  warning: custom_action:  Action cluster_sid_stop_0 on
clusterb is unrunnable (offline)
clustera    pengine:  warning: custom_action:  Action cluster_listnr_stop_0
on clusterb is unrunnable (offline)
clustera    pengine:  warning: custom_action:  Action cluster_listnr_stop_0
on clusterb is unrunnable (offline)
clustera    pengine:  warning: stage6: Scheduling Node clusterb for STONITH
clustera    pengine:     info: native_stop_constraints:
cluster_fs_stop_0 is implicit after clusterb is fenced
clustera    pengine:     info: native_stop_constraints:
cluster_vip_stop_0 is implicit after clusterb is fenced
clustera    pengine:     info: native_stop_constraints:
cluster_sid_stop_0 is implicit after clusterb is fenced
clustera    pengine:     info: native_stop_constraints:
cluster_listnr_stop_0 is implicit after clusterb is fenced
clustera    pengine:     info: LogActions:     Leave   ipmi-fence-db01
(Started clustera)
clustera    pengine:     info: LogActions:     Leave   ipmi-fence-db02
(Started clustera)
clustera    pengine:   notice: LogActions:     Move    cluster_fs
(Started clusterb -> clustera)
clustera    pengine:   notice: LogActions:     Move    cluster_vip
(Started clusterb -> clustera)
clustera    pengine:   notice: LogActions:     Stop    cluster_sid
(clusterb)
clustera    pengine:   notice: LogActions:     Stop    cluster_listnr
(clusterb)
clustera    pengine:  warning: process_pe_message:     Calculated
Transition 26821: /var/lib/pacemaker/pengine/pe-warn-7.bz2
clustera       crmd:     info: do_state_transition:    State transition
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
clustera       crmd:     info: do_te_invoke:   Processing graph 26821
(ref=pe_calc-dc-1526868653-26882) derived from
/var/lib/pacemaker/pengine/pe-warn-7.bz2
clustera       crmd:   notice: te_fence_node:  Executing reboot fencing
operation (23) on clusterb (timeout=60000)
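If I read the log correctly, in a resource group each member implicitly depends on the previous one, so once cluster_sid "cannot run anywhere" (because of the INFINITY failcount on clustera), cluster_listnr, which comes after it in the group, also had to stay stopped, while cluster_fs and cluster_vip, which come earlier, could still move. This is how I would try to replay the saved transition to confirm that (a sketch; I am assuming crm_simulate can read the bzipped pe-input file directly, as I have seen suggested):

```shell
# Replay the transition the policy engine calculated, to see why
# cluster_sid and cluster_listnr ended up unable to run anywhere
crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-warn-7.bz2

# Show the constraints currently affecting placement
pcs constraint show --full
```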


Thanks ~~~~


-- 
Kind regards,
Albert Weng
