<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 4, 2021 at 5:30 PM Janusz Jaskiewicz &lt;<a href="mailto:janusz.jaskiewicz@gmail.com">janusz.jaskiewicz@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello.<br><br>Please forgive the length of this email, but I wanted to provide as much detail as possible.<br><br>I'm trying to set up a cluster of two nodes for my service.<br>I have a problem with a scenario where the network between the two nodes breaks and they can no longer see each other.<br>This causes split-brain.<br>I know that the proper way of implementing this would be to employ STONITH, but that is not feasible for me now (I don't have the necessary hardware support, and I don't want to introduce another point of failure by adding shared-storage-based STONITH).<br><br>To work around the split-brain scenario, I introduced pingd to my cluster, which in theory should do what I expect.<br>pingd pings a network device, so when the NIC on one of my nodes is broken, that node should not run the resources because pingd would fail for it.<br></div></blockquote><div>As we've already discussed on this list in multiple previous threads, there are lots of failure scenarios</div><div>where cluster nodes don't see each other but both can still ping something else on the network.</div><div>Other important cases where your approach wouldn't work are those where nodes are only</div><div>partially alive, which leads to the corosync membership being lost while the node is no longer able to stop resources</div><div>properly.</div><div>Thus it is highly recommended that all such setups relying on some kind of self-fencing, or on</div><div>bringing down resources within some timeout, be guarded by a (hardware) watchdog.</div><div>Earlier you were probably referring to SBD, which implements such 
a</div><div>watchdog-guarded approach. As you've probably figured out, you can't directly use SBD</div><div>in a 2-node setup without a shared disk. Pure watchdog fencing needs a quorum decision</div><div>made by at least 3 instances. If you don't want a full-blown 3rd node, you can consider</div><div>qdevice, which can be used by multiple 2-node clusters for quorum evaluation.</div><div>Otherwise, you can use SBD with a shared disk.</div><div>You are right that both a shared disk and any kind of 3rd node are an additional point of</div><div>failure. What matters is that in both cases we are talking about a point of failure, but not a</div><div>single point of failure, meaning that its failure would not necessarily force services to be</div><div>shut down.</div><div><br></div><div>Klaus</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br>The ping resource is configured to update the value of the node attribute 'pingd' (interval: 5s, dampen: 3s, multiplier: 1000).<br>Based on the value of pingd, I have a location constraint that sets the score to -INFINITY for resource DimProdClusterIP when 'pingd' is not 1000.<br>All other resources are colocated with DimProdClusterIP, and DimProdClusterIP should start before all other resources.<br><br>Based on that setup, I would expect that when the resources run on dimprod01 and I disconnect dimprod02 from the network, the resources will not start on dimprod02.<br>Unfortunately, I see that after the token timeout plus the consensus timeout, my resources are brought up for a moment and then go down again.<br>This is undesirable, as it causes a DRBD split-brain inconsistency, and the cluster IP may also be taken over by the disconnected node.<br><br>I tried to debug it, but I can't figure out why it doesn't work.<br>I would appreciate any help/pointers.<br><br><br>Below are some details of my setup and a snippet of the Pacemaker logs, with comments:<br><br>Setup details:<br><br>pcs 
status:<br>Cluster name: dimprodcluster<br>Cluster Summary:<br> * Stack: corosync<br> * Current DC: dimprod02 (version 2.0.5-9.el8_4.1-ba59be7122) - partition with quorum<br> * Last updated: Tue Aug 3 08:20:32 2021<br> * Last change: Mon Aug 2 18:24:39 2021 by root via cibadmin on dimprod01<br> * 2 nodes configured<br> * 8 resource instances configured<br><br>Node List:<br> * Online: [ dimprod01 dimprod02 ]<br><br>Full List of Resources:<br> * DimProdClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01<br> * WyrDimProdServer (systemd:wyr-dim): Started dimprod01<br> * Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable):<br> * Masters: [ dimprod01 ]<br> * Slaves: [ dimprod02 ]<br> * WyrDimProdFS (ocf::heartbeat:Filesystem): Started dimprod01<br> * DimTestClusterIP (ocf::heartbeat:IPaddr2): Started dimprod01<br> * Clone Set: ping-clone [ping]:<br> * Started: [ dimprod01 dimprod02 ]<br><br>Daemon Status:<br> corosync: active/enabled<br> pacemaker: active/enabled<br> pcsd: active/enabled<br> <br><br>pcs constraint<br>Location Constraints:<br> Resource: DimProdClusterIP<br> Constraint: location-DimProdClusterIP<br> Rule: score=-INFINITY<br> Expression: pingd ne 1000<br>Ordering Constraints:<br> start DimProdClusterIP then promote WyrDimProdServerData-clone (kind:Mandatory)<br> promote WyrDimProdServerData-clone then start WyrDimProdFS (kind:Mandatory)<br> start WyrDimProdFS then start WyrDimProdServer (kind:Mandatory)<br> start WyrDimProdServer then start DimTestClusterIP (kind:Mandatory)<br>Colocation Constraints:<br> WyrDimProdServer with DimProdClusterIP (score:INFINITY)<br> DimTestClusterIP with DimProdClusterIP (score:INFINITY)<br> WyrDimProdServerData-clone with DimProdClusterIP (score:INFINITY) (with-rsc-role:Master)<br> WyrDimProdFS with DimProdClusterIP (score:INFINITY)<br>Ticket Constraints:<br><br><br>pcs resource config ping<br> Resource: ping (class=ocf provider=pacemaker type=ping)<br> Attributes: dampen=3s host_list=193.30.22.33 
multiplier=1000<br> Operations: monitor interval=5s timeout=4s (ping-monitor-interval-5s)<br> start interval=0s timeout=60s (ping-start-interval-0s)<br> stop interval=0s timeout=5s (ping-stop-interval-0s)<br> <br> <br> <br>cat /etc/corosync/corosync.conf<br>totem {<br> version: 2<br> cluster_name: dimprodcluster<br> transport: knet<br> crypto_cipher: aes256<br> crypto_hash: sha256<br> token: 10000<br> interface {<br> knet_ping_interval: 1000<br> knet_ping_timeout: 1000<br> }<br>}<br><br>nodelist {<br> node {<br> ring0_addr: dimprod01<br> name: dimprod01<br> nodeid: 1<br> }<br><br> node {<br> ring0_addr: dimprod02<br> name: dimprod02<br> nodeid: 2<br> }<br>}<br><br>quorum {<br> provider: corosync_votequorum<br> two_node: 1<br>}<br><br>logging {<br> to_logfile: yes<br> logfile: /var/log/cluster/corosync.log<br> to_syslog: yes<br> timestamp: on<br> debug:on<br>}<br><br><br><br>Logs:<br>When the network is connected, 'pingd' takes the value 1000:<br><br>Aug 03 08:23:01 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (attrd_client_update) debug: Broadcasting pingd[dimprod02]=1000 (writer)<br>Aug 03 08:23:01 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater [3369856] (pcmk__node_attr_request) debug: Asked pacemaker-attrd to update pingd=1000 for dimprod02: OK (0)<br>Aug 03 08:23:01 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater [3369856] (crm_xml_cleanup) info: Cleaning up memory from libxml2<br>Aug 03 08:23:01 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater [3369856] (crm_exit) info: Exiting attrd_updater | with status 0<br><br>When the network is down, we update 'pingd' to 0:<br><br>Aug 03 08:23:09 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] 
(attrd_client_update) debug: Broadcasting pingd[dimprod02]=0 (writer)<br>Aug 03 08:23:09 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater [3370053] (pcmk__node_attr_request) debug: Asked pacemaker-attrd to update pingd=0 for dimprod02: OK (0)<br>Aug 03 08:23:09 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater [3370053] (crm_xml_cleanup) info: Cleaning up memory from libxml2<br>Aug 03 08:23:09 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater [3370053] (crm_exit) info: Exiting attrd_updater | with status 0<br><br>And again:<br><br>Aug 03 08:23:17 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (attrd_client_update) debug: Broadcasting pingd[dimprod02]=0 (writer)<br>Aug 03 08:23:17 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater [3370109] (pcmk__node_attr_request) debug: Asked pacemaker-attrd to update pingd=0 for dimprod02: OK (0)<br>Aug 03 08:23:17 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater [3370109] (crm_xml_cleanup) info: Cleaning up memory from libxml2<br>Aug 03 08:23:17 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> attrd_updater [3370109] (crm_exit) info: Exiting attrd_updater | with status 0<br><br>Then the node realizes it is not connected to the other node:<br><br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (pcmk_cpg_membership) info: Group attrd event 8: dimprod01 (node 1 pid 2118843) left via cluster exit<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based [2827043] 
(pcmk_cpg_membership) info: Group cib event 8: dimprod01 (node 1 pid 2118840) left via cluster exit<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (crm_update_peer_proc) info: pcmk_cpg_membership: Node dimprod01[1] - corosync-cpg is now offline<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based [2827043] (crm_update_peer_proc) info: pcmk_cpg_membership: Node dimprod01[1] - corosync-cpg is now offline<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk_cpg_membership) info: Group crmd event 8: dimprod01 (node 1 pid 2118845) left via cluster exit<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced [2827044] (pcmk_cpg_membership) info: Group stonith-ng event 8: dimprod01 (node 1 pid 2118841) left via cluster exit<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (crm_update_peer_proc) info: pcmk_cpg_membership: Node dimprod01[1] - corosync-cpg is now offline<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (crm_update_peer_state_iter) notice: Node dimprod01 state is now lost | nodeid=1 previous=member source=crm_update_peer_proc<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced [2827044] (crm_update_peer_proc) info: pcmk_cpg_membership: Node dimprod01[1] - corosync-cpg is now offline<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (peer_update_callback) info: Node dimprod01 
is no longer a peer | DC=true old=0x4000000 new=0x0000000<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (attrd_peer_remove) notice: Removing all dimprod01 attributes for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based [2827043] (crm_update_peer_state_iter) notice: Node dimprod01 state is now lost | nodeid=1 previous=member source=crm_update_peer_proc<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (attrd_peer_remove) debug: Removed #attrd-protocol[dimprod01] for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based [2827043] (crm_reap_dead_member) info: Removing node with name dimprod01 and id 1 from membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (attrd_peer_remove) debug: Removed master-WyrDimProdServerData[dimprod01] for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced [2827044] (crm_update_peer_state_iter) notice: Node dimprod01 state is now lost | nodeid=1 previous=member source=crm_update_peer_proc<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based [2827043] (reap_crm_member) notice: Purged 1 peer with id=1 and/or uname=dimprod01 from the membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (attrd_peer_remove) debug: Removed last-failure-WyrDimProdFS#start_0[dimprod01] for peer loss<br>Aug 03 08:23:23 <a 
href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (controld_delete_node_state) info: Deleting transient attributes for node dimprod01 (via CIB call 466) | xpath=//node_state[@uname='dimprod01']/transient_attributes<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based [2827043] (pcmk_cpg_membership) info: Group cib event 8: dimprod02 (node 2 pid 2827043) is member<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (attrd_peer_remove) debug: Removed fail-count-WyrDimProdFS#start_0[dimprod01] for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced [2827044] (st_peer_update_callback) debug: Broadcasting our uname because of node 1<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (attrd_peer_remove) debug: Removed pingd[dimprod01] for peer loss<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (crm_reap_dead_member) info: Removing node with name dimprod01 and id 1 from membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced [2827044] (crm_reap_dead_member) info: Removing node with name dimprod01 and id 1 from membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (reap_crm_member) notice: Purged 1 peer with id=1 and/or uname=dimprod01 from the membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> 
pacemaker-attrd [2827046] (pcmk_cpg_membership) info: Group attrd event 8: dimprod02 (node 2 pid 2827046) is member<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (match_down_event) debug: No reason to expect node 1 to be down<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced [2827044] (reap_crm_member) notice: Purged 1 peer with id=1 and/or uname=dimprod01 from the membership cache<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-fenced [2827044] (pcmk_cpg_membership) info: Group stonith-ng event 8: dimprod02 (node 2 pid 2827044) is member<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (peer_update_callback) warning: Stonith/shutdown of node dimprod01 was not expected<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (abort_transition_graph) info: Transition 99 aborted: Node failure | source=peer_update_callback:280 complete=true<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based [2827043] (cib_process_request) info: Forwarding cib_delete operation for section //node_state[@uname='dimprod01']/transient_attributes to all (origin=local/crmd/466)<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk_cpg_membership) info: Group crmd event 8: dimprod02 (node 2 pid 2827048) is member<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk__set_flags_as) debug: FSA action 
flags 0x2000000000000 (new_actions) for controller set by s_crmd_fsa:198<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (s_crmd_fsa) debug: Processing I_PE_CALC: [ state=S_IDLE cause=C_FSA_INTERNAL origin=abort_transition_graph ]<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-based [2827043] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/467)<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (do_state_transition) notice: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk__set_flags_as) debug: FSA action flags 0x00000020 (A_INTEGRATE_TIMER_STOP) for controller set by do_state_transition:559<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk__set_flags_as) debug: FSA action flags 0x00000080 (A_FINALIZE_TIMER_STOP) for controller set by do_state_transition:565<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk__set_flags_as) debug: FSA action flags 0x00000200 (A_DC_TIMER_STOP) for controller set by do_state_transition:569<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (check_join_counts) debug: Sole active cluster node is fully joined<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> 
pacemaker-controld [2827048] (pcmk__clear_flags_as) debug: FSA action flags 0x00000200 (an_action) for controller cleared by do_fsa_action:108<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk__clear_flags_as) debug: FSA action flags 0x00000020 (an_action) for controller cleared by do_fsa_action:108<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk__clear_flags_as) debug: FSA action flags 0x00000080 (an_action) for controller cleared by do_fsa_action:108<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk__clear_flags_as) debug: FSA action flags 0x2000000000000 (an_action) for controller cleared by do_fsa_action:108<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (do_pe_invoke) debug: Query 468: Requesting the current CIB: S_POLICY_ENGINE<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk_quorum_notification) info: Quorum retained | membership=1140 members=1<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-attrd [2827046] (attrd_peer_update) notice: Setting pingd[dimprod02]: 1000 -> 0 | from dimprod02<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (pcmk_quorum_notification) debug: Member[0] 2 <br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (crm_update_peer_state_iter) notice: Node dimprod01 state is now lost | nodeid=1 
previous=member source=crm_reap_unseen_nodes<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (peer_update_callback) info: Cluster node dimprod01 is now lost (was member)<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (match_down_event) debug: No reason to expect node 1 to be down<br>Aug 03 08:23:23 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-controld [2827048] (peer_update_callback) warning: Stonith/shutdown of node dimprod01 was not expected<br><br>Then the node decides to allocate the resources to itself and start them, ignoring the location constraint:<br><br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config) debug: STONITH timeout: 60000<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config) debug: STONITH of failed nodes is disabled<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config) debug: Concurrent fencing is enabled<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config) debug: Stop all active resources: false<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config) debug: Cluster is symmetric - resources can run anywhere by default<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config) 
debug: On loss of quorum: Stop ALL resources<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_config) debug: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (determine_online_status) info: Node dimprod02 is online<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_find_resource) debug: Internally renamed WyrDimProdServerData on dimprod02 to WyrDimProdServerData:0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_find_resource) debug: Internally renamed ping on dimprod02 to ping:0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: DimProdClusterIP (ocf::heartbeat:IPaddr2): Stopped<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: WyrDimProdServer (systemd:wyr-dim): Stopped<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable): Slaves: [ dimprod02 ]<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable): Stopped: [ dimprod01 ]<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: WyrDimProdFS (ocf::heartbeat:Filesystem): Stopped<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: DimTestClusterIP (ocf::heartbeat:IPaddr2): Stopped<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: Clone Set: ping-clone [ping]: Started: [ dimprod02 ]<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: Clone Set: ping-clone [ping]: Stopped: [ dimprod01 ]<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness) debug: Resource WyrDimProdServerData:0: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness) debug: Resource ping:0: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Assigning dimprod02 to DimProdClusterIP<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Assigning dimprod02 to WyrDimProdServer<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children) debug: Allocating up to 2 WyrDimProdServerData-clone instances to a possible 1 nodes 
(at most 1 per host, 2 optimal)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Assigning dimprod02 to WyrDimProdServerData:0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: All nodes for resource WyrDimProdServerData:1 are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Could not allocate a node for WyrDimProdServerData:1<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate) info: Resource WyrDimProdServerData:1 cannot run anywhere<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children) debug: Allocated 1 WyrDimProdServerData-clone instances of a possible 2<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles) debug: WyrDimProdServerData:0 promotion score: 1000<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles) info: Promoting WyrDimProdServerData:0 (Slave dimprod02)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles) debug: WyrDimProdServerData:1 promotion score: 1000<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles) info: WyrDimProdServerData-clone: Promoted 1 instances of a possible 1 to master<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Assigning dimprod02 to WyrDimProdFS<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Assigning dimprod02 to DimTestClusterIP<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children) debug: Allocating up to 2 ping-clone instances to a possible 1 nodes (at most 1 per host, 2 optimal)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Assigning dimprod02 to ping:0<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: All nodes for resource ping:1 are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Could not allocate a node for ping:1<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate) info: Resource ping:1 cannot run anywhere<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children) debug: Allocated 1 ping-clone instances of a possible 2<br>Aug 03 
08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp) info: Start recurring monitor (30s) for DimProdClusterIP on dimprod02<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp) info: Start recurring monitor (60s) for WyrDimProdServer on dimprod02<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp) info: Cancelling action WyrDimProdServerData:0_monitor_60000 (Slave vs. Master)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (create_promotable_actions) debug: Creating actions for WyrDimProdServerData-clone<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp) info: Cancelling action WyrDimProdServerData:0_monitor_60000 (Slave vs. 
Master)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp) info: Start recurring monitor (20s) for WyrDimProdFS on dimprod02<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (RecurringOp) info: Start recurring monitor (30s) for DimTestClusterIP on dimprod02<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction) notice: * Start DimProdClusterIP ( dimprod02 ) <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction) notice: * Start WyrDimProdServer ( dimprod02 ) <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction) notice: * Promote WyrDimProdServerData:0 ( Slave -> Master dimprod02 ) <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions) info: Leave WyrDimProdServerData:1 (Stopped)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction) notice: * Start WyrDimProdFS ( dimprod02 ) <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction) notice: * Start DimTestClusterIP ( dimprod02 ) <br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions) info: Leave ping:0 (Started dimprod02)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions) info: Leave ping:1 (Stopped)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (action2xml) debug: Using anonymous clone name WyrDimProdServerData for WyrDimProdServerData:0 (aka. WyrDimProdServerData)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (action2xml) debug: Using anonymous clone name WyrDimProdServerData for WyrDimProdServerData:0 (aka. WyrDimProdServerData)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (action2xml) debug: Using anonymous clone name WyrDimProdServerData for WyrDimProdServerData:0 (aka. WyrDimProdServerData)<br>Aug 03 08:23:24 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (action2xml) debug: Using anonymous clone name WyrDimProdServerData for WyrDimProdServerData:0 (aka. 
WyrDimProdServerData)<br><br>Once the resources are started, the node decides that they cannot be allocated and stops them:<br><br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (determine_online_status) info: Node dimprod02 is online<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_find_resource) debug: Internally renamed WyrDimProdServerData on dimprod02 to WyrDimProdServerData:0<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (unpack_find_resource) debug: Internally renamed ping on dimprod02 to ping:0<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: DimProdClusterIP (ocf::heartbeat:IPaddr2): Started dimprod02<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: WyrDimProdServer (systemd:wyr-dim): Started dimprod02<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable): Masters: [ dimprod02 ]<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: Clone Set: WyrDimProdServerData-clone [WyrDimProdServerData] (promotable): Stopped: [ dimprod01 ]<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: WyrDimProdFS (ocf::heartbeat:Filesystem): Started 
dimprod02<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: DimTestClusterIP (ocf::heartbeat:IPaddr2): Stopped<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: Clone Set: ping-clone [ping]: Started: [ dimprod02 ]<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (log_list_item) info: Clone Set: ping-clone [ping]: Stopped: [ dimprod01 ]<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness) debug: Resource DimProdClusterIP: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness) debug: Resource WyrDimProdServer: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness) debug: Resource WyrDimProdServerData:0: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness) debug: Resource WyrDimProdFS: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (common_apply_stickiness) debug: Resource ping:0: preferring current location (node=dimprod02, weight=100)<br>Aug 03 08:23:27 <a 
href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_merge_weights) info: DimProdClusterIP: Rolling back optional scores from WyrDimProdServerData-clone<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_merge_weights) info: DimProdClusterIP: Rolling back optional scores from WyrDimProdFS<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_merge_weights) info: DimProdClusterIP: Rolling back optional scores from WyrDimProdServer<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_merge_weights) info: DimProdClusterIP: Rolling back optional scores from DimTestClusterIP<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: All nodes for resource DimProdClusterIP are unavailable, unclean or shutting down (dimprod02: 1, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Could not allocate a node for DimProdClusterIP<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Processing DimProdClusterIP_monitor_30000<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate) info: Resource DimProdClusterIP cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: All nodes for resource WyrDimProdServer are unavailable, unclean or shutting down (dimprod02: 1, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Could not allocate a node for WyrDimProdServer<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate) info: Resource WyrDimProdServer cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children) debug: Allocating up to 2 WyrDimProdServerData-clone instances to a possible 1 nodes (at most 1 per host, 2 optimal)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Assigning dimprod02 to WyrDimProdServerData:0<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: All nodes for resource WyrDimProdServerData:1 are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Could not allocate a node for WyrDimProdServerData:1<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate) info: Resource WyrDimProdServerData:1 cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children) debug: Allocated 1 WyrDimProdServerData-clone instances of a possible 2<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (filter_colocation_constraint) error: WyrDimProdServerData:0 must be colocated with DimProdClusterIP but is not (dimprod02 vs. unallocated)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles) debug: WyrDimProdServerData:0 promotion score: 10000<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles) info: Promoting WyrDimProdServerData:0 (Master dimprod02)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles) debug: WyrDimProdServerData:1 promotion score: 10000<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__set_instance_roles) info: WyrDimProdServerData-clone: Promoted 1 instances of a possible 1 to master<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: All nodes for resource WyrDimProdFS are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Could not allocate a node for WyrDimProdFS<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> 
pacemaker-schedulerd[2827047] (native_assign_node) debug: Processing WyrDimProdFS_monitor_20000<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate) info: Resource WyrDimProdFS cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: All nodes for resource DimTestClusterIP are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Could not allocate a node for DimTestClusterIP<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate) info: Resource DimTestClusterIP cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children) debug: Allocating up to 2 ping-clone instances to a possible 1 nodes (at most 1 per host, 2 optimal)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Assigning dimprod02 to ping:0<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: All nodes for resource ping:1 are unavailable, unclean or shutting down (dimprod01: 0, -1000000)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (native_assign_node) debug: Could not allocate a node for ping:1<br>Aug 03 08:23:27 <a 
href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (pcmk__native_allocate) info: Resource ping:1 cannot run anywhere<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (distribute_children) debug: Allocated 1 ping-clone instances of a possible 2<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (create_promotable_actions) debug: Creating actions for WyrDimProdServerData-clone<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction) notice: * Stop DimProdClusterIP ( dimprod02 ) due to node availability<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction) notice: * Stop WyrDimProdServer ( dimprod02 ) due to node availability<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions) info: Leave WyrDimProdServerData:0 (Master dimprod02)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions) info: Leave WyrDimProdServerData:1 (Stopped)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogAction) notice: * Stop WyrDimProdFS ( dimprod02 ) due to node availability<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions) info: Leave DimTestClusterIP (Stopped)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" 
target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions) info: Leave ping:0 (Started dimprod02)<br>Aug 03 08:23:27 <a href="http://dimprod02.my.clustertest.com" target="_blank">dimprod02.my.clustertest.com</a> pacemaker-schedulerd[2827047] (LogActions) info: Leave ping:1 (Stopped)<br><br>So the final result is OK; I would just like to avoid starting the resources on the disconnected node.<br>I have no idea how to debug this further.<br>I would appreciate any help.<br>If it's helpful, I can provide the full debug log and more configuration details.<br><br>Regards,<br>Janusz.<br></div>
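<div>One way to narrow down a start-then-stop flap like the one in the logs above is to pull only the scheduler's <code>(LogAction) notice</code> decisions out of a long debug log and see which resources were started and then stopped again. The following is a minimal sketch, not part of Pacemaker or pcs: <code>planned_actions</code>, <code>flapping_resources</code> and <code>ACTION_RE</code> are hypothetical names, and the regex assumes the plain-text line format shown in the excerpt (month, day, time, hostname, then the <code>pacemaker-schedulerd[pid]</code> tag), with the hostname links stripped. Other Pacemaker versions may format these lines differently.</div>

```python
import re
from typing import Iterable, List, NamedTuple, Set

# Hypothetical pattern for scheduler action notices such as:
#   Aug 03 08:23:24 host pacemaker-schedulerd[123] (LogAction) notice: * Start Res ( node )
# "(LogActions) info: Leave ..." lines are deliberately NOT matched (no change planned).
ACTION_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"pacemaker-schedulerd\[\d+\] \(LogAction\) notice: \* "
    r"(?P<action>\w+)\s+(?P<resource>\S+)\s+\((?P<where>.*?)\)(?P<reason>.*)$"
)


class Action(NamedTuple):
    timestamp: str
    action: str    # e.g. Start, Stop, Promote
    resource: str
    where: str     # text inside the parentheses, e.g. "dimprod02"
    reason: str    # trailing text, e.g. "due to node availability"


def planned_actions(lines: Iterable[str]) -> List[Action]:
    """Return the scheduler's planned actions, in log order."""
    result = []
    for line in lines:
        m = ACTION_RE.match(line.strip())
        if m:
            result.append(Action(m["ts"], m["action"], m["resource"],
                                 m["where"].strip(), m["reason"].strip()))
    return result


def flapping_resources(actions: List[Action]) -> Set[str]:
    """Resources that were started and later stopped within the same excerpt."""
    started: Set[str] = set()
    flapped: Set[str] = set()
    for a in actions:
        if a.action == "Start":
            started.add(a.resource)
        elif a.action == "Stop" and a.resource in started:
            flapped.add(a.resource)
    return flapped


# Sample lines copied from the excerpt above (hostname links removed).
sample = [
    "Aug 03 08:23:24 dimprod02.my.clustertest.com pacemaker-schedulerd[2827047] (LogAction) notice: * Start DimProdClusterIP ( dimprod02 ) ",
    "Aug 03 08:23:24 dimprod02.my.clustertest.com pacemaker-schedulerd[2827047] (LogAction) notice: * Promote WyrDimProdServerData:0 ( Slave -> Master dimprod02 ) ",
    "Aug 03 08:23:27 dimprod02.my.clustertest.com pacemaker-schedulerd[2827047] (LogAction) notice: * Stop DimProdClusterIP ( dimprod02 ) due to node availability",
    "Aug 03 08:23:27 dimprod02.my.clustertest.com pacemaker-schedulerd[2827047] (LogActions) info: Leave ping:0 (Started dimprod02)",
]
acts = planned_actions(sample)
print(flapping_resources(acts))  # {'DimProdClusterIP'}
```

<div>Filtering on the <code>LogAction</code> notices rather than every debug line keeps only the transitions the scheduler actually planned, which makes it easier to line up the timestamps of the unwanted start with the ping monitor interval and dampening settings.</div>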
_______________________________________________<br>
Manage your subscription:<br>
<a href="https://lists.clusterlabs.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.clusterlabs.org/mailman/listinfo/users</a><br>
<br>
ClusterLabs home: <a href="https://www.clusterlabs.org/" rel="noreferrer" target="_blank">https://www.clusterlabs.org/</a><br>
</blockquote></div></div>